LSTM AI Stock Predictor

A deep learning pipeline for forecasting short-term stock movements using technical indicators, Monte Carlo dropout uncertainty, and out-of-sample backtesting.

Example output: probability-based strategy (neon green), SPY buy-and-hold (white), random baselines (faint white), and ±1σ / ±3σ uncertainty around the random mean (gray). Plots are generated by run_backtest_v4.py after forecasts exist in forecasts/.

Overview

This project provides an end-to-end workflow:

1. Load and process per-ticker indicator CSVs (TrainingData/processor.py, ticker list from TrainingData/stockList.csv)
2. Train a Conv1D + LSTM model with MC dropout in run_forecast_v4.ipynb
3. Write forecasts to forecasts/ and split metadata (split_info.json, oos_start_date.txt) for aligned backtests
4. Run run_backtest_v4.py to simulate trading on the test / OOS window only, write videos/backtest_metrics.json, and export PNGs / MP4s to videos/ and output_plots/

The design emphasizes realistic evaluation (no training dates in the backtest), uncertainty-aware signals, and extension via TrainingData/featuresPy/. Optional raw feeds (insider, sentiment, Fear & Greed) are merged in TrainingData/processor.py when the corresponding files exist.
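To illustrate what "no training dates in the backtest" means in practice, here is a minimal sketch of a chronological split. The split fractions and the JSON key names (`train_end`, `val_end`, `oos_start_date`) are assumptions for illustration; the notebook's actual split_info.json layout may differ.

```python
import json
from datetime import date, timedelta


def chronological_split(dates, train_frac=0.7, val_frac=0.15):
    """Split dates into train / val / test blocks in time order, with no shuffling."""
    ordered = sorted(dates)
    n_train = int(len(ordered) * train_frac)
    n_val = int(len(ordered) * val_frac)
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


# 100 consecutive calendar days stand in for trading dates.
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(100)]
train, val, test = chronological_split(days)

# Hypothetical metadata layout; the real split_info.json keys may differ.
split_info = {
    "train_end": train[-1].isoformat(),
    "val_end": val[-1].isoformat(),
    "oos_start_date": test[0].isoformat(),
}
print(json.dumps(split_info, indent=2))
```

Because the blocks are cut in time order, every test date is later than every training and validation date, which is the property the backtest relies on.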

Repository layout

├── forecasts/                 # *_forecast.csv outputs + split_info.json / oos_start_date.txt
├── cache/                     # Cached preprocessed arrays
├── output_plots/              # Notebook + backtest exports (training curves, feature importance, extra PNGs/MP4s)
├── videos/                    # Backtest PNGs, MP4s, and backtest_metrics.json from run_backtest_v4.py
├── repo_photos/               # Images for this README (commit your PNG here for GitHub)
├── test_ideas/                # Older v3 experiments (classification notebook + script); primary flow is v4
├── config.example.json        # Copy to config.json and set ALPHA_VANTAGE_KEY if using downloader.py
├── TrainingData/
│   ├── indicators_data/
│   │   ├── raw/               # Raw inputs: prices, fear_greed.csv, insiderBuying/, sentiment/
│   │   └── processed/
│   │       ├── SPY-VIX/       # Benchmark series (e.g. SPY for backtest plot)
│   │       └── stocksData/    # One *_processed.csv per ticker
│   ├── featuresPy/            # Extra feature modules (e.g. fear_greed_correlation)
│   ├── stockList.csv          # Tickers processed by processor.py
│   ├── processor.py           # Main feature pipeline (merges optional raw feeds)
│   └── downloader.py          # Optional data download helpers (Alpha Vantage)
├── run_forecast_v4.ipynb      # Train / forecast pipeline (primary notebook)
├── run_backtest_v4.py         # OOS backtest + metrics JSON + plots / videos
├── requirements.txt
└── README.md

Available features (indicators)

close, YesterdayClose, YesterdayOpenLogR, YesterdayHighLogR, YesterdayLowLogR, YesterdayVolumeLogR, YesterdayCloseLogR, MA10, MA20, MA30, DayOfWeek, DayOfMonth, MonthNumber, EMA10, EMA30, RSI, MACD, MACD_Signal, BollingerUpper, BollingerLower, Volatility_10, Volatility_20, Volatility_30, OBV, ZScore, optional insider and sentiment columns, optional Fear & Greed (fear_greed) plus rolling fear_greed_correlation (stock vs index, from featuresPy/fear_greed_correlation.py when raw/fear_greed.csv is present), and gap / volatility / momentum / skew / intraday / sentiment-change fields. See processor.py for the authoritative column list.
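As a rough illustration of how columns of this kind are typically built, the sketch below computes log returns, a simple moving average, and a one-day lag using textbook definitions. Reading the "Yesterday" prefix as a one-day lag is an assumption, and processor.py may use different windows or formulas.

```python
import math


def log_returns(prices):
    """r_t = ln(p_t / p_{t-1}); the first row has no predecessor, so it is None."""
    return [None] + [math.log(b / a) for a, b in zip(prices, prices[1:])]


def simple_moving_average(values, window):
    """Mean of the trailing `window` values; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def lag(values, periods=1):
    """Shift a series so row t holds the value from row t - periods."""
    return [None] * periods + values[:len(values) - periods]


closes = [100.0, 110.0, 121.0, 115.0]
close_log_r = log_returns(closes)
yesterday_close = lag(closes)            # assumed meaning of a "Yesterday" column
ma2 = simple_moving_average(closes, 2)   # short window only to keep the demo small
print(close_log_r, yesterday_close, ma2)
```

Lagging a feature by one day keeps same-day information out of the model inputs, which matters when the target is the next move.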

Key features

| Feature | Description |
| --- | --- |
| Conv1D + LSTM | Captures local patterns and sequence structure in windowed inputs |
| Monte Carlo dropout | Uncertainty (std) around probability forecasts |
| Train / val / test split | Notebook writes OOS start; backtest uses test period only |
| Walk-forward–style windows | Rolling windows over processed history |
| Batch data generator | Efficient training from cached tensors |
| Confidence threshold | Configurable probability gate for “buy” signals (e.g. in notebook `CONFIG`) |
| Fear & Greed (optional) | Market index + rolling correlation feature when `raw/fear_greed.csv` is available |
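The idea behind Monte Carlo dropout can be sketched without a deep learning framework: keep dropout active at prediction time, run several stochastic passes, and report the mean and standard deviation of the outputs. The toy logistic model, weights, and dropout rate below are illustrative assumptions, not the project's Conv1D + LSTM network.

```python
import math
import random
import statistics


def stochastic_forward(features, weights, bias, drop_rate, rng):
    """One forward pass of a toy logistic model with dropout left ON."""
    keep = 1.0 - drop_rate
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            # Inverted dropout: rescale kept inputs so the expected sum is unchanged.
            total += (x / keep) * w
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout_predict(features, weights, bias, drop_rate=0.2, n_samples=25, seed=0):
    """Return (mean, std) of the predicted probability over n_samples passes."""
    rng = random.Random(seed)
    probs = [
        stochastic_forward(features, weights, bias, drop_rate, rng)
        for _ in range(n_samples)
    ]
    return statistics.fmean(probs), statistics.pstdev(probs)


mean_prob, std_prob = mc_dropout_predict(
    features=[0.4, -1.2, 0.7], weights=[0.9, 0.3, -0.5], bias=0.1
)
print(f"p(up) = {mean_prob:.3f} +/- {std_prob:.3f}")
```

A wide spread across passes signals that the prediction is sensitive to which units are dropped, which is the uncertainty a probability gate can take into account.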

Getting started

1. Requirements

pip install -r requirements.txt

2. Processed data

Place processed stock CSVs under TrainingData/indicators_data/processed/stocksData/ (or run the optional download + processor steps below). Ensure TrainingData/stockList.csv lists the tickers you want.
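A quick way to confirm the data is in place is to compare the ticker list against the processed folder. This sketch assumes each row of stockList.csv starts with a ticker symbol (no header row) and that files are named `<TICKER>_processed.csv`; the helper name is hypothetical. The demo uses a temporary directory so it runs anywhere.

```python
import csv
import tempfile
from pathlib import Path


def missing_processed_files(stock_list_csv, processed_dir):
    """Return tickers from the list that lack a <TICKER>_processed.csv file."""
    with open(stock_list_csv, newline="") as handle:
        # Assumption: the ticker symbol is the first cell of each non-empty row.
        tickers = [
            row[0].strip() for row in csv.reader(handle) if row and row[0].strip()
        ]
    processed = Path(processed_dir)
    return [t for t in tickers if not (processed / f"{t}_processed.csv").exists()]


# Demo on a throwaway directory with one ticker present and one missing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "stockList.csv").write_text("AAPL\nMSFT\n")
    (root / "AAPL_processed.csv").write_text("date,close\n")
    missing = missing_processed_files(root / "stockList.csv", root)

print("Missing processed CSVs:", missing)
```

In the real project the two arguments would be TrainingData/stockList.csv and TrainingData/indicators_data/processed/stocksData/.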

3. Train and forecast

Open and run run_forecast_v4.ipynb from the project root. It trains the model, writes forecasts under forecasts/, and can export plots to output_plots/.

4. Backtest (out-of-sample)

The backtest reads forecasts/split_info.json (or forecasts/oos_start_date.txt / env BACKTEST_OOS_START) so it only trades on dates on or after the OOS start.

python run_backtest_v4.py

Environment: BACKTEST_OOS_START (optional) overrides the OOS start date; otherwise the script uses forecasts/split_info.json or forecasts/oos_start_date.txt from the notebook. Without any of these, the script exits so you do not accidentally backtest on training dates.
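The lookup described above could be sketched as follows. The function name, the `oos_start_date` JSON key, and the choice to check split_info.json before oos_start_date.txt are assumptions; only the environment variable taking priority is stated in this README.

```python
import json
import os
import tempfile
from pathlib import Path


def resolve_oos_start(forecasts_dir="forecasts", env=None):
    """Return the OOS start date string, or None if no source defines it."""
    env = os.environ if env is None else env
    override = env.get("BACKTEST_OOS_START", "").strip()
    if override:
        return override  # the environment variable wins over files

    split_info = Path(forecasts_dir) / "split_info.json"
    if split_info.exists():
        # Hypothetical key name; check the notebook for the real one.
        value = json.loads(split_info.read_text()).get("oos_start_date")
        if value:
            return value

    text_file = Path(forecasts_dir) / "oos_start_date.txt"
    if text_file.exists():
        value = text_file.read_text().strip()
        if value:
            return value
    return None


# Demo: a text file provides the date, then an env override replaces it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "oos_start_date.txt").write_text("2024-01-02\n")
    from_file = resolve_oos_start(tmp, env={})
    from_env = resolve_oos_start(tmp, env={"BACKTEST_OOS_START": "2024-06-03"})

print(from_file, from_env)
```

A caller would stop (for example with `sys.exit`) when the function returns None, mirroring the script's refusal to run without a known OOS start.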

5. Live Signals (Buy / Hold)

1. Train the model in run_forecast_v4.ipynb and run the model export cell (writes to models/).
2. Run the live scoring:

python run_live_signals.py --min-accepted 0.2 --std-factor 0.0 --mc-samples 25

Note: --min-accepted is the minimum predicted probability a ticker needs to qualify as a BUY. For example, --min-accepted 0.5 means a stock is bought only when the model assigns at least a 50% probability to "above-normal" returns. --std-factor is the multiplier on the uncertainty around that probability, so 0.0 gives the uncertainty no weight. As the model becomes more accurate, --std-factor matters more, because the goal is then to be very certain that the probability really clears the minimum.
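One plausible reading of how the two flags combine is a gate on the uncertainty-penalized probability: mean minus std-factor times std must reach min-accepted. This rule, the function name, and the sample scores are assumptions for illustration; run_live_signals.py may implement the gate differently.

```python
def is_buy_candidate(mean_prob, std_prob, min_accepted=0.2, std_factor=0.0):
    """Assumed rule: the uncertainty-penalized probability must clear the threshold."""
    return mean_prob - std_factor * std_prob >= min_accepted


# Hypothetical scores: ticker -> (mean probability, MC-dropout std).
scores = {"AAA": (0.62, 0.05), "BBB": (0.55, 0.20), "CCC": (0.15, 0.01)}

for std_factor in (0.0, 1.0):
    picks = [
        ticker
        for ticker, (prob, std) in scores.items()
        if is_buy_candidate(prob, std, min_accepted=0.5, std_factor=std_factor)
    ]
    print(f"std_factor={std_factor}: {picks}")
```

With a std-factor of 0.0 both AAA and BBB pass a 0.5 threshold; raising it to 1.0 drops BBB, whose large uncertainty pulls its penalized probability below the gate.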

Outputs

signals/live_scores_YYYY-MM-DD.csv — per-ticker live scores and status (buy_candidate or no_buy).
signals/live_decision_YYYY-MM-DD.csv — one final recommendation row (BUY or NO_BUY).
If data is stale/missing, the script refreshes data by running TrainingData/downloader.py and TrainingData/processor.py.

After you add or change optional features in processed CSVs, re-run run_forecast_v4.ipynb so the model trains on the updated columns (see EXCLUDED_COLS / feature list in the notebook).

Optional: build datasets from scratch

Alpha Vantage API

TrainingData/downloader.py runs the fetch scripts under TrainingData/featuresPy/ (e.g. stockScrapper.py, markets.py, insiderbuying.py, sentiment.py). Those scripts read config.json in the project root for ALPHA_VANTAGE_KEY. Copy config.example.json to config.json, add your key from alphavantage.co, then:

python TrainingData/downloader.py
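If the download fails immediately, the key is a likely culprit. The check below is a small sketch that assumes config.json is a flat JSON object with a top-level ALPHA_VANTAGE_KEY entry; the function name is hypothetical and the demo writes a temporary file rather than touching your real config.

```python
import json
import tempfile
from pathlib import Path


def load_alpha_vantage_key(config_path="config.json"):
    """Read ALPHA_VANTAGE_KEY from a JSON config file, failing loudly if absent."""
    path = Path(config_path)
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; copy config.example.json to config.json first"
        )
    # Assumption: the key sits at the top level of the JSON object.
    key = str(json.loads(path.read_text()).get("ALPHA_VANTAGE_KEY") or "").strip()
    if not key:
        raise ValueError(f"ALPHA_VANTAGE_KEY is missing or empty in {path}")
    return key


# Demo with a throwaway config file and a placeholder key.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.json"
    cfg.write_text(json.dumps({"ALPHA_VANTAGE_KEY": "demo-key"}))
    key = load_alpha_vantage_key(cfg)

print("Key loaded:", bool(key))
```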

Process raw → `stocksData/`

python TrainingData/processor.py

Fear & Greed (optional): Add TrainingData/indicators_data/raw/fear_greed.csv with at least date and fear_greed columns. Without it, the pipeline fills a neutral default for fear_greed and skips the correlation helper where not applicable.
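The fallback and the rolling correlation can be pictured with a small sketch. The neutral value of 50, the dictionary-based merge, and returning 0.0 for a constant series are all assumptions made for illustration; featuresPy/fear_greed_correlation.py and processor.py define the real behavior.

```python
import math

NEUTRAL_FEAR_GREED = 50.0  # Assumed neutral midpoint of a 0-100 index.


def attach_fear_greed(price_rows, fear_greed_by_date, default=NEUTRAL_FEAR_GREED):
    """Add a fear_greed value to each row, using the default when a date is absent."""
    return [
        {**row, "fear_greed": fear_greed_by_date.get(row["date"], default)}
        for row in price_rows
    ]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rolling_correlation(xs, ys, window):
    """Correlation over each trailing window; None until the window is full."""
    out = []
    for i in range(len(xs)):
        if i + 1 < window:
            out.append(None)
        else:
            lo = i + 1 - window
            out.append(pearson(xs[lo:i + 1], ys[lo:i + 1]))
    return out


rows = [{"date": f"2024-01-0{d}", "close": c} for d, c in [(2, 100.0), (3, 101.0), (4, 103.0)]]
merged = attach_fear_greed(rows, {})  # no fear_greed.csv data available
corr = rolling_correlation(
    [r["close"] for r in merged], [r["fear_greed"] for r in merged], window=3
)
print(merged[0]["fear_greed"], corr)
```

When every row falls back to the same default, the index series is constant and carries no correlation signal, which is why a pipeline might skip the correlation feature in that case.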

README image

To show the screenshot on GitHub, add your exported file as:

repo_photos/random_vs_prob_strategy_uncertainty.png

(e.g. copy from videos/random_vs_prob_strategy_uncertainty.png after a successful backtest).
