hpnyaggerman/stonks

LSTM AI Stock Predictor

A deep learning pipeline for forecasting short-term stock movements using technical indicators, Monte Carlo dropout uncertainty, and out-of-sample backtesting.

Backtest: strategy vs random vs SPY (with uncertainty bands)

Example output: probability-based strategy (neon green), SPY buy-and-hold (white), random baselines (faint white), and ±1σ / ±3σ uncertainty around the random mean (gray). Plots are generated by run_backtest_v4.py after forecasts exist in forecasts/.


Overview

This project provides an end-to-end workflow:

  • Load and process per-ticker indicator CSVs (TrainingData/processor.py, ticker list from TrainingData/stockList.csv)
  • Train a Conv1D + LSTM model with MC dropout in run_forecast_v4.ipynb
  • Write forecasts to forecasts/ and split metadata (split_info.json, oos_start_date.txt) for aligned backtests
  • Run run_backtest_v4.py to simulate trading on the test / OOS window only, write videos/backtest_metrics.json, and export PNGs / MP4s to videos/ and output_plots/

The design emphasizes realistic evaluation (no training dates in the backtest), uncertainty-aware signals, and extensibility through TrainingData/featuresPy/. Optional raw feeds (insider, sentiment, Fear & Greed) are merged in TrainingData/processor.py when the corresponding files exist.

Repository layout

├── forecasts/                 # *_forecast.csv outputs + split_info.json / oos_start_date.txt
├── cache/                     # Cached preprocessed arrays
├── output_plots/              # Notebook + backtest exports (training curves, feature importance, extra PNGs/MP4s)
├── videos/                    # Backtest PNGs, MP4s, and backtest_metrics.json from run_backtest_v4.py
├── repo_photos/               # Images for this README (commit your PNG here for GitHub)
├── test_ideas/                # Older v3 experiments (classification notebook + script); primary flow is v4
├── config.example.json        # Copy to config.json and set ALPHA_VANTAGE_KEY if using downloader.py
├── TrainingData/
│   ├── indicators_data/
│   │   ├── raw/               # Raw inputs: prices, fear_greed.csv, insiderBuying/, sentiment/
│   │   └── processed/
│   │       ├── SPY-VIX/       # Benchmark series (e.g. SPY for backtest plot)
│   │       └── stocksData/    # One *_processed.csv per ticker
│   ├── featuresPy/            # Extra feature modules (e.g. fear_greed_correlation)
│   ├── stockList.csv          # Tickers processed by processor.py
│   ├── processor.py           # Main feature pipeline (merges optional raw feeds)
│   └── downloader.py          # Optional data download helpers (Alpha Vantage)
├── run_forecast_v4.ipynb      # Train / forecast pipeline (primary notebook)
├── run_backtest_v4.py         # OOS backtest + metrics JSON + plots / videos
├── requirements.txt
└── README.md

Available features (indicators)

close, YesterdayClose, YesterdayOpenLogR, YesterdayHighLogR, YesterdayLowLogR, YesterdayVolumeLogR, YesterdayCloseLogR, MA10, MA20, MA30, DayOfWeek, DayOfMonth, MonthNumber, EMA10, EMA30, RSI, MACD, MACD_Signal, BollingerUpper, BollingerLower, Volatility_10, Volatility_20, Volatility_30, OBV, ZScore, optional insider and sentiment columns, optional Fear & Greed (fear_greed) plus rolling fear_greed_correlation (stock vs index, from featuresPy/fear_greed_correlation.py when raw/fear_greed.csv is present), and gap / volatility / momentum / skew / intraday / sentiment-change fields. See processor.py for the authoritative column list.

Key features

| Feature | Description |
|---|---|
| Conv1D + LSTM | Captures local patterns and sequence structure in windowed inputs |
| Monte Carlo dropout | Uncertainty (std) around probability forecasts |
| Train / val / test split | Notebook writes OOS start; backtest uses test period only |
| Walk-forward–style windows | Rolling windows over processed history |
| Batch data generator | Efficient training from cached tensors |
| Confidence threshold | Configurable probability gate for “buy” signals (e.g. in notebook CONFIG) |
| Fear & Greed (optional) | Market index + rolling correlation feature when raw/fear_greed.csv is available |
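
To make the Monte Carlo dropout entry concrete, here is a minimal sketch in plain Python. It does not use the project's Conv1D + LSTM model: the toy logistic scorer, its weights, and the function names predict_with_dropout and mc_dropout_forecast are hypothetical stand-ins. The idea it illustrates is the general one: keep dropout active at inference time, run several stochastic forward passes, and report the mean as the probability forecast and the standard deviation as its uncertainty.

```python
import math
import random
import statistics


def predict_with_dropout(features, weights, drop_rate, rng):
    """One stochastic forward pass of a toy logistic scorer (hypothetical model)."""
    keep = 1.0 - drop_rate
    total = 0.0
    for x, w in zip(features, weights):
        # Keep each input with probability `keep`; rescale survivors (inverted dropout).
        if rng.random() < keep:
            total += (x / keep) * w
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout_forecast(features, weights, drop_rate=0.2, n_samples=25, seed=0):
    """Return (mean probability, std) over n_samples dropout-perturbed passes."""
    rng = random.Random(seed)
    samples = [
        predict_with_dropout(features, weights, drop_rate, rng)
        for _ in range(n_samples)
    ]
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    mean_p, std_p = mc_dropout_forecast([0.5, -1.2, 0.3], [0.8, -0.4, 1.1])
    print(f"p(up) = {mean_p:.3f} +/- {std_p:.3f}")
```

With drop_rate set to 0.0 every pass is identical and the reported std collapses to zero, which is a quick sanity check that the spread comes from dropout alone.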

Getting started

1. Requirements

pip install -r requirements.txt

2. Processed data

Place processed stock CSVs under TrainingData/indicators_data/processed/stocksData/ (or run the optional download + processor steps below). Ensure TrainingData/stockList.csv lists the tickers you want.

3. Train and forecast

Open and run run_forecast_v4.ipynb from the project root. It trains the model, writes forecasts under forecasts/, and can export plots to output_plots/.

4. Backtest (out-of-sample)

The backtest reads forecasts/split_info.json (or forecasts/oos_start_date.txt / env BACKTEST_OOS_START) so it only trades on dates on or after the OOS start.

python run_backtest_v4.py

Environment: BACKTEST_OOS_START (optional) overrides the OOS start date; otherwise the script uses forecasts/split_info.json or forecasts/oos_start_date.txt from the notebook. Without any of these, the script exits so you do not accidentally backtest on training dates.
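
The lookup order described above can be sketched roughly as follows. This is not the code in run_backtest_v4.py: the JSON key name "oos_start_date", the order between the two files, the ISO YYYY-MM-DD date format, and the helper names resolve_oos_start and filter_oos are all assumptions made for illustration.

```python
import json
import os
from pathlib import Path


def resolve_oos_start(forecast_dir="forecasts", env=None):
    """Return the OOS start date as a string, or None if no source provides one."""
    env = os.environ if env is None else env

    # 1. The environment variable overrides everything else.
    override = env.get("BACKTEST_OOS_START", "").strip()
    if override:
        return override

    # 2. split_info.json written by the notebook.
    split_path = Path(forecast_dir) / "split_info.json"
    if split_path.exists():
        info = json.loads(split_path.read_text())
        # ASSUMPTION: the key name "oos_start_date" is hypothetical.
        value = info.get("oos_start_date")
        if value:
            return str(value)

    # 3. Plain-text fallback.
    txt_path = Path(forecast_dir) / "oos_start_date.txt"
    if txt_path.exists():
        value = txt_path.read_text().strip()
        if value:
            return value

    return None


def filter_oos(rows, oos_start):
    """Keep rows dated on or after oos_start (ISO dates compare correctly as strings)."""
    return [row for row in rows if row["date"] >= oos_start]
```

A caller would stop with an error when resolve_oos_start returns None, which mirrors the documented behavior of refusing to backtest on training dates.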

5. Live Signals (Buy / Hold)

  1. Train the model in run_forecast_v4.ipynb and run the model export cell (writes to models/).
  2. Run the live scoring:
python run_live_signals.py --min-accepted 0.2 --std-factor 0.0 --mc-samples 25

Note: --min-accepted is the minimum predicted probability required before a ticker can be a BUY. For example, --min-accepted 0.5 means the model must assign at least a 50% probability that the purchased stock delivers "above-normal" returns. --std-factor is the multiplier applied to the forecast uncertainty when checking against --min-accepted. As the model becomes more accurate, --std-factor matters more, because the goal is to be very certain that the forecast probability really meets the accepted minimum.
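
One plausible reading of how the two flags interact is sketched below. The exact rule in run_live_signals.py is not stated in this README, so the formula (mean minus std_factor times std, compared against min_accepted) and the helper names is_buy_candidate and label_scores are assumptions; only the status strings buy_candidate and no_buy come from the documented output.

```python
def is_buy_candidate(prob_mean, prob_std, min_accepted=0.2, std_factor=0.0):
    """ASSUMED rule: the uncertainty-penalized probability must clear the threshold."""
    return (prob_mean - std_factor * prob_std) >= min_accepted


def label_scores(scores, min_accepted=0.2, std_factor=0.0):
    """Map {ticker: (mean, std)} to a status string per ticker."""
    return {
        ticker: "buy_candidate"
        if is_buy_candidate(mean, std, min_accepted, std_factor)
        else "no_buy"
        for ticker, (mean, std) in scores.items()
    }


if __name__ == "__main__":
    # Hypothetical tickers and scores, for illustration only.
    demo = {"AAA": (0.60, 0.05), "BBB": (0.30, 0.20)}
    print(label_scores(demo, min_accepted=0.5, std_factor=1.0))
```

Under this reading, a std-factor of 0.0 compares the mean probability directly against the threshold, and larger values make uncertain forecasts progressively harder to accept.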

Outputs

  • signals/live_scores_YYYY-MM-DD.csv — per-ticker live scores and status (buy_candidate or no_buy).
  • signals/live_decision_YYYY-MM-DD.csv — one final recommendation row (BUY or NO_BUY).
  • If data is stale/missing, the script refreshes data by running TrainingData/downloader.py and TrainingData/processor.py.

After you add or change optional features in processed CSVs, re-run run_forecast_v4.ipynb so the model trains on the updated columns (see EXCLUDED_COLS / feature list in the notebook).


Optional: build datasets from scratch

Alpha Vantage API

TrainingData/downloader.py runs the fetch scripts under TrainingData/featuresPy/ (e.g. stockScrapper.py, markets.py, insiderbuying.py, sentiment.py). Those scripts read config.json in the project root for ALPHA_VANTAGE_KEY. Copy config.example.json to config.json, add your key from alphavantage.co, then:

python TrainingData/downloader.py
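
Before running the downloader, it can help to confirm that config.json is in place and actually contains a key. The sketch below assumes config.json is a flat JSON object with a top-level ALPHA_VANTAGE_KEY entry; the helper name load_alpha_vantage_key is hypothetical and is not part of the repository.

```python
import json
from pathlib import Path


def load_alpha_vantage_key(config_path="config.json"):
    """Read ALPHA_VANTAGE_KEY from config.json, failing loudly if it is absent."""
    path = Path(config_path)
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; copy config.example.json to config.json first."
        )
    config = json.loads(path.read_text())
    # ASSUMPTION: the key lives at the top level of the JSON object.
    key = str(config.get("ALPHA_VANTAGE_KEY", "")).strip()
    if not key:
        raise ValueError(f"ALPHA_VANTAGE_KEY is missing or empty in {path}")
    return key
```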

Process raw → stocksData/

python TrainingData/processor.py

Fear & Greed (optional): Add TrainingData/indicators_data/raw/fear_greed.csv with at least date and fear_greed columns. Without it, the pipeline fills a neutral default for fear_greed and skips the correlation helper where not applicable.
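
A quick way to check that a fear_greed.csv file has the two required columns before running the processor is sketched below. The helper name missing_fear_greed_columns is hypothetical, and the check assumes a comma-separated file whose first row is the header; processor.py may be more or less strict than this.

```python
import csv

REQUIRED_COLUMNS = {"date", "fear_greed"}


def missing_fear_greed_columns(csv_path):
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="") as handle:
        # An empty file yields an empty header, so both columns are reported missing.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return sorted(REQUIRED_COLUMNS - present)
```

An empty list means the header contains both date and fear_greed; any other result names the columns to add.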


README image

To show the screenshot on GitHub, add your exported file as:

repo_photos/random_vs_prob_strategy_uncertainty.png

(e.g. copy from videos/random_vs_prob_strategy_uncertainty.png after a successful backtest).
