ES Futures Calendar Spread — Mean-Reversion Research

Quantitative research project studying mean-reversion of the E-mini S&P 500 (ES) calendar spread against its cost-of-carry fair value during quarterly roll windows. Covers four roll periods from September 2024 through June 2025, with full signal development, regime gate engineering, and in-sample / out-of-sample backtest evaluation.

Strategy Overview

Alpha source: The observed calendar spread (ES_back − ES_front) persistently deviates from its theoretical fair value:

FV = S × (r_f − q) × ΔT

where r_f = SOFR (via FRED), q = S&P 500 trailing dividend yield (~1.30%), and ΔT = time between expiries from Databento definition schema.

Signal: Enter when the z-score of (spread − FV) crosses ±2.5σ on a 10-minute rolling window (edge-triggered; fill executes at T+1). Exit uses a standard two-layer TP/SL: 90% of position at +0.50 pt (SL moves to breakeven) and 10% at +0.75 pt. Low-z overlay: entries with |entry_z| < 2.0 at fill time use a single layer at +0.25 pt instead. Position sizing is 10 lots per signal.

HC add-on: When |z_fill| > 3.0 at the T+1 fill bar, an unconditional 10-lot add-on executes at T+2, doubling the position to 20 lots with a blended entry price. The add-on is gated only by the same FOMC exclusion as the primary signal. Across all 4 windows: n=117 HC trades, WR=76.1%, avg net/lot=+$2.86. Conditional on window volatility: W1/W2/W4 positive (WR 72–80%), W3 marginal (one gap trade −$4,312 at 2× lots on 2025-03-14). Fixed-point SL does not scale with window sigma.

Session decomposition: Three session types analyzed — European (07:00–12:29 UTC), US RTH (13:30–20:15 UTC), Post-close — with European session identified as the primary alpha source.

Transaction costs: $8.04/lot round-trip (exchange fees + NFA + broker); fill model uses synthetic midprice (zero slippage assumption).

Comparison Baselines

External baseline: Monoyios & Sarno (2002) ESTAR Model

Monoyios, M. & Sarno, L. (2002). "Mean Reversion in Stock Index Futures Markets: A Nonlinear Analysis." Journal of Futures Markets, 22(4), 285–314. DOI: 10.1002/fut.10008

The paper models futures basis nonlinear mean reversion using an ESTAR (Exponential Smooth-Transition Autoregressive) signal:

Phi(gamma; z) = 1 − exp(−gamma^2 · z^2)

Phi → 0 near equilibrium (unit-root regime); Phi → 1 far from equilibrium (fast mean-reversion). Implemented in scripts/ms_baseline.py.

ESTAR baseline results (gamma=0.5, mean-reversion exit at |z|<0.25, SL=0.50pt):

Pool	n	avg_net/lot	p-value	95% CI	Sharpe
IS (W1+W2)	3,000	−$4.36	<0.001***	[−$4.58, −$4.15]	−0.71
OOS (W3+W4)	1,782	−$4.00	<0.001***	[−$4.27, −$3.74]	−0.70

The naive ESTAR parameterisation fails on 1-second intraday bars: the mean-reversion exit (|z| < 0.25) fires at the first tick-back, producing hold times of ~0 minutes and 4,782 trades across 4 windows. The $8.04/lot TC destroys all gross edge.

Internal baseline: Z-score Ungated

The same z-score signal with no regime gate (regime_gate='none').

Split	n	avg_net/lot	p-value
IS (W1+W2)	308	+$1.09	0.365
OOS (W3+W4)	371	+$1.64	0.264

Key Results — V1 Strategy

Combined Net Equity Curve — V1 vs Z-Score Ungated (10-lot, W1 → W4)

V1 (blue) vs Z-Score Ungated (gray). The large W4 drawdown in Ungated is absorbed by V1's drift gate. IS/OOS split at the dashed line.

Per-Window Net P&L — V1 vs Z-Score Ungated

V1 consistently reduces losses in the weaker IS window (W2) and outperforms Ungated in both OOS windows (W3, W4). Faded bars = IS; solid = OOS.

Strategy Comparison — Avg Net P&L / Lot with 95% CI

ESTAR exits at the first tick-back on 1-second bars (hold_med = 0 min), turning all gross edge into TC loss. Ungated and V1 achieve positive per-lot returns; V1 widens the margin in OOS (+$2.47 vs +$1.64/lot).

V1 adds a drift_4h regime gate (symmetric: block entries in the direction of sustained intraday drift, ±0.10 pt threshold, 4-hour lookback) over the ungated signal.

V1 (strategy):

Split	n	Avg Net/Lot	p-value	95% CI
IS (W1+W2)	246	+$1.26	0.361	[−$1.45, +$3.96]
OOS (W3+W4)	299	+$2.47	0.006***	[+$0.70, +$4.25]
OOS European	90	+$3.90	0.013**	[+$0.85, +$6.96]

Ungated (benchmark):

Split	n	Avg Net/Lot	p-value	95% CI
IS (W1+W2)	308	+$1.09	0.365	[−$1.28, +$3.46]
OOS (W3+W4)	371	+$1.64	0.264	[−$1.24, +$4.52]
OOS European	100	+$6.28	<0.001***	[+$3.48, +$9.08]

V1 vs Z-score Ungated:

Metric	Z-score Ungated	V1
Net P&L (10-lot)	+$9,452	+$10,488
Max Drawdown	−$6,659	−$3,115
Recovery Factor	1.42	3.37
OOS per-trade Sharpe	+0.058	+0.159
Annualized Sharpe (OOS)	—	+3.88 (90% CI: [+1.38, +6.63])

Verdict: Marginal positive edge; not yet deployable. V1 achieves statistically significant OOS edge overall (p=0.006***) and in the European session (p=0.013**). Ungated OOS is not significant (p=0.264), confirming the drift gate carries the edge. Evaluation is underpowered (4 roll windows); proper OOS requires ≥10 periods. IS→OOS rank correlation (Spearman) ρ=0.771.

Roll windows:

Window	Contracts	Roll Date	Role
W1	ESU4 → ESZ4	Sep 16, 2024	In-sample
W2	ESZ4 → ESH5	Dec 16, 2024	In-sample
W3	ESH5 → ESM5	Mar 17, 2025	Out-of-sample
W4	ESM5 → ESU5	Jun 16, 2025	Out-of-sample

Per-window results (all sessions combined, V1):

Window	n	WR	Net P&L ×10	p
W1 ESU4→ESZ4 (IS)	120	85.0%	+$3,933	0.078
W2 ESZ4→ESH5 (IS)	126	75.4%	−$843	0.740
W3 ESH5→ESM5 (OOS)	197	87.3%	+$4,736	0.015*
W4 ESM5→ESU5 (OOS)	102	84.3%	+$2,662	0.160

Repository Structure

src/
  strategy.py              # Core backtest engine: gate system, simulate(), compute_stats()

scripts/
  ms_baseline.py           # M&S (2002) ESTAR external baseline
  run_sessions.py          # Multi-window, multi-session backtest runner
  run_backtest.py          # Single-window backtest entrypoint
  result_summary.py        # IS/OOS performance report with bootstrap CIs
  databento_cost_estimate.py  # Phase 2a: cost estimation before data pull
  databento_datapull.py    # Phase 2b: tick data pull (idempotent)

notebooks/
  01_eda.ipynb             # W1 exploratory data analysis
  02_signal_analysis.ipynb # Five-signal evaluation (FV z-score, lead-lag, OBI, TFI, FOMC)
  03_zscore_simulation.py  # Z-score parameter sweep
  04_threshold_analysis.py # Entry/exit threshold optimisation
  05_robustness_globex.py  # Globex session robustness check
  07_tearsheet.py          # W1 performance tearsheet
  08_fomc_trend.py         # FOMC event-day analysis
  09-13_...                # Additional window and signal analyses
  results_dashboard.ipynb  # Portfolio equity curves and MDD visualisation

  supplementary/           # 36 publication-quality analysis notebooks
    config.py              # Shared paths and window definitions
    generate_notebooks.py  # Script to regenerate all supplementary notebooks
    A1-A5                  # Performance & equity curves
    B1-B4                  # FV deviation analysis
    C1-C3                  # OU process & z-score dynamics
    D1-D5                  # Trade analytics
    E1-E9                  # Bid-ask spread dynamics
    F1-F4                  # Spike events
    G1-G3                  # Volume migration
    H1-H3                  # Pre-roll context

es_roll_analysis_workflow.md  # Full pipeline design specification

data/, results/, and reports/ are gitignored. Tick data lives on external storage (~25–35 GB per roll window for mbp-10).

Environment Setup

# Python 3.14 venv
source .venv/bin/activate
pip install -r requirements.txt

Set your Databento API key in the environment:

export DATABENTO_API_KEY="your_key_here"

Data files are expected at /Volumes/SEAGATE/Databento_Futures/. Update paths in notebooks/supplementary/config.py if your storage location differs.

Running the Analysis

# Estimate data costs before pulling (review output before proceeding)
python scripts/databento_cost_estimate.py

# Pull tick data (idempotent — skips existing parquet files)
python scripts/databento_datapull.py

# Run M&S ESTAR external baseline
python scripts/ms_baseline.py

# Run full session backtest (Z-score Ungated + V1 gate variants)
python scripts/run_sessions.py

# Generate IS/OOS performance report with bootstrap CIs
python scripts/result_summary.py

# Regenerate all 36 supplementary notebooks
python notebooks/supplementary/generate_notebooks.py

Signal Architecture Notes

Five alpha signals were evaluated across W1 and W2:

FV Deviation Z-Score — selected; only signal with consistent positive edge
Lead-Lag (front → back) — rejected; peak cross-correlation at lag=0s (contemporaneous, not predictive)
Order Book Imbalance (OBI) — rejected; r ≈ +0.001 to −0.006 across windows (near-zero)
Trade Flow Imbalance (TFI) — rejected; reverses sign across windows
FOMC event-driven jump — implemented as full-day exclusion filter only

Eight regime gates were evaluated; two were accepted into V1:

drift_4h gate — symmetric gate: blocks entries in the direction of sustained intraday drift (4-hour lookback, ±0.10 pt threshold); short-blocking arm saved +$134 IS; long-blocking arm empirically inactive across all 4 windows (spread has structural upward intraday drift during ES roll periods)
low-z two-layer exit — modified exit for |z| < 2.0 bucket; saved +$58 IS

Disclaimer

This repository contains research code and results for educational and analytical purposes. Nothing here constitutes financial advice or a recommendation to trade. Past backtest performance does not guarantee future results. ES futures carry substantial risk of loss.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Supplementary_notebooks		Supplementary_notebooks
blog_figures		blog_figures
notebooks		notebooks
scripts		scripts
src		src
.gitignore		.gitignore
README.md		README.md
es_roll_analysis_workflow.md		es_roll_analysis_workflow.md
generate_blog_charts.py		generate_blog_charts.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ES Futures Calendar Spread — Mean-Reversion Research

Strategy Overview

Comparison Baselines

External baseline: Monoyios & Sarno (2002) ESTAR Model

Internal baseline: Z-score Ungated

Key Results — V1 Strategy

Combined Net Equity Curve — V1 vs Z-Score Ungated (10-lot, W1 → W4)

Per-Window Net P&L — V1 vs Z-Score Ungated

Strategy Comparison — Avg Net P&L / Lot with 95% CI

Repository Structure

Environment Setup

Running the Analysis

Signal Architecture Notes

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ES Futures Calendar Spread — Mean-Reversion Research

Strategy Overview

Comparison Baselines

External baseline: Monoyios & Sarno (2002) ESTAR Model

Internal baseline: Z-score Ungated

Key Results — V1 Strategy

Combined Net Equity Curve — V1 vs Z-Score Ungated (10-lot, W1 → W4)

Per-Window Net P&L — V1 vs Z-Score Ungated

Strategy Comparison — Avg Net P&L / Lot with 95% CI

Repository Structure

Environment Setup

Running the Analysis

Signal Architecture Notes

Disclaimer

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages