A machine learning solution for the QRT Asset Allocation Challenge — predicting whether financial assets will have positive or negative returns based on historical market data.
Given historical market data for various assets, predict the direction of future returns (positive or negative). This is a binary classification problem with applications in quantitative finance and portfolio management.
| Feature Group | Description | Count |
|---|---|---|
| `RET_1` to `RET_20` | Daily returns over past 20 days | 20 |
| `VOLATILITY_1` to `VOLATILITY_20` | Historical volatility measures | 20 |
| `SIGNED_VOLUME_1` to `SIGNED_VOLUME_20` | Signed trading volume | 20 |
| `AVG_DAILY_TURNOVER` | Average daily turnover | 1 |
We engineer 75+ features from raw market data using technical analysis indicators:
- Rolling average returns (3, 5, 10, 15, 20 day windows)
- Relative performance vs. asset group (outperformance detection)
- Rolling volatility measures
- Signed volume volatility
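The rolling-window features above can be sketched per row as follows. This is a minimal, pure-Python illustration, not the notebook's code. It assumes lag 1 (`RET_1`) is the most recent day, and the output names (`RET_MEAN_5`, `RET_MEAN_5_REL`, ...) and the `date`/`group` keys that define an asset's peer group are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

WINDOWS = (3, 5, 10, 15, 20)


def rolling_stats(row, prefix="RET", windows=WINDOWS):
    """Rolling mean and sample std of the most recent `w` lags of `prefix`.

    Assumes lag columns PREFIX_1..PREFIX_20 with lag 1 = most recent day.
    Output names such as RET_MEAN_5 are hypothetical.
    """
    feats = {}
    for w in windows:
        recent = [row[f"{prefix}_{i}"] for i in range(1, w + 1)]
        feats[f"{prefix}_MEAN_{w}"] = mean(recent)
        feats[f"{prefix}_STD_{w}"] = stdev(recent)  # sample std, needs w >= 2
    return feats


def add_relative_performance(rows, feature="RET_MEAN_5", keys=("date", "group")):
    """Add FEATURE_REL = feature minus the mean of feature over rows sharing `keys`.

    The (date, group) peer definition is a hypothetical choice for illustration.
    """
    peers = defaultdict(list)
    for r in rows:
        peers[tuple(r[k] for k in keys)].append(r[feature])
    peer_means = {key: mean(vals) for key, vals in peers.items()}
    for r in rows:
        r[f"{feature}_REL"] = r[feature] - peer_means[tuple(r[k] for k in keys)]
    return rows


# Two assets in the same (date, group): one trending up, one trending down.
up = {"date": 1, "group": "A", **{f"RET_{i}": 0.01 for i in range(1, 21)}}
down = {"date": 1, "group": "A", **{f"RET_{i}": -0.01 for i in range(1, 21)}}
rows = [{**r, **rolling_stats(r)} for r in (up, down)]
add_relative_performance(rows)
```

Calling `rolling_stats(row, prefix="SIGNED_VOLUME")` gives the signed-volume volatility features in the same way; if the data contains missing values, they would need to be dropped or filled first.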
- Tenkan-sen (Conversion Line): Fast momentum indicator (9-period midpoint)
- Kijun-sen (Base Line): Slow momentum indicator (20-period midpoint)
- TK Crossover: Momentum signal from line crossings
- Return position relative to Ichimoku lines
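The Ichimoku features above can be sketched as midpoints (highest plus lowest value, divided by two) over the fast and slow windows. This assumes the lines are computed directly on the `RET_*` lags rather than on a reconstructed price path, and that `RET_1` is the most recent day; the output names are hypothetical. With only a 20-day snapshot, the crossover is simplified here to the sign of the Tenkan minus Kijun spread; a true crossing test would also need the previous day's lines.

```python
def midpoint(values):
    """Ichimoku-style midpoint: (highest + lowest) / 2 over the window."""
    return (max(values) + min(values)) / 2


def ichimoku_features(row, fast=9, slow=20):
    """Tenkan-sen / Kijun-sen style lines on the lagged returns RET_1..RET_20.

    Assumes RET_1 is the most recent day; feature names are hypothetical.
    """
    rets = [row[f"RET_{i}"] for i in range(1, slow + 1)]  # most recent first
    tenkan = midpoint(rets[:fast])  # fast line: last `fast` days
    kijun = midpoint(rets)          # slow line: last `slow` days
    last = rets[0]
    return {
        "TENKAN": tenkan,
        "KIJUN": kijun,
        # +1 when the fast line is above the slow line, -1 when below, 0 if equal
        "TK_SIGNAL": (tenkan > kijun) - (tenkan < kijun),
        "RET_VS_TENKAN": last - tenkan,
        "RET_VS_KIJUN": last - kijun,
    }


# Recent 9 days positive, older 11 days negative: fast line above slow line.
row = {f"RET_{i}": (0.02 if i <= 9 else -0.02) for i in range(1, 21)}
features = ichimoku_features(row)
```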
- EMA-based center line (10 and 20 periods)
- Upper/Lower bands at 2 standard deviations
- Band position indicator: where the current return sits within the bands, scaled so that 0 is the lower band and 1 is the upper band:
  - `> 1`: breakout above the upper band (overbought)
  - `< 0`: breakout below the lower band (oversold)
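The band-position indicator above is the %B form of Bollinger Bands: `(x - lower) / (upper - lower)`. A minimal sketch, assuming the bands are built on the `RET_*` lags themselves, the EMA is seeded with the oldest value, and the band width uses the population standard deviation of the same window; none of these details are confirmed by the notebook.

```python
from statistics import pstdev


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with values[0]."""
    alpha = 2 / (span + 1)
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
    return avg


def bollinger_position(row, span=20, n_std=2.0):
    """%B of the most recent return relative to EMA-centred bands.

    Assumes RET_1 is the most recent day and span <= 20.
    """
    rets = [row[f"RET_{i}"] for i in range(span, 0, -1)]  # oldest first
    center = ema(rets, span)
    width = n_std * pstdev(rets)  # population std over the same window
    upper, lower = center + width, center - width
    if upper == lower:  # flat window: position undefined, report the midpoint
        return 0.5
    return (rets[-1] - lower) / (upper - lower)


# A quiet window followed by a large positive return breaks above the upper band.
spike = {"RET_1": 1.0, **{f"RET_{i}": 0.0 for i in range(2, 21)}}
position = bollinger_position(spike)  # greater than 1
```

Calling it with `span=10` gives the 10-period variant.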
- Volatility-based trailing stop indicator
- Distance from 20-day high/low normalized by ATR
- Measures trend strength and potential reversals
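The dataset has no intraday high/low columns, so any ATR-style feature needs a proxy. The sketch below is one possible interpretation, not the notebook's method: it rebuilds a hypothetical price path by compounding the `RET_*` lags from a base of 1.0, uses the mean absolute close-to-close move as the ATR proxy, and forms a Chandelier-exit-style trailing stop (highest price minus `k` ATRs). The multiplier `k = 3.0` and the feature names are assumptions.

```python
def atr_features(row, window=20, k=3.0):
    """ATR-normalised distance from the window high/low plus a trailing-stop flag.

    Assumes RET_1 is the most recent day. The price path, ATR proxy, k and
    output names are all hypothetical choices for illustration.
    """
    rets = [row[f"RET_{i}"] for i in range(window, 0, -1)]  # oldest first
    # Hypothetical price path: compound the returns from a base of 1.0.
    prices, p = [], 1.0
    for r in rets:
        p *= 1 + r
        prices.append(p)
    # ATR proxy: mean absolute close-to-close move (no high/low in the data).
    atr = sum(abs(b - a) for a, b in zip([1.0] + prices[:-1], prices)) / window
    if atr == 0:
        return {"DIST_HIGH_ATR": 0.0, "DIST_LOW_ATR": 0.0, "BELOW_TRAIL_STOP": 0}
    high, low, last = max(prices), min(prices), prices[-1]
    trail_stop = high - k * atr  # Chandelier-exit-style long stop
    return {
        "DIST_HIGH_ATR": (high - last) / atr,
        "DIST_LOW_ATR": (last - low) / atr,
        "BELOW_TRAIL_STOP": int(last < trail_stop),
    }


# Steady uptrend, then a 20% drop on the most recent day: the stop is hit.
crash = {"RET_1": -0.20, **{f"RET_{i}": 0.01 for i in range(2, 21)}}
features = atr_features(crash)
```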
| Model | Type | Key Parameters |
|---|---|---|
| Ridge Regression | Linear | alpha = 0.01 |
| Random Forest | Ensemble | 100 trees, max_depth=32 |
| LightGBM | Gradient Boosting | Tuned via GridSearchCV |
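Ridge regression is a regressor, so using it on an up/down target needs a decision rule. A minimal sketch, assuming the model is fit to the target $y_i$ and its output is thresholded at zero (a common choice for return-sign problems, but not confirmed by the notebook). The objective below matches scikit-learn's `Ridge`, where the intercept $b$ is not penalized:

$$
(\hat{w}, \hat{b}) = \arg\min_{w,\,b} \sum_{i} \left(y_i - x_i^\top w - b\right)^2 + \alpha \lVert w \rVert_2^2, \qquad \alpha = 0.01
$$

$$
\hat{d}_i = \operatorname{sign}\left(x_i^\top \hat{w} + \hat{b}\right)
$$

A small $\alpha$ such as 0.01 keeps the solution close to ordinary least squares; larger values shrink the coefficients further toward zero.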
Time-series aware cross-validation to prevent data leakage:
- Splits by dates, not random rows
- Ensures no future information leaks into training
- 5-fold cross-validation
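One way to satisfy all three properties is an expanding-window split over sorted unique dates. This is a sketch under that assumption; the notebook's actual splitter may differ (for example, a grouped K-fold over dates keeps each date in one fold but does not order the folds in time). The function name `expanding_date_splits` is hypothetical.

```python
def expanding_date_splits(dates, n_splits=5):
    """Yield (train_idx, val_idx) row-index lists for expanding-window CV over dates.

    All rows sharing a date land on the same side of each split, and every
    training date is strictly earlier than every validation date.
    """
    unique_dates = sorted(set(dates))
    n_blocks = n_splits + 1  # block 0 is only ever used for training
    if len(unique_dates) < n_blocks:
        raise ValueError("need at least n_splits + 1 distinct dates")
    # Block boundaries over the sorted dates; each block holds at least one date.
    edges = [i * len(unique_dates) // n_blocks for i in range(n_blocks + 1)]
    for k in range(1, n_blocks):
        train_dates = set(unique_dates[: edges[k]])
        val_dates = set(unique_dates[edges[k] : edges[k + 1]])
        train_idx = [i for i, d in enumerate(dates) if d in train_dates]
        val_idx = [i for i, d in enumerate(dates) if d in val_dates]
        yield train_idx, val_idx


# 12 dates with 3 assets each (36 rows) -> 5 folds of 2 validation dates each.
dates = [d for d in range(12) for _ in range(3)]
for train_idx, val_idx in expanding_date_splits(dates, n_splits=5):
    pass  # fit on train_idx, score on val_idx
```

Splitting on dates rather than rows matters because scikit-learn's `TimeSeriesSplit` works on row positions: when several assets share a date, a fold boundary can fall in the middle of that date.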
| Model | CV Accuracy | Notes |
|---|---|---|
| Ridge Regression | ~50.5% | Linear baseline |
| Random Forest | ~51.7% | Ensemble approach |
| LightGBM | ~52.0% | Best performer |
In financial prediction, even small improvements above the 50% coin-flip baseline can be valuable, because a small edge is applied across many assets and trading days.
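As a rough illustration, assume each prediction is a unit bet that gains 1 when the direction is right and loses 1 when it is wrong (real returns are not symmetric, so this is only indicative). With hit rate $p$, the expected payoff per bet is:

$$
\mathbb{E}[\text{payoff}] = p \cdot (+1) + (1 - p) \cdot (-1) = 2p - 1, \qquad p = 0.52 \;\Rightarrow\; 2(0.52) - 1 = 0.04
$$

Under this assumption, the ~52.0% LightGBM accuracy corresponds to an edge of about 0.04 units per bet, while 50% gives zero.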
The most important features for prediction are:
- Recent returns (`RET_1`, `RET_2`)
- Bollinger Band position indicators
- Ichimoku crossover signals
- Rolling volatility measures
```
qrt-asset-allocation-performance-forecasting/
|
|-- benchmark_submission.ipynb # Main notebook (run this)
|
|-- Data/
| |-- X_train.csv # Training features (180K+ samples)
| |-- X_test.csv # Test features
| |-- y_train.csv # Training targets
| |-- sample_submission.csv # Submission format
|
|-- Predictions/
| |-- preds_ridge.csv # Ridge predictions
| |-- preds_rf.csv # Random Forest predictions
| |-- preds_lgbm_optimized.csv # LightGBM predictions (BEST)
|
|-- README.md # This file
|-- requirements.txt # Dependencies
|-- .gitignore # Git ignore rules
```
Python 3.8+

```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/qrt-asset-allocation-performance-forecasting.git
cd qrt-asset-allocation-performance-forecasting

# Install dependencies
pip install -r requirements.txt
```

Open and run the notebook:

```bash
jupyter notebook benchmark_submission.ipynb
```

Dependencies (`requirements.txt`):

```
pandas>=1.3.0
numpy>=1.21.0
scikit-learn>=1.0.0
lightgbm>=3.3.0
seaborn>=0.11.0
matplotlib>=3.4.0
```
- Add neural network models (LSTM, Transformer)
- Implement ensemble stacking
- Feature selection using SHAP values
- Hyperparameter tuning with Optuna
- Add more technical indicators (RSI, MACD)
This project is licensed under the MIT License.
- QRT for the challenge
- LightGBM for the gradient boosting framework
- scikit-learn for ML utilities