PrepareData
karlosmid edited this page Sep 16, 2012 · 50 revisions
- CSV format
- Input: Nasdaq 100 (1.4 GB)
- Target: QQQ
- Save in data/nasdaq_100/
Command-line arguments:
- input directory (-i)
- target directory/stock (-t)
- number of days (-d)
- date range (-r), given as start,end
e.g. prepare_data.py -i nasdaq_100 -t QQQ -d 60 -r 2012-01-01,2012-03-30
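
A minimal sketch of how these arguments could be parsed with `argparse`. Only the short flags come from the example command; the long option names, the `date_range` attribute, and the `parse_date_range` helper are assumptions made for illustration.

```python
import argparse
from datetime import datetime


def parse_date_range(text):
    """Turn 'YYYY-MM-DD,YYYY-MM-DD' into a (start, end) pair of dates."""
    start_text, end_text = text.split(",")
    start = datetime.strptime(start_text, "%Y-%m-%d").date()
    end = datetime.strptime(end_text, "%Y-%m-%d").date()
    if start > end:
        raise argparse.ArgumentTypeError("start date is after end date")
    return start, end


def build_parser():
    # Long option names are hypothetical; the page only shows -i, -t, -d, -r.
    parser = argparse.ArgumentParser(prog="prepare_data.py")
    parser.add_argument("-i", "--input", required=True, help="input directory")
    parser.add_argument("-t", "--target", required=True, help="target stock")
    parser.add_argument("-d", "--days", type=int, required=True,
                        help="number of days to select")
    parser.add_argument("-r", "--range", dest="date_range", required=True,
                        type=parse_date_range, help="date range as start,end")
    return parser
```

Calling `build_parser().parse_args()` with the example command line above would yield `days == 60` and a pair of `datetime.date` objects in `date_range`.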
- Select days randomly from the date range
- Split the selected days randomly into 3 sets
- 33% for each set
- Don't load the CSV files yet; work with dates only
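
The selection and split can be sketched on plain dates, without touching the CSV files. This is only an illustration: the function names, the `seed` parameter, and the weekday filter are assumptions, and the set names are borrowed from the output file names further down. Holidays and days without QQQ data still have to be filtered out later.

```python
import random
from datetime import date, timedelta


def pick_random_days(start, end, n_days, seed=None):
    """Sample n_days distinct weekdays between start and end (inclusive)."""
    span = (end - start).days + 1
    candidates = [start + timedelta(days=i) for i in range(span)]
    # Assumption: drop Saturdays and Sundays up front.
    weekdays = [d for d in candidates if d.weekday() < 5]
    rng = random.Random(seed)
    # random.sample raises ValueError if the range holds fewer than n_days weekdays.
    return sorted(rng.sample(weekdays, n_days))


def split_three_ways(days, seed=None):
    """Shuffle the dates and cut them into three near-equal sets."""
    shuffled = list(days)
    random.Random(seed).shuffle(shuffled)
    third = len(shuffled) // 3
    return {
        "train": sorted(shuffled[:third]),
        "validate": sorted(shuffled[third:2 * third]),
        "test": sorted(shuffled[2 * third:]),  # takes any remainder
    }


days = pick_random_days(date(2012, 1, 1), date(2012, 3, 30), 60, seed=42)
sets = split_three_ways(days, seed=42)
```

With 60 days, each set receives 20 dates; when the count is not divisible by 3, the extra days land in the test set.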
For each (training, validation & test) set:
- Fetch a 3-day lookback for each random day
- only use days for which QQQ data is available
- only use trading hours 09:30 - 16:00
- load into a pandas DataFrame
- include the OHLCV index in the DataFrame
You should end up with data frames looking something like this:
| time | open | high | low | close |
|---|---|---|---|---|
| 2006-01-01 09:41 | 28.1 | 28.1 | 28.1 | 28.1 |
| 2006-01-01 09:42 | 28.1 | 28.1 | 28.1 | 28.1 |
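
One way the lookback and trading-hours filter might look in pandas, as a rough sketch. It assumes the CSV data has already been read into a DataFrame of minute bars with a sorted `DatetimeIndex`, and it takes the lookback to mean the three most recent days with data strictly before the selected day; both points are assumptions, as is the `lookback_window` name.

```python
import pandas as pd


def lookback_window(bars, day, lookback_days=3):
    """Return trading-hours bars for the last `lookback_days` days before `day`.

    `bars` is assumed to hold minute bars with a sorted DatetimeIndex.
    """
    day = pd.Timestamp(day).normalize()
    # Keep regular trading hours only; both 09:30 and 16:00 are included.
    session = bars.between_time("09:30", "16:00")
    # Only dates that actually have data, so weekends and holidays are skipped.
    available = session.index.normalize().unique().sort_values()
    previous = available[available < day][-lookback_days:]
    if len(previous) < lookback_days:
        raise ValueError("not enough history before %s" % day.date())
    return session[session.index.normalize().isin(previous)]
```

Passing the QQQ bars through this function first also reveals which selected days lack QQQ data, since those raise `ValueError` or return fewer rows than expected.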
For each day:
- add the close-to-close percentage change to each data frame.
- calculate the % liquidity over the 3-day lookback for each component (i.e. every stock except QQQ)
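
A possible reading of the close-to-close step, sketched in pandas: each bar's close is compared with the previous bar's close within the same day, so the first bar of every day has no value. That interpretation and the `pct_change` column name are assumptions. The liquidity measure is not defined on this page, so it is left out of the sketch.

```python
import pandas as pd


def add_close_pct_change(frame):
    """Return a copy with a `pct_change` column: close vs. previous close, in percent."""
    out = frame.copy()
    # Group by calendar day so the change never spans an overnight gap.
    day = out.index.normalize()
    previous_close = out["close"].groupby(day).shift(1)
    out["pct_change"] = (out["close"] / previous_close - 1.0) * 100.0
    return out
```

If the intended measure is the change between consecutive daily closes instead, the grouping would be replaced by a resample to daily closes before computing the ratio.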
Save each set to a file, e.g.:
- data/train_2012_01_01_2012_02_15.h5
- data/validate_2012_01_01_2012_02_15.h5
- data/test_2012_01_01_2012_02_15.h5
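
The file names above follow a set-name plus date-range pattern, which could be generated along these lines. The helper names and the HDF5 `key` are hypothetical, and the sketch assumes the two dates in the name are the start and end of the requested date range. `DataFrame.to_hdf` needs the optional PyTables package.

```python
import os
from datetime import date


def output_path(set_name, start, end, out_dir="data"):
    """Build a path such as data/train_2012_01_01_2012_02_15.h5."""
    stamp = "%s_%s" % (start.strftime("%Y_%m_%d"), end.strftime("%Y_%m_%d"))
    return "%s/%s_%s.h5" % (out_dir, set_name, stamp)


def save_set(frame, set_name, start, end, out_dir="data"):
    """Write one data set (a pandas DataFrame) to its HDF5 file and return the path."""
    path = output_path(set_name, start, end, out_dir)
    os.makedirs(out_dir, exist_ok=True)
    # The key is an assumption; any valid HDF5 node name would do.
    frame.to_hdf(path, key=set_name, mode="w")
    return path


print(output_path("train", date(2012, 1, 1), date(2012, 2, 15)))
# data/train_2012_01_01_2012_02_15.h5
```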