
PrepareData

karlosmid edited this page Sep 16, 2012 · 50 revisions

 

Download data set

Arguments

  • input directory (-i)
  • target directory/stock (-t)
  • number of days (-d)
  • date range (-r), given as start,end

e.g. prepare_data.py -i nasdaq_100 -t QQQ -d 60 -r 2012-01-01,2012-03-30
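
A minimal sketch of how these arguments could be parsed with argparse. Only the short flags (-i, -t, -d, -r) come from the example above; the long option names and the parse_date_range helper are illustrative assumptions, not the script's actual interface.

```python
import argparse
from datetime import datetime


def parse_date_range(text):
    """Parse 'YYYY-MM-DD,YYYY-MM-DD' into a (start, end) pair of dates."""
    try:
        start_text, end_text = text.split(",")
        start = datetime.strptime(start_text.strip(), "%Y-%m-%d").date()
        end = datetime.strptime(end_text.strip(), "%Y-%m-%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(
            "expected YYYY-MM-DD,YYYY-MM-DD, got %r" % text)
    if start > end:
        raise argparse.ArgumentTypeError("start date is after end date")
    return start, end


def build_parser():
    parser = argparse.ArgumentParser(prog="prepare_data.py")
    # Long option names are hypothetical; the page only shows the short flags.
    parser.add_argument("-i", "--input-dir", required=True,
                        help="directory holding the component data")
    parser.add_argument("-t", "--target", required=True,
                        help="target directory/stock, e.g. QQQ")
    parser.add_argument("-d", "--days", type=int, required=True,
                        help="number of days to select")
    parser.add_argument("-r", "--date-range", type=parse_date_range,
                        required=True, help="start,end as YYYY-MM-DD")
    return parser


# Parse the example command line from this page.
args = build_parser().parse_args(
    ["-i", "nasdaq_100", "-t", "QQQ", "-d", "60",
     "-r", "2012-01-01,2012-03-30"])
```

Validating the date range at parse time means a malformed -r value fails with a usage message before any data is touched.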

Create training, validation & test sets

  • Select days randomly from the date range
  • Split the selected days randomly into 3 sets
  • Roughly one third (33%) of the days in each set
  • Don't load the CSV files yet, just work with dates
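
The selection and split above can be sketched with the standard library alone, since only dates are involved at this stage. The function names and the fixed seed are assumptions made for illustration. Calendar days are sampled here, so weekends and holidays would still need to be dropped later, when only days with QQQ data are kept.

```python
import random
from datetime import date, timedelta


def pick_random_days(start, end, n_days, rng):
    """Pick n_days distinct calendar dates from [start, end], sorted."""
    span = (end - start).days + 1
    all_days = [start + timedelta(days=i) for i in range(span)]
    return sorted(rng.sample(all_days, n_days))


def split_three_ways(days, rng):
    """Shuffle the dates and cut them into three near-equal sets."""
    shuffled = list(days)
    rng.shuffle(shuffled)
    third = len(shuffled) // 3
    train = sorted(shuffled[:third])
    validate = sorted(shuffled[third:2 * third])
    test = sorted(shuffled[2 * third:])  # takes any remainder
    return train, validate, test


rng = random.Random(42)  # fixed seed so the split is reproducible
days = pick_random_days(date(2012, 1, 1), date(2012, 3, 30), 60, rng)
train, validate, test = split_three_ways(days, rng)
```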

Load data

For each (training, validation & test) set:

  • Fetch a 3-day lookback for each random day
  • only use days for which QQQ data is available
  • only use trading hours 09:30 - 16:00
  • load into a pandas DataFrame
  • please include OHLCV index for the DataFrame

You should end up with data frames looking something like this:

time              open  high  low   close
2006-01-01 09:41  28.1  28.1  28.1  28.1
2006-01-01 09:41  20.1  28.1  28.1  28.1
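
A rough sketch of the lookback and trading-hours steps, assuming the minute bars are already in a pandas DataFrame with a DatetimeIndex. The CSV layout is not described on this page, so synthetic bars stand in for real data. The lookback_window name is hypothetical, and reading "3 day lookback" as the three trading days before the selected day is an assumption.

```python
import pandas as pd


def lookback_window(bars, day, lookback=3):
    """Return trading-hours bars for the `lookback` trading days before `day`.

    Returns None when `day` has no data or there is too little history.
    """
    day = pd.Timestamp(day).normalize()
    # Keep regular trading hours only (both endpoints are included).
    session = bars.between_time("09:30", "16:00")
    trading_days = session.index.normalize().unique().sort_values()
    if day not in trading_days:
        return None  # no data for the selected day, so skip it
    earlier = trading_days[trading_days < day]
    if len(earlier) < lookback:
        return None
    wanted = earlier[-lookback:]
    return session[session.index.normalize().isin(wanted)]


# Synthetic minute bars, Tue 2012-01-03 to Fri 2012-01-06, including
# out-of-hours minutes so the trading-hours filter has something to drop.
minutes = pd.date_range("2012-01-03 09:00", "2012-01-06 16:30", freq="min")
bars = pd.DataFrame(
    {"open": 28.1, "high": 28.1, "low": 28.1, "close": 28.1, "volume": 100},
    index=minutes)
bars.index.name = "time"

window = lookback_window(bars, "2012-01-06")
```

Returning None for days without data keeps the "only use days for which QQQ data is available" rule in one place; the caller can simply drop those days from the set.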

Pre-calculations

For each day:

  • add the close-to-close percentage change to each data frame.
  • calculate the % liquidity over the 3-day lookback for each component (i.e. not QQQ)
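
One possible reading of these two calculations, as a sketch. The close-to-close change is taken between consecutive bars. "% liquidity" is not defined on this page, so it is assumed here to mean each component's share of the total traded volume over the lookback window. The column name close_pct_change and the symbols in the demo are illustrative only.

```python
import pandas as pd


def add_close_pct_change(frame):
    """Return a copy with the bar-to-bar percentage change of `close`."""
    out = frame.copy()
    # pct_change gives a fraction; the first bar has no predecessor (NaN).
    out["close_pct_change"] = out["close"].pct_change() * 100.0
    return out


def liquidity_share(volume_by_symbol):
    """Assumed definition: each symbol's % share of total lookback volume."""
    totals = pd.Series(
        {symbol: volume.sum() for symbol, volume in volume_by_symbol.items()},
        dtype=float)
    return totals / totals.sum() * 100.0


frame = pd.DataFrame({"close": [100.0, 110.0, 99.0]})
with_change = add_close_pct_change(frame)

# Hypothetical component volumes over a lookback window.
shares = liquidity_share({
    "AAPL": pd.Series([300, 300]),
    "MSFT": pd.Series([200, 200]),
})
```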

Output

Save each set to a file, e.g.:

  • data/train_2012_01_01_2012_02_15.h5
  • data/validate_2012_01_01_2012_02_15.h5
  • data/test_2012_01_01_2012_02_15.h5
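
The file names above follow a set_start_end pattern, which could be built as in the sketch below. The helper names are hypothetical. save_set uses pandas' DataFrame.to_hdf, which requires the PyTables package; using the set name as the HDF5 key is an assumption.

```python
from datetime import date


def output_path(set_name, start, end, directory="data"):
    """Build e.g. data/train_2012_01_01_2012_02_15.h5 from the date range."""
    stamp = "%s_%s" % (start.strftime("%Y_%m_%d"), end.strftime("%Y_%m_%d"))
    return "%s/%s_%s.h5" % (directory, set_name, stamp)


def save_set(frame, set_name, start, end):
    """Write one set to HDF5 (needs PyTables); the key name is an assumption."""
    path = output_path(set_name, start, end)
    frame.to_hdf(path, key=set_name, mode="w")
    return path


paths = [output_path(name, date(2012, 1, 1), date(2012, 2, 15))
         for name in ("train", "validate", "test")]
```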

Review and testing report

ReviewTestingReport
