SelectModelIndex
Please apply the training data generated in PrepareData to the kNN algorithm outlined in Programming Collective Intelligence, ch. 8.
I've uploaded the source code for the book, including the kNN code you'll need to work with.
You might find scikit-learn a useful library. You can use the book examples, a standard library or roll your own.
In this case we'll be classifying instead of predicting a value.
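To make the difference between classifying and predicting a value concrete, here is a minimal sketch of a kNN classifier that takes a majority vote among the nearest neighbours instead of averaging their values. It is not the book's code; the names `euclidean` and `knn_classify`, the `(features, label)` row layout, and the sample points are assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, query, k):
    """Return the majority label among the k training rows nearest to query.

    train is a list of (features, label) pairs (an assumed layout).
    """
    neighbours = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up points, purely for illustration.
train = [
    ((0.0, 0.0), "short"),
    ((0.1, 0.2), "short"),
    ((1.0, 1.0), "long"),
    ((0.9, 1.1), "long"),
    ((1.2, 0.8), "long"),
]
prediction = knn_classify(train, (1.0, 0.9), k=3)
```

A regression variant would average the neighbours' values at the last step; here the vote picks a class label.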
Parameters:
- data set
- k range
- number of days
- iterations
eg select_model_index.py -d train_2012_12...h5 -k 5,10 -i 10 -d 20
Note:
- number of days: select N days randomly from data set
- iterations: find average after N random trials
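The two notes above could be sketched roughly as follows: draw N days at random, score them, repeat for the requested number of iterations, and summarise. The function `run_trials`, the `evaluate` callback, and the placeholder scoring function are hypothetical. Sampling without replacement and using the population standard deviation are assumptions the spec does not settle.

```python
import random
import statistics


def run_trials(days, n_days, iterations, evaluate, seed=None):
    """Average an error measure over repeated random samples of days.

    evaluate is a caller-supplied function (hypothetical here) that takes
    the sampled days and returns an error rate for that trial.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(iterations):
        # Assumption: days are drawn without replacement within a trial.
        sample = rng.sample(days, n_days)
        errors.append(evaluate(sample))
    # Assumption: population standard deviation; stdev() is the sample version.
    return statistics.mean(errors), statistics.pstdev(errors)


def placeholder_error(sample):
    """Stand-in for the real kNN evaluation; returns a constant-like value."""
    return len(sample) / 100


days = [f"day_{i}" for i in range(30)]  # hypothetical day identifiers
mean_err, std_err = run_trials(days, n_days=20, iterations=10,
                               evaluate=placeholder_error, seed=1)
```

Passing a `seed` makes a run repeatable, which helps when comparing different k values on the same random draws.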
Fixed parameters:
- number of stocks: 10
- lookback period: 3
Only use the top 10 stocks by liquidity over the previous three days, where:
liquidity: Close * Volume
- distance metric: Euclidean
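As a rough illustration of the stock selection rule, the sketch below ranks tickers by Close * Volume accumulated over the lookback period and keeps the top N. The function name `top_by_liquidity`, the `ticker -> [(close, volume), ...]` layout, and the tickers are hypothetical. Summing liquidity over the period is an assumption, as the spec does not say how the three days are aggregated. Plain Python is used to keep the example self-contained; the same logic carries over to a data frame.

```python
def top_by_liquidity(bars, n=10):
    """Return the n tickers with the highest total liquidity.

    bars maps ticker -> list of (close, volume) pairs covering the
    lookback period (an assumed layout).
    """
    totals = {
        ticker: sum(close * volume for close, volume in rows)
        for ticker, rows in bars.items()
    }
    # Sort tickers by total Close * Volume, largest first.
    return sorted(totals, key=totals.get, reverse=True)[:n]


# Made-up bars for three hypothetical tickers.
bars = {
    "AAA": [(10.0, 100), (11.0, 200)],  # 1000 + 2200 = 3200
    "BBB": [(50.0, 10)],                # 500
    "CCC": [(5.0, 1000)],               # 5000
}
top_two = top_by_liquidity(bars, n=2)
```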
Please use a data frame with these fields:
| Stock | % change close | % liquidity 1 min | % liquidity 3 days |
|---|---|---|---|
| AAPL | 0.15 | 12.14 | 18.2 |
| GOOG | 0.22 | 8.93 | 12.11 |
Where:
% liquidity 1 min: ((volume(stock 1) * close(stock 1)) / sum(volume(stock 1..10) * close(stock 1..10))) * 100
Notes:
- % change close: from last minute; ie data point
- % liquidity 3 days: should already be calculated
- % liquidity 1 min: from last minute
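The per-minute fields might be computed along these lines. This is a sketch, not a required implementation: the helper names and the `ticker -> (close, volume)` layout are hypothetical, and it assumes the denominator of % liquidity 1 min is the summed Close * Volume of all selected stocks for that minute.

```python
def pct_change_close(prev_close, close):
    """Percentage change in close from the previous minute's data point."""
    return (close - prev_close) / prev_close * 100


def pct_liquidity_1min(minute):
    """Each stock's share of total liquidity for one minute, in percent.

    minute maps ticker -> (close, volume) for a single minute bar
    (an assumed layout).
    """
    liquidity = {t: close * volume for t, (close, volume) in minute.items()}
    total = sum(liquidity.values())
    return {t: liq / total * 100 for t, liq in liquidity.items()}


# Made-up minute bar for three hypothetical tickers.
minute = {"AAA": (10.0, 300), "BBB": (20.0, 100), "CCC": (5.0, 1000)}
shares = pct_liquidity_1min(minute)      # liquidity 3000, 2000, 5000
change = pct_change_close(100.0, 100.15)
```

By construction the shares across the selected stocks add up to 100.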
Where a QQQ data point is long if:
(next_close - current_close) >= 0.03 and (next_low - current_close) > -0.03
Where a QQQ data point is short if:
(next_close - current_close) <= -0.03 and (next_high - current_close) < 0.03
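The long and short rules above translate fairly directly into a labelling function. In this sketch the function name `label_point`, its argument names, and the `"none"` label for points that satisfy neither rule are assumptions; the spec only defines long and short.

```python
def label_point(current_close, next_close, next_low, next_high, threshold=0.03):
    """Classify a QQQ data point as 'long', 'short', or 'none'."""
    move = next_close - current_close
    # Long: closes up by at least the threshold without dipping too far.
    if move >= threshold and (next_low - current_close) > -threshold:
        return "long"
    # Short: closes down by at least the threshold without spiking too far.
    if move <= -threshold and (next_high - current_close) < threshold:
        return "short"
    # Assumption: anything else is left unlabelled.
    return "none"


# Made-up prices, purely for illustration.
up = label_point(100.00, 100.05, next_low=99.99, next_high=100.06)
down = label_point(100.00, 99.95, next_low=99.94, next_high=100.01)
whipsaw = label_point(100.00, 100.05, next_low=99.90, next_high=100.06)
```

Moves that land exactly on the 0.03 boundary are sensitive to floating-point rounding, so comparing prices in integer cents may be safer in practice.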
eg select_model.py -d train_2012_12...h5 -k 5,10 -i 10 -d 20
- data set: train_2012_12...h5
- k range: 5,10
- iterations: 10
- days: 20
- std dev error: 3.4%
- avg error: 12.5%