SelectModelIndex

Keith McDonnell edited this page Aug 16, 2012 · 3 revisions

 

Please apply the training data generated in PrepareData to the kNN algorithm outlined in Programming Collective Intelligence, ch. 8.

I've uploaded the source code for the book, including the kNN code you'll need to work with.

You might find scikit-learn a useful library. You can use the book examples, a standard library or roll your own.

In this case we'll be classifying instead of predicting a value.

Arguments

  • data set
  • k range
  • number of days
  • iterations

eg select_model_index.py -d train_2012_12...h5 -k 5,10 -i 10 -d 20

Note:

  • number of days: select N days randomly from data set
  • iterations: find average after N random trials
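The two notes above can be sketched as a small trial loop. This is only an illustration using the standard library: `run_trials`, the `evaluate` callback, and the day labels are hypothetical names, and the toy callback stands in for whatever actually trains and scores the kNN model on the sampled days.

```python
import random

def run_trials(days, n_days, iterations, evaluate, seed=None):
    """Return one error rate per trial.

    Each trial draws n_days distinct days at random and passes them to
    evaluate(), a caller-supplied function returning an error rate.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(iterations):
        sample = rng.sample(days, n_days)  # sampling without replacement
        errors.append(evaluate(sample))
    return errors

# Toy usage: placeholder day labels and a dummy evaluate() (both made up).
days = ["day-%02d" % d for d in range(1, 27)]
errors = run_trials(days, n_days=20, iterations=10,
                    evaluate=lambda sample: 1.0 / len(sample), seed=0)
avg_error = sum(errors) / len(errors)
```

Passing a seed makes a run repeatable, which helps when comparing different values of k on the same random days.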

Fixed parameters:

  • number of stocks: 10
  • lookback period: 3

Find top 10 stocks by liquidity

Only use the top 10 stocks by liquidity for the previous three days, where liquidity = Close * Volume.
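One way the ranking could look, as a minimal sketch. It assumes liquidity is summed over every bar in the three-day lookback window (the page does not say whether to sum or average), and `top_by_liquidity` plus the sample symbols are made up for illustration.

```python
def top_by_liquidity(bars, n=10):
    """Return the n symbols with the largest total Close * Volume.

    bars: iterable of (symbol, close, volume) tuples covering the
    lookback window. Summing over the window is an assumption.
    """
    totals = {}
    for symbol, close, volume in bars:
        totals[symbol] = totals.get(symbol, 0.0) + close * volume
    return sorted(totals, key=totals.get, reverse=True)[:n]

# Made-up bars for three placeholder symbols.
bars = [
    ("AAA", 10.0, 1000),
    ("BBB", 20.0, 100),
    ("AAA", 11.0, 500),
    ("CCC", 5.0, 10000),
]
top_two = top_by_liquidity(bars, n=2)
```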

Distance metric

  • Euclidean
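For reference, a bare-bones kNN classifier with Euclidean distance might look like the sketch below. It is not the book's code (the chapter 8 examples estimate a numeric value), just a plain majority vote among the k nearest neighbours; the function names and sample points are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k):
    """Label query by majority vote among its k nearest training rows.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up two-feature training set.
train = [
    ((0.0, 0.0), "short"),
    ((0.1, 0.2), "short"),
    ((1.0, 1.0), "long"),
    ((0.9, 1.1), "long"),
    ((1.2, 0.8), "long"),
]
label = knn_classify(train, (1.0, 0.9), k=3)
```

Because Euclidean distance is sensitive to scale, features measured in very different ranges may need rescaling before they are compared.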

Please use a data frame with these fields:

Stock | % change close | % liquidity 1 min | % liquidity 3 days
AAPL  | 0.15           | 12.14             | 18.2
GOOG  | 0.22           | 8.93              | 12.11

Where:

% liquidity 1 min (for stock 1): ((volume(stock 1) * close(stock 1)) / sum(volume(stock i) * close(stock i) for i in 1..10)) * 100

Notes:

  • % change close: from last minute; ie data point
  • % liquidity 3 days: should already be calculated
  • % liquidity 1 min: from last minute
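The two per-minute fields could be computed roughly as follows. This sketch reads the denominator of the liquidity formula as the sum of volume * close over the selected stocks; `pct_liquidity`, `pct_change`, and the sample values are illustrative only.

```python
def pct_liquidity(bars):
    """Each stock's share of total Close * Volume for one minute, in percent.

    bars: dict mapping symbol -> (close, volume) for that minute.
    """
    dollar_volume = {s: close * volume for s, (close, volume) in bars.items()}
    total = sum(dollar_volume.values())
    if total == 0:
        raise ValueError("no traded volume in this minute")
    return {s: 100.0 * dv / total for s, dv in dollar_volume.items()}

def pct_change(prev_close, close):
    """Percent change in close from the previous data point."""
    return (close - prev_close) / prev_close * 100.0

# Made-up minute bar for two placeholder symbols.
shares = pct_liquidity({"AAA": (10.0, 300), "BBB": (20.0, 50)})
change = pct_change(100.0, 100.15)
```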

Long classifier

Where a QQQ data point is long if:

(next_close - current_close) >= 0.03 and (next_low - current_close) > -0.03

Short classifier

Where a QQQ data point is short if:

(next_close - current_close) <= -0.03 and (next_high - current_close) < 0.03
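The long and short rules translate directly into a labelling function. In this sketch the function name, the `threshold` parameter, and the `"none"` label for bars that satisfy neither rule are assumptions, and the prices are made up.

```python
def label_bar(current_close, next_close, next_high, next_low, threshold=0.03):
    """Classify a QQQ data point as "long", "short", or "none"."""
    move = next_close - current_close
    if move >= threshold and (next_low - current_close) > -threshold:
        return "long"
    if move <= -threshold and (next_high - current_close) < threshold:
        return "short"
    return "none"  # label for the remaining case is an assumption

long_label = label_bar(65.00, 65.05, next_high=65.06, next_low=64.99)
short_label = label_bar(65.00, 64.95, next_high=65.01, next_low=64.94)
# Close rose by 0.05 but the low dipped 0.10 below the current close.
neither = label_bar(65.00, 65.05, next_high=65.06, next_low=64.90)
```

Moves that land exactly on the 0.03 boundary can go either way under floating-point rounding, so working in integer cents may be safer if exact boundary behaviour matters.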

Output

eg select_model_index.py -d train_2012_12...h5 -k 5,10 -i 10 -d 20

data set: train_2012_12...h5
k range: 5,10
iterations: 10
days: 20
std dev error: 3.4%
avg error: 12.5%
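The last two lines of the report could be produced from the per-trial error rates as below. This is a sketch using the standard `statistics` module; whether the sample or population standard deviation is wanted is not stated, so the sample version is assumed, and the error values are made up.

```python
import statistics

def summarise(errors):
    """Return (avg, std dev) of per-trial error rates, both in percent.

    errors: error rates as fractions, one per iteration (at least two).
    """
    avg = statistics.mean(errors) * 100.0
    std = statistics.stdev(errors) * 100.0  # sample std dev (assumption)
    return avg, std

errors = [0.10, 0.12, 0.14]  # made-up trial results
avg, std = summarise(errors)
print("std dev error: %.1f%%" % std)
print("avg error: %.1f%%" % avg)
```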
