SelectModelIndex

Please apply the training data generated in PrepareData to the kNN algorithm outlined in Programming Collective Intelligence, ch. 8.

I've uploaded the source code for the book, including the kNN code you'll need to work with.

You might find scikit-learn a useful library. You can use the book examples, a standard library or roll your own.

In this case we'll be classifying instead of predicting a value.

Arguments

data set
k range
number of days
iterations

eg select_model_index.py -d train_2012_12...h5 -k 5,10 -i 10 -d 20

Note:

number of days: select N days randomly from data set
iterations: find average after N random trials
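
The two notes above can be sketched roughly as follows. This is only an illustration: `sample_days`, `run_trials` and the `evaluate` callback are hypothetical names, the dates are made up, and `evaluate` is assumed to return an error rate for the days it is given.

```python
import random

def sample_days(all_days, n_days, rng):
    """Pick n_days distinct days at random and return them in sorted order."""
    all_days = list(all_days)
    if n_days > len(all_days):
        raise ValueError("requested more days than the data set contains")
    return sorted(rng.sample(all_days, n_days))

def run_trials(all_days, n_days, iterations, evaluate, seed=None):
    """Run `iterations` random trials and return (average error, list of errors).

    `evaluate` is a hypothetical callback: it receives the sampled days and
    returns the classification error measured on them.
    """
    rng = random.Random(seed)
    errors = [evaluate(sample_days(all_days, n_days, rng)) for _ in range(iterations)]
    return sum(errors) / len(errors), errors

# Made-up dates, purely for illustration.
days = ["2012-12-03", "2012-12-04", "2012-12-05", "2012-12-06", "2012-12-07"]
picked = sample_days(days, 3, random.Random(0))
avg_error, trial_errors = run_trials(days, 3, 5, lambda sampled: 0.5, seed=1)
```

Passing a seed makes a run repeatable, which helps when comparing different values of k on the same random days.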

Fixed parameters:

number of stocks: 10
lookback period: 3

Find top 10 stocks by liquidity

Only use the top 10 stocks by liquidity over the previous three days, where liquidity = Close * Volume.
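
A minimal sketch of the ranking step, assuming the bars have already been restricted to the three-day lookback window and that liquidity is summed over that window (the page does not say whether to sum or average). The function name and the sample symbols are hypothetical.

```python
from collections import defaultdict

def top_liquid_stocks(bars, n=10):
    """Return the n symbols with the highest total liquidity (close * volume).

    `bars` is an iterable of (symbol, close, volume) tuples covering the
    lookback window only; filtering by date is left to the caller.
    """
    totals = defaultdict(float)
    for symbol, close, volume in bars:
        totals[symbol] += close * volume
    # Sort symbols by total liquidity, largest first.
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:n]

# Made-up bars, purely for illustration.
sample_bars = [
    ("AAA", 10.0, 100),   # liquidity 1000
    ("BBB", 50.0, 10),    # liquidity 500
    ("AAA", 11.0, 100),   # liquidity 1100
    ("CCC", 5.0, 1000),   # liquidity 5000
]
top_two = top_liquid_stocks(sample_bars, n=2)
```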

Distance metric

Euclidean
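
For reference, a small pure-Python sketch of Euclidean distance and a majority-vote kNN classifier. It is not the book's code. The function names, the unweighted vote and the toy training rows are all assumptions made for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k):
    """Classify `query` by an unweighted majority vote of its k nearest rows.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy training rows, purely for illustration.
train = [
    ((0.0, 0.0), "short"),
    ((0.1, 0.2), "short"),
    ((1.0, 1.0), "long"),
    ((0.9, 1.1), "long"),
    ((1.2, 0.8), "long"),
]
prediction = knn_classify(train, (1.0, 0.9), k=3)
```

Because the features are on different scales (a percentage change next to liquidity shares), it may be worth rescaling them before measuring distance, as the book discusses for heterogeneous variables.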

Please use a data frame with these fields:

Stock	% change close	% liquidity 1 min	% liquidity 3 days
APPL	0.15	12.14	18.2
GOOG	0.22	8.93	12.11

Where:

% liquidity 1 min: ((volume(stock 1) * close(stock 1)) / sum over stocks 1..10 of (volume(stock i) * close(stock i))) * 100

Notes:

% change close: from the last minute, i.e. the previous data point
% liquidity 3 days: should already be calculated
% liquidity 1 min: from the last minute
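
The two per-minute fields could be computed along these lines. This is a sketch under assumptions: the denominator is taken to be the summed liquidity of all selected stocks for that minute, "% change close" is taken to be a percentage of the previous close, and the function names and sample values are hypothetical.

```python
def pct_liquidity(minute_bars):
    """Share of each stock in the minute's total liquidity, in percent.

    `minute_bars` maps symbol -> (close, volume) for a single minute across
    the selected stocks.
    """
    liquidity = {sym: close * vol for sym, (close, vol) in minute_bars.items()}
    total = sum(liquidity.values())
    if total == 0:
        # No trading in this minute: report zero shares rather than divide by zero.
        return {sym: 0.0 for sym in minute_bars}
    return {sym: 100.0 * liq / total for sym, liq in liquidity.items()}

def pct_change_close(prev_close, close):
    """Percent change of the close relative to the previous data point."""
    return 100.0 * (close - prev_close) / prev_close

# Made-up values, purely for illustration.
shares = pct_liquidity({"AAA": (10.0, 300), "BBB": (20.0, 50)})
change = pct_change_close(100.0, 101.0)
```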

Long classifier

Where a QQQ data point is long if:

(next_close - current_close) >= 0.03 and (next_low - current_close) > -0.03

Short classifier

Where a QQQ data point is short if:

(next_close - current_close) <= -0.03 and (next_high - current_close) < 0.03
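
The long and short rules can be combined into one labelling function, sketched below. The "none" label for bars that match neither rule is an assumption, since the page only defines long and short. The function name and the sample prices are hypothetical.

```python
THRESHOLD = 0.03  # price move used in both rules above

def label_bar(current_close, next_close, next_high, next_low, threshold=THRESHOLD):
    """Label a QQQ data point as "long", "short" or "none"."""
    change = next_close - current_close
    # Long: close rises by at least the threshold and the low does not dip by it.
    if change >= threshold and (next_low - current_close) > -threshold:
        return "long"
    # Short: close falls by at least the threshold and the high does not spike by it.
    if change <= -threshold and (next_high - current_close) < threshold:
        return "short"
    return "none"  # assumed label for bars matching neither rule

# Made-up prices, purely for illustration.
long_label = label_bar(65.00, 65.05, next_high=65.06, next_low=64.99)
short_label = label_bar(65.00, 64.95, next_high=65.01, next_low=64.94)
flat_label = label_bar(65.00, 65.01, next_high=65.02, next_low=64.99)
```

The comparisons are made on floats, so a move of exactly 0.03 may land on either side of the threshold because of rounding. Working in integer cents is one way to avoid that.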

Output

eg select_model_index.py -d train_2012_12...h5 -k 5,10 -i 10 -d 20

data set: train_2012_12...h5
k range: 5,10
iterations: 10
days: 20
std dev error: 3.4%
avg error: 12.5%
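
One way to produce the two error figures from the per-iteration error rates is sketched below. It assumes errors are stored as fractions (0.125 for 12.5%) and uses the sample standard deviation. The page does not say whether sample or population is wanted. The function names and the file name `example.h5` are hypothetical.

```python
import statistics

def summarize_errors(errors):
    """Return (average, standard deviation) of per-iteration error rates."""
    avg = statistics.mean(errors)
    # The sample standard deviation needs at least two values.
    std = statistics.stdev(errors) if len(errors) > 1 else 0.0
    return avg, std

def format_report(data_set, k_range, iterations, days, errors):
    """Build the report text in the layout shown above."""
    avg, std = summarize_errors(errors)
    return "\n".join([
        f"data set: {data_set}",
        f"k range: {k_range}",
        f"iterations: {iterations}",
        f"days: {days}",
        f"std dev error: {std * 100:.1f}%",
        f"avg error: {avg * 100:.1f}%",
    ])

# Made-up error rates, purely for illustration.
report = format_report("example.h5", "5,10", 2, 20, [0.10, 0.20])
```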
