Intraday Models

Problem Definition

The goal is to predict the price change of a security.

At each point in the above graph, we want to predict the prices after 1 min.

Problem Definition

The following table shows the label for AAPL on 20170928.

Timestamp (EST)	Current Price	Price Difference
09:30:01.500000	153.905	0.125
09:30:01.550000	153.905	0.125
09:30:01.600000	153.9	0.13
09:30:01.650000	153.905	0.13
09:30:01.700000	153.905	0.13

Dataset

Training Period: 20170901 to 20170915 (10 days)
Testing Period: 20170918 to 20170922 (5 days)
Numpy Array Shape: (2026106, 361) - before adding transformations
Source securities: 'SPY', 'BND', 'XLK', 'XLU', 'VNQ', 'IEMG', 'AGG', 'VWO', 'XLF', 'EEM', 'QQQ', 'USO', 'GDX', 'IWM', 'VEA'
Label mean: 0.0002, standard deviation: 0.0216 (This helps put the MAE values in context)

Features

bid/ask/mid prices
bid/ask sizes
bid-ask spread
bid/ask/mid moving averages over 1 min and 5 min
bid/ask/mid trend over 1 min and 5 min
volume/trades/updates over 1 min and 5 min

Normalization

0-1 normalization
0-1 normalization and division by mean

Linear Regression

fit_intercept=False
Unnormalized, 0-1 normalized and 0-1 mean normalized

LR Results

	InSample	OutSample
MSE	0.000242089	43997.8
MAE	0.0118442	151.201
Original stdev	0.0159	0.0202
Predicted stdev	0.0108	179.1057

Simple LR on all features does not work.

L2 penalized LR Results

	InSample	OutSample
MSE	0.000219	0.000557
MAE	0.01101	0.01722
Original stdev	0.0159	0.0202
Predicted stdev	0.0050	0.0121

Trained using ElasticNet learning algorithm.
All 360 features are selected.
Out of Sample stdev is good.

Lasso Results

	InSample	OutSample
MSE	0.000244	0.000412
MAE	0.01113	0.01313
Original stdev	0.0159	0.0202
Predicted stdev	0.0020	0.0018

Selects 11 features out of 360
fit_intercept=False
alphas: 1e-3, 1e-4, 1e-5

L1 + L2 penalties

	InSample	OutSample
MSE	0.000244	0.000412
MAE	0.01114	0.01311
Original stdev	0.0159	0.0202
Predicted stdev	0.0020	0.0017

Selects 19 features out of 360
fit_intercept=False
alphas: 1e-3, 1e-4, 1e-5
l1_ratio = 0.1 (l1/l2)

Pre-selecting features

Using standard feature selectors by themselves do not seem to work.
Using PCA to reduce the number of features does not seem to make much difference.

Using Less Data

Penalized LR results are same as long as there are >= 2000 datapoints representing most of the time periods.
In other words, reducing the number of datapoints by 512 is still okay.

Using Less Data

Penalized LR results are same as long as there are >= 2000 datapoints representing most of the time periods.
In other words, reducing the number of datapoints by 512 is still okay.

Ensemble Methods

Checked RandomForest, AdaBoosting and GradientBoosting algorithms.
They are better at preserving the stdev and generalize relatively better out of sample.

Algorithm	Stdev	OutSample MAE
Gradient Boosting	0.0075 (0.0038)	0.01583
AdaBoosting	0.0037 (0.0058)	0.01345
RandomForest	0.0236 (0.0149)	0.02638

FNN

Feed forward neural networks are relatively worse in terms of generalization.

Lessons (re-)learned

Pure market data features have very limited predictive abilities. (They can be exploited a bit more by adding a regime component on top.)
Learning algorithms and algorithm parameters that preserve the standard deviation scale are more important than small differences in MSE/MAE scores.
Out of sample generalization is non-obvious and important.

Intraday Models

By Hardik Patel

Intraday Models

1,133

Hardik Patel