Streamdata.io
Efficiently Turn APIs into Real-time Experiences
@erichoresnyi
@abifet
Source: Tabb Group
High Frequency Trading
AUM>$1trn, source: Towers Watson
*BlackRock is actually headquartered in NY; most of its AUM comes from ETFs/passive management, originally Barclays Global Investors (BGI) in SF
Approx. 3× US GDP, i.e. $155k per inhabitant
HFT context
Reg ATS (1998) > Reg NMS (2005)
Hosting in Network Nodes, then in Exchanges
Dark Fiber
1,000 miles > 825 miles
14.5 ms > 11.5 ms
11.5 ms > 8.5 ms
n: 1.33 > 1.0003
v = c/n
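As a rough check on the figures above, the propagation delay follows directly from v = c/n: delay = distance × n / c. This minimal sketch takes the slide's route lengths and refractive indices (1.33 and 1.0003) as inputs; it ignores switching, routing and serialization, so it only gives a physical lower bound, and the class and method names are illustrative. For comparison, standard silica fiber is usually quoted nearer n ≈ 1.47.

```java
/** Sketch: propagation delay from v = c/n (physics lower bound only). */
public final class Propagation {
    static final double C_KM_PER_S = 299_792.458;  // speed of light in vacuum
    static final double KM_PER_MILE = 1.609344;    // international mile

    /** One-way delay in milliseconds over `miles` of a medium with refractive index n. */
    static double oneWayMs(double miles, double n) {
        double v = C_KM_PER_S / n;                 // v = c/n
        return miles * KM_PER_MILE / v * 1_000.0;
    }

    /** Round trip on a symmetric path: twice the one-way delay. */
    static double roundTripMs(double miles, double n) {
        return 2.0 * oneWayMs(miles, n);
    }

    public static void main(String[] args) {
        // Inputs taken from the slide; real routes add equipment and detour delays.
        System.out.printf("1,000 mi, n=1.33:   %.2f ms RTT%n", roundTripMs(1_000, 1.33));
        System.out.printf("825 mi,   n=1.33:   %.2f ms RTT%n", roundTripMs(825, 1.33));
        System.out.printf("825 mi,   n=1.0003: %.2f ms RTT%n", roundTripMs(825, 1.0003));
    }
}
```

Shortening the route and lowering n both cut latency linearly, which is why the race moved from straighter fiber to near-vacuum-speed microwave links.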
Nanosecs
HFT ≠ Algo Trading
"Once you get into milliseconds it's almost not HFT any more"
Market Events: [ct,x,y,z]
HFT built a wormhole to win on [ct',x,y,z] events
AI builds a black hole by massively processing [ct,x,y,z] events
The endgame of Determinism
∀ [ct,x,y,z] ∈ ℝⁿ ⊢ ∀ [ct',x',y',z']
Event Machine View
Loss, aka Cost Function J(θ): distance from the points to the line
Graph View : Regression
Features
Labels
$AAPL
$GOOG
...
$FB
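For the regression view above, one common way to write "distance from the points to the line" is the squared-error cost used in Andrew Ng's course, listed in the references. Taking the squared vertical distance, a single feature x and the 1/(2m) scaling over m examples are assumptions of this sketch:

```latex
h_\theta(x) = \theta_0 + \theta_1 x,
\qquad
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right)^2
```

Here x^{(i)} are the features and y^{(i)} the labels; with more features, θ and x become vectors and h_θ(x) = θᵀx.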
Matrix view
Matrices of Weights
+35% yoy for 20 years : $2,500 > $1,000,000
PhD Mathematics, Berkeley - String Theory Chern-Simons Form
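The +35% figure compounds quickly: V = P(1 + r)^t. A minimal check with the slide's inputs (P = $2,500, r = 0.35, t = 20 years) lands just over $1 million, consistent with the slide's rounding; the class and method names are illustrative.

```java
/** Sketch: compound growth V = P * (1 + r)^t. */
public final class Compounding {

    /** Value after `years` of compounding at `rate` per year (0.35 means +35%). */
    static double compound(double principal, double rate, int years) {
        return principal * Math.pow(1.0 + rate, years);
    }

    public static void main(String[] args) {
        double v = compound(2_500, 0.35, 20);
        System.out.printf("$2,500 at +35%%/yr for 20 years: $%,.0f%n", v);
    }
}
```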
Follow the steepest slope, 100M+ features
α : Learning Rate, ∇ J : Gradient
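The update the slide describes is θ := θ − α∇J(θ). Below is a minimal batch gradient-descent sketch for a one-feature linear model under the squared-error cost J(θ) = 1/(2m) Σ (h_θ(x_i) − y_i)², the convention from Andrew Ng's course. It is a toy, not a 100M-feature system; the data, learning rate and iteration count are assumptions.

```java
/** Sketch: batch gradient descent for h(x) = theta0 + theta1 * x under squared-error cost. */
public final class GradientDescent {

    /** J(theta) = 1/(2m) * sum of (h(x_i) - y_i)^2. */
    static double cost(double[] x, double[] y, double t0, double t1) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double err = t0 + t1 * x[i] - y[i];
            sum += err * err;
        }
        return sum / (2.0 * x.length);
    }

    /** Runs iters steps of theta := theta - alpha * grad J(theta); returns {theta0, theta1}. */
    static double[] fit(double[] x, double[] y, double alpha, int iters) {
        double t0 = 0.0, t1 = 0.0;
        int m = x.length;
        for (int k = 0; k < iters; k++) {
            double g0 = 0.0, g1 = 0.0;
            for (int i = 0; i < m; i++) {
                double err = t0 + t1 * x[i] - y[i];
                g0 += err;           // contributes to dJ/dtheta0
                g1 += err * x[i];    // contributes to dJ/dtheta1
            }
            t0 -= alpha * g0 / m;    // both parameters use the same (old) gradient
            t1 -= alpha * g1 / m;
        }
        return new double[] {t0, t1};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] y = {1, 3, 5, 7, 9};    // exactly y = 1 + 2x
        double[] theta = fit(x, y, 0.05, 5_000);
        System.out.printf("theta0=%.4f theta1=%.4f J=%.2e%n",
                theta[0], theta[1], cost(x, y, theta[0], theta[1]));
    }
}
```

The step size matters: a small α converges steadily here, while an α that is too large for the curvature of J makes the updates overshoot and diverge.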
Adjust each weight to reduce the error, propagating it back through each previous layer's input; 150+ layers
Source: Neural Networks simulation by Matt Mazur at Emergent Mind
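To make the backward step concrete, here is backpropagation on a tiny 2-2-2 sigmoid network with squared error, written from scratch (it is not the code behind Mazur's simulation). The starting weights, the fixed per-layer biases, the learning rate and the iteration count are illustrative assumptions; a 150-layer network applies the same chain rule layer by layer.

```java
/** Sketch: backpropagation on a 2-2-2 sigmoid network with squared error. */
public final class TinyBackprop {
    // Illustrative starting weights (assumptions): w[hidden][input], v[output][hidden]
    final double[][] w = {{0.15, 0.20}, {0.25, 0.30}};
    final double[][] v = {{0.40, 0.45}, {0.50, 0.55}};
    final double bh = 0.35, bo = 0.60;   // one bias per layer, left fixed for simplicity
    final double[] h = new double[2];    // hidden activations
    final double[] o = new double[2];    // output activations

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    void forward(double[] x) {
        for (int j = 0; j < 2; j++) h[j] = sigmoid(w[j][0] * x[0] + w[j][1] * x[1] + bh);
        for (int k = 0; k < 2; k++) o[k] = sigmoid(v[k][0] * h[0] + v[k][1] * h[1] + bo);
    }

    /** E = sum over outputs of 1/2 (target - out)^2. */
    double error(double[] x, double[] t) {
        forward(x);
        double e = 0.0;
        for (int k = 0; k < 2; k++) e += 0.5 * (t[k] - o[k]) * (t[k] - o[k]);
        return e;
    }

    /** One gradient step: output deltas first, then back through v to the hidden layer. */
    void step(double[] x, double[] t, double eta) {
        forward(x);
        double[] dOut = new double[2];
        for (int k = 0; k < 2; k++) dOut[k] = (o[k] - t[k]) * o[k] * (1 - o[k]);   // dE/dnet_o
        double[] dHid = new double[2];
        for (int j = 0; j < 2; j++) {
            double back = dOut[0] * v[0][j] + dOut[1] * v[1][j];   // uses v before its update
            dHid[j] = back * h[j] * (1 - h[j]);                    // dE/dnet_h
        }
        for (int k = 0; k < 2; k++)
            for (int j = 0; j < 2; j++) v[k][j] -= eta * dOut[k] * h[j];
        for (int j = 0; j < 2; j++)
            for (int i = 0; i < 2; i++) w[j][i] -= eta * dHid[j] * x[i];
    }

    public static void main(String[] args) {
        TinyBackprop net = new TinyBackprop();
        double[] x = {0.05, 0.10}, t = {0.01, 0.99};
        System.out.printf("before: E=%.6f%n", net.error(x, t));
        for (int n = 0; n < 10_000; n++) net.step(x, t, 0.5);
        System.out.printf("after:  E=%.6f%n", net.error(x, t));
    }
}
```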
From Final Fantasy to Autonomous Cars
"The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs."
1-Train
2-Validate
3-Test
Overfitting?
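The three-step protocol can be sketched as one reproducible shuffle followed by two cuts: fit on the training part, use the validation part to tune and to spot overfitting (training error keeps falling while validation error rises), and touch the test part only once, at the end. The 60/20/20 proportions and the seed are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch: shuffle once, then cut into train / validate / test partitions. */
public final class Split {

    /** Returns [train, validate, test]; whatever remains after train and validate goes to test. */
    static <T> List<List<T>> split(List<T> data, double trainFrac, double valFrac, long seed) {
        if (trainFrac < 0 || valFrac < 0 || trainFrac + valFrac > 1) {
            throw new IllegalArgumentException("fractions must be >= 0 and sum to <= 1");
        }
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));        // reproducible order
        int n = shuffled.size();
        int nTrain = (int) (n * trainFrac);                     // floor
        int nVal = Math.min((int) (n * valFrac), n - nTrain);   // never overrun n
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, nTrain)));              // 1-Train
        parts.add(new ArrayList<>(shuffled.subList(nTrain, nTrain + nVal)));  // 2-Validate
        parts.add(new ArrayList<>(shuffled.subList(nTrain + nVal, n)));       // 3-Test
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) data.add(i);
        List<List<Integer>> parts = split(data, 0.6, 0.2, 42L);
        System.out.println("train=" + parts.get(0) + " validate=" + parts.get(1) + " test=" + parts.get(2));
    }
}
```

This batch protocol assumes a finite dataset that can be revisited, which is exactly what the streaming setting gives up.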
Batch-based, finite training sets, static models
Infinite training sets, dynamic models
What is the largest number that we can store in 8 bits?
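The answer is not in this text. Read literally, an unsigned 8-bit integer stores at most 2^8 − 1 = 255. A classic streaming-literature answer, and plausibly where the question is heading, is Morris's approximate counter: the 8 bits hold an exponent c, incremented with probability 2^−c, so the register can represent estimates up to about 2^255 at the price of randomness. The sketch below is a from-scratch illustration; the class name, seeds and averaging helper are assumptions.

```java
import java.util.Random;

/** Sketch of Morris's approximate counter: an 8-bit register that stores an exponent. */
public final class MorrisCounter {
    static final int MAX_UNSIGNED_8_BIT = (1 << 8) - 1;   // exact counting tops out at 255

    private int c = 0;            // register value, kept within 0..255
    private final Random rng;

    MorrisCounter(long seed) { this.rng = new Random(seed); }

    /** Increments the register with probability 2^-c. */
    void increment() {
        if (c < MAX_UNSIGNED_8_BIT && rng.nextDouble() < Math.pow(2.0, -c)) {
            c++;
        }
    }

    /** Unbiased estimate of the number of increments seen: 2^c - 1. */
    double estimate() { return Math.pow(2.0, c) - 1.0; }

    int register() { return c; }

    /** Averages independent counters to tame the high variance of a single one. */
    static double averagedEstimate(long events, int counters) {
        double sum = 0.0;
        for (int k = 0; k < counters; k++) {
            MorrisCounter m = new MorrisCounter(k);
            for (long e = 0; e < events; e++) m.increment();
            sum += m.estimate();
        }
        return sum / counters;
    }

    public static void main(String[] args) {
        MorrisCounter one = new MorrisCounter(1L);
        for (int i = 0; i < 100_000; i++) one.increment();
        System.out.println("register = " + one.register() + ", estimate = " + one.estimate());
        System.out.println("mean of 100 counters = " + averagedEstimate(100_000, 100));
    }
}
```

A single counter is unbiased but noisy (its standard deviation is of the order of the count itself), hence the averaging helper.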
Inspect it only once (at most)
Sequence of examples > Error of a model
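Read as prequential (test-then-train) evaluation from the data stream literature: each example is first used to test the current model and only then to train it, so a sequence of examples maps to a running error. One way to write it, where the loss L and the notation are assumptions of this sketch:

```latex
P_n = \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, \hat{y}_i\big),
\qquad
\hat{y}_i = f_{i-1}(x_i)
```

Here f_{i−1} is the model after training on the first i − 1 examples, so no example is ever scored by a model that has already seen it.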
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
  "EvaluatePeriodicHeldOutTest -l DecisionStump \
  -s generators.WaveformGenerator \
  -n 100000 -i 100000000 -f 1000000" > dsresult.csv
# -l DecisionStump                  train a DecisionStump classifier...
# -s generators.WaveformGenerator   ...on WaveformGenerator data
# -n 100000                         use the first 100 thousand examples for testing
# -i 100000000                      train on a total of 100 million examples
# -f 1000000                        test every one million examples
Classification
Regression
Concept Drift
Sentiment Analysis
Stock Price
Alerting
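For the concept drift case, MOA ships several change detectors; the Drift Detection Method (DDM, Gama et al., 2004) is one of the simplest. It watches a classifier's online error rate p and its standard deviation s = √(p(1 − p)/n), remembers the best point (lowest p + s), and flags a warning at two and a drift at three standard deviations above it. This is a from-scratch sketch, not MOA's implementation; the 30-example warm-up and the synthetic 10%-to-50% error stream are assumptions.

```java
/** Sketch of DDM: error-rate based drift detection for a streaming classifier. */
public final class Ddm {
    enum Level { STABLE, WARNING, DRIFT }

    private final int minInstances;   // warm-up before any decision
    private long n;                   // predictions seen since the last reset
    private long errors;              // wrong predictions since the last reset
    private double pMin, sMin;        // p and s at the lowest p + s so far

    Ddm(int minInstances) {
        this.minInstances = minInstances;
        reset();
    }

    private void reset() {
        n = 0;
        errors = 0;
        pMin = Double.MAX_VALUE;
        sMin = Double.MAX_VALUE;
    }

    /** Feeds one outcome (true = the model was wrong) and returns the drift level. */
    Level update(boolean wrong) {
        n++;
        if (wrong) errors++;
        double p = (double) errors / n;
        double s = Math.sqrt(p * (1 - p) / n);
        if (n < minInstances) return Level.STABLE;
        if (p + s < pMin + sMin) {        // new best operating point
            pMin = p;
            sMin = s;
        }
        if (p + s > pMin + 3 * sMin) {    // significantly worse than the best: drift
            reset();
            return Level.DRIFT;
        }
        return p + s > pMin + 2 * sMin ? Level.WARNING : Level.STABLE;
    }

    /** Synthetic stream: 10% errors before changeAt, 50% after. Returns first drift index or -1. */
    static int firstDrift(int changeAt, int length) {
        Ddm ddm = new Ddm(30);
        for (int i = 0; i < length; i++) {
            boolean wrong = i < changeAt ? i % 10 == 9 : i % 2 == 1;
            if (ddm.update(wrong) == Level.DRIFT) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("first drift signalled at example " + firstDrift(1_000, 2_000));
    }
}
```

On a drift signal, a streaming system would typically discard or retrain the model; the warning level is the natural hook for starting to buffer recent examples.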
learner.getVotesForInstance(instance)
learner.trainOnInstance(instance)
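These two calls form MOA's test-then-train loop: ask the learner for a prediction, then train it on the same instance. The sketch below reproduces that loop without MOA, using a hypothetical `OnlineRegressor` interface whose `predict`/`train` methods only play the roles of `getVotesForInstance`/`trainOnInstance` (they are not MOA's signatures), and a linear model updated by one delta-rule (LMS) step per example, i.e. a perceptron-style regressor. The synthetic stream and learning rate are assumptions.

```java
import java.util.Random;

/** Sketch of a test-then-train (prequential) loop without MOA. */
public final class Prequential {

    /** Hypothetical stand-in for a streaming learner; not MOA's API. */
    interface OnlineRegressor {
        double predict(double[] x);          // plays the role of getVotesForInstance
        void train(double[] x, double y);    // plays the role of trainOnInstance
    }

    /** Linear model updated by one delta-rule (LMS) step per example. */
    static final class LinearSgd implements OnlineRegressor {
        private final double[] w;
        private double b = 0.0;
        private final double eta;

        LinearSgd(int dims, double eta) {
            this.w = new double[dims];
            this.eta = eta;
        }

        @Override
        public double predict(double[] x) {
            double z = b;
            for (int i = 0; i < w.length; i++) z += w[i] * x[i];
            return z;
        }

        @Override
        public void train(double[] x, double y) {
            double err = predict(x) - y;     // gradient of 1/2 err^2 w.r.t. the output
            for (int i = 0; i < w.length; i++) w[i] -= eta * err * x[i];
            b -= eta * err;
        }
    }

    /** Each example is seen once: tested first, then trained on. Returns the mean absolute error. */
    static double run(OnlineRegressor model, double[][] xs, double[] ys) {
        double absErr = 0.0;
        for (int t = 0; t < xs.length; t++) {
            absErr += Math.abs(model.predict(xs[t]) - ys[t]);  // 1) test on an unseen example
            model.train(xs[t], ys[t]);                         // 2) then learn from it
        }
        return absErr / xs.length;
    }

    /** Synthetic noiseless stream y = 2x + 1; returns {prequential MAE, prediction at x = 0.5}. */
    static double[] demo(int n, long seed) {
        Random rng = new Random(seed);
        double[][] xs = new double[n][1];
        double[] ys = new double[n];
        for (int t = 0; t < n; t++) {
            xs[t][0] = rng.nextDouble();
            ys[t] = 2.0 * xs[t][0] + 1.0;
        }
        LinearSgd model = new LinearSgd(1, 0.1);
        double mae = run(model, xs, ys);
        return new double[] {mae, model.predict(new double[] {0.5})};
    }

    public static void main(String[] args) {
        double[] r = demo(5_000, 7L);
        System.out.printf("prequential MAE = %.4f, prediction at x=0.5 = %.3f%n", r[0], r[1]);
    }
}
```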
http://samoa-project.net
Public Stock Dataset
MOA Regression
Stock Price
Error
Sentiment Analysis
Alerts
Regression/Perceptron
Data Stream: a compromise
[Diagram: time axis ct vs. space axes x,y,z, positioning HFT, AI and Data Stream]
Apache Software Foundation & Wikimedia Foundation: please donate!
MOA, Kaggle & Giphy : please contribute!
Books & Lectures
Data Stream Mining, MOA team
Yann LeCun Deep Learning Class, NYU
Matt Mazur, Emergent Mind & Andrew Ng, Coursera on AI
My Life as a Quant: Reflections on Physics and Finance, E. Derman
The Value of a Millisecond: Finding the Optimal Speed of a Trading Infrastructure, Tabb Group
Flash Boys, M. Lewis
Movies & Games
The Big Short, Back to the Future, Interstellar, The Black Hole,
Harry Potter, Rocky, Into the Mind, Star Wars, The Matrix, Final Fantasy
By Streamdata.io
The race for low-latency data continues. Ten years ago, the Flash Boys were helping HFT firms make money with low-latency infrastructure. Today, hedge funds build AI brains that ingest hundreds of data sources in real time, seeking ubiquity to build Laplace's demons.