Ryan Hafen
Hafen Consulting, LLC and Purdue University
Michael Kane
Yale University
R/Finance Conference
May 20, 2016
bit.ly/rfinance2016
DARPA XDATA Program
But it's challenging enough to understand the data!
It's dangerous to start applying algorithms to the data before we understand the variables and how they should be handled.
Guiding Principles:
Many other "big data" systems that support R are either:
Idea of Tessera:
Specification of data partitioning, data structures, and ad hoc analysis
Data structures
Raw equity data partitioned by symbol and date
Raw equity and option data grouped by symbol and date
Ad hoc application of R code
Higher resolution summaries
Outlier / anomaly detection
Cointegration calculations
datadr / RHIPE
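The partition-then-apply workflow above can be sketched in a few lines. This is a minimal, language-neutral illustration in plain Python, not the datadr or RHIPE API; the record layout, the sample values, and the helper names `divide`, `summarize`, and `recombine` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tick records: (symbol, date, price). Values are made up.
ticks = [
    ("AAPL", "2010-05-06", 250.0),
    ("AAPL", "2010-05-06", 246.0),
    ("AAPL", "2010-05-07", 240.0),
    ("MSFT", "2010-05-06", 29.0),
    ("MSFT", "2010-05-06", 28.0),
]

def divide(records):
    """Partition records into subsets keyed by (symbol, date)."""
    subsets = defaultdict(list)
    for symbol, date, price in records:
        subsets[(symbol, date)].append(price)
    return dict(subsets)

def summarize(prices):
    """Ad hoc per-subset analysis: a small price summary."""
    return {
        "n": len(prices),
        "mean": mean(prices),
        "range": max(prices) - min(prices),
    }

def recombine(subsets, fn):
    """Apply fn to each subset independently and collect results by key."""
    return {key: fn(values) for key, values in subsets.items()}

by_symbol_day = divide(ticks)
summaries = recombine(by_symbol_day, summarize)
```

Because each subset is processed independently, the same per-subset function could in principle be run in parallel across a cluster, which is the property that makes this kind of partitioning attractive for large data.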
Interactively investigate data in detail with several different displays
Example:
Plot price vs. time for data partitioned by symbol/day
There are ~1 million subsets based on this partitioning
Compute cognostics - metrics that allow us to navigate this large space of displays in a meaningful way
Trelliscope
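To make the idea of cognostics concrete, here is a rough sketch in plain Python. It is not the Trelliscope API: the panel data, the choice of metrics (`n_obs`, `rel_range`), and the 5% threshold are hypothetical and chosen only to show how per-panel metrics let us sort and filter a large set of displays.

```python
# Hypothetical per-panel data: (symbol, day) -> prices within that day.
# Values are made up for illustration.
panels = {
    ("AAPL", "2010-05-05"): [255.0, 256.0, 255.5],
    ("AAPL", "2010-05-06"): [250.0, 200.0, 246.0],
    ("MSFT", "2010-05-06"): [29.0, 28.0, 28.5],
}

def cognostics(prices):
    """Per-panel metrics used to sort and filter displays."""
    low, high = min(prices), max(prices)
    return {
        "n_obs": len(prices),
        "rel_range": (high - low) / low,  # intraday range relative to the low
    }

cogs = {key: cognostics(prices) for key, prices in panels.items()}

# Navigate: order panels so the largest relative range comes first.
ranked = sorted(cogs, key=lambda k: cogs[k]["rel_range"], reverse=True)

# Filter: keep panels whose relative range exceeds a hypothetical threshold.
flagged = [k for k in ranked if cogs[k]["rel_range"] > 0.05]
```

With roughly a million symbol/day panels, nobody can page through every plot; sorting or filtering on a handful of such metrics is what brings the interesting panels to the front.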
Assess market-wide systemic risk with cointegration
Investigate option prices leading equity prices as an indicator of insider trading
Examine spikes in implied volatility and volume in options to look for fiscal malfeasance
AAPL price (top) and cointegration measure for the S&P 500 (bottom) during the 05/06/2010 flash crash
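The slides do not say how the cointegration measure is computed. As one possible illustration, the sketch below follows the first steps of the well-known Engle-Granger approach: regress one series on the other, then check whether the residuals revert toward zero. The function names and the toy series are assumptions, and the returned slope is only a rough indicator, not a formal test (a real test would compare a statistic against Dickey-Fuller critical values).

```python
def ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def residual_persistence(x, y):
    """Engle-Granger style check (sketch only).

    1. Regress y on x and take the residuals e_t.
    2. Regress the change e_t - e_{t-1} on e_{t-1}, without intercept.
    A clearly negative slope means the residuals are pulled back toward
    zero, which is consistent with the two series being cointegrated.
    """
    a, b = ols(x, y)
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    lagged = e[:-1]
    delta = [e[t] - e[t - 1] for t in range(1, len(e))]
    return sum(l * d for l, d in zip(lagged, delta)) / sum(l * l for l in lagged)

# Toy series (made up): y tracks 1 + 2*x with small alternating noise.
x = [float(t) for t in range(20)]
y = [1.0 + 2.0 * xt + (0.5 if t % 2 == 0 else -0.5) for t, xt in enumerate(x)]
rho = residual_persistence(x, y)
```

Applied per time window and per pair of symbols, a quantity like this could be tracked over the trading day, which is the spirit of the bottom panel described above.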
We are now in good shape to begin more interesting analyses
Reference:
Tessera: http://tessera.io
rbokeh: http://hafen.github.io/rbokeh
More: http://ryanhafen.com
@hafenstats