Notes from the Trenches,
How Good Models Go Bad
Statistical & Machine Learning Approaches to Investment Management, May 2019
Agenda
 Common pitfalls & how to avoid them

Instability & Interpretation
 A Principal Components Example
 A Kalman Filter Example

More than one way to scramble an egg
 Features and interpretation over model selection
 A Focus on Implementation & Implied Assumptions
Common Pitfalls
The devil is in the details
*Courtesy of Caitlin Hudon https://twitter.com/beeonaposy
Avoiding Common Pitfalls
No magic formula, but there are some good guidelines
 Reproducing sub-results in different environments (Excel & R/MATLAB)
 Interpret, decompose, interpret

No replacement for time, experience, & care
 You will be rushed
 It is okay to delay

Review your assumptions, then do it again
 Are there implicit assumptions?
 Can you realistically implement?
 Was all the data available at the time? Has it been revised? Is the lag time realistic?
Instability & Interpretation
Stability & Interpretation normally go hand in hand

As humans, we are natural storytellers
 It is far too easy to append a narrative to data or output
 But if that output is unstable, how can our narrative be true?
 Ensuring our analysis is stable and consistent is one of the most important exercises we can carry out
 Let's look at two examples:
 Principal Component Analysis
 Kalman Filter & Neural Nets
PCA Intro 1
Principal Component Analysis: A Quick Primer
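 In contrast to Linear Regression, which finds the best $\beta$ to fit $y = X\beta + \varepsilon$, PCA finds the best combination of weights $w$ on returns $r$ so that $\mathrm{Var}(w^\top r)$ is maximized and $w^\top w = 1$
 Loosely, it is like a Linear Regression in which the weights are held to unit length, but instead of fitting a line through the data, PCA rotates (moves) the data onto new axes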

What are the principal components?
 The first principal component is the combination of the data that explains the most variance, with $w_1^\top w_1 = 1$
 Each following component does the same, with the added constraint that its covariance with every previous component is zero
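A minimal sketch of the two properties above, assuming NumPy and a hypothetical set of three return series driven by one common factor: the weight vectors are the eigenvectors of the return covariance matrix, each has unit length, and the resulting component scores are uncorrelated, with variances equal to the eigenvalues.

```python
import numpy as np

def principal_components(returns):
    """Eigenvalues (largest first) and loadings (one column per component) of the return covariance."""
    cov = np.cov(returns, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical data: 3 assets sharing one "market" factor plus idiosyncratic noise
rng = np.random.default_rng(0)
market = rng.normal(0, 0.04, size=500)
returns = np.column_stack([market + rng.normal(0, 0.01, 500) for _ in range(3)])

eigvals, loadings = principal_components(returns)
pc_scores = (returns - returns.mean(axis=0)) @ loadings

print(np.round(np.linalg.norm(loadings, axis=0), 6))  # each weight vector has w'w = 1
print(np.round(eigvals / eigvals.sum(), 3))           # share of variance explained
print(np.round(np.cov(pc_scores, rowvar=False), 8))    # off-diagonal covariances are ~0
```

Using `eigh` rather than a general eigen-solver is a design choice: the covariance matrix is symmetric, so the eigenvectors come back real and orthonormal.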
PCA Intro 2
Principal Component Analysis: What Is It Good For?
 Dimension Reduction & Clustering

In practice
 Check autocorrelation of returns
 Rotating the components is usually helpful
 Can be used alongside, or compared with, Factor Models to gain intuition
Explain All This
With Just This
PCA Problems
PCA Is Notoriously Unstable: What Does It Mean?
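One common reading of "unstable": when two components explain similar amounts of variance, small changes in the sample can swap or rotate them, and each eigenvector also carries an arbitrary sign. The sketch below, on hypothetical simulated returns, estimates the first component on ten consecutive sub-periods and compares neighbouring estimates using the absolute cosine similarity (1 means the same direction).

```python
import numpy as np

def first_pc(returns):
    cov = np.cov(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    w = eigvecs[:, -1]  # eigenvector for the largest eigenvalue
    # Eigenvectors are only defined up to sign; fix a reporting convention
    return w if w.sum() >= 0 else -w

def subsample_similarity(returns, n_splits=10):
    """|cosine| between first-PC loadings on consecutive, non-overlapping blocks."""
    blocks = np.array_split(returns, n_splits)
    pcs = [first_pc(b) for b in blocks]
    # abs() makes the comparison sign-invariant as well
    return [abs(float(pcs[i] @ pcs[i + 1])) for i in range(n_splits - 1)]

rng = np.random.default_rng(1)
n = 1000
# Case A: one dominant factor, so the first PC is well identified
f = rng.normal(0, 0.05, n)
dominant = np.column_stack([f + rng.normal(0, 0.01, n) for _ in range(4)])
# Case B: two independent factors of equal size, so the first PC is poorly identified
g, h = rng.normal(0, 0.03, n), rng.normal(0, 0.03, n)
ties = np.column_stack([g, g, h, h]) + rng.normal(0, 0.01, (n, 4))

sim_dom = subsample_similarity(dominant)
sim_tie = subsample_similarity(ties)
print("dominant factor:", np.round(sim_dom, 3))
print("tied factors:   ", np.round(sim_tie, 3))
```

In the tied case the "first component" can describe a different group of assets from one sub-period to the next, so any narrative attached to it would change with the sample.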
Kalman Filter Intro 1
Kalman Filter: A Quick Primer
 The Kalman Filter is a model with many names (e.g. state-space model, linear dynamical system)
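 First consider a Time-Series Model, the Vector AutoRegression: $y_t = A y_{t-1} + \varepsilon_t$
 The Kalman Filter adds a state: the prediction for $y_t$ depends on a hidden state $x_t$, which helps prediction: $x_t = F x_{t-1} + w_t$, $y_t = H x_t + v_t$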
 The state is sometimes considered a hidden regime or layer
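To make the hidden-state idea concrete, a minimal sketch assuming the simplest state-space model: a scalar "local level" in which the state follows a random walk and is observed with noise ($x_t = x_{t-1} + w_t$, $y_t = x_t + v_t$). The noise variances `q` and `r` and the simulated data are hypothetical.

```python
import numpy as np

def kalman_local_level(y, q, r, x0=0.0, p0=1e6):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    q: variance of the state noise w_t; r: variance of the observation noise v_t.
    Returns the filtered state means and variances.
    """
    x, p = x0, p0
    means, variances = [], []
    for obs in y:
        # Predict: a random-walk state keeps its mean, and its uncertainty grows by q
        p = p + q
        # Update: blend prediction and observation using the Kalman gain
        k = p / (p + r)
        x = x + k * (obs - x)
        p = (1 - k) * p
        means.append(x)
        variances.append(p)
    return np.array(means), np.array(variances)

rng = np.random.default_rng(2)
q, r = 0.01, 1.0
true_state = np.cumsum(rng.normal(0, np.sqrt(q), 500))
obs = true_state + rng.normal(0, np.sqrt(r), 500)
filtered, var = kalman_local_level(obs, q, r)

rmse_obs = np.sqrt(np.mean((obs - true_state) ** 2))
rmse_filt = np.sqrt(np.mean((filtered - true_state) ** 2))
print(round(rmse_obs, 3), round(rmse_filt, 3))
```

The filtered variance settles at $p = \left(-q + \sqrt{q^2 + 4qr}\right)/2$, so the ratio of `q` to `r` alone decides how quickly the estimated state is allowed to move.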
Kalman Filter Intro 2
The importance of the transition matrix
 When using the Kalman Filter for time-varying models and states, the transition matrix governs the stability of the state
 First consider two states
                To State 1    To State 2
From State 1       95%            5%
From State 2       10%           90%
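The persistence implied by this matrix can be read off directly. A sketch assuming NumPy, taking rows as the "from" state and columns as the "to" state: the expected stay in state $i$ is $1/(1 - p_{ii})$ periods, and the long-run share of time in each state is the stationary distribution $\pi$ satisfying $\pi P = \pi$.

```python
import numpy as np

# Two-state transition matrix: rows = "from" state, columns = "to" state
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])

# Expected number of periods spent in a state once entered: 1 / (1 - p_ii)
expected_duration = 1.0 / (1.0 - np.diag(P))

# Long-run probabilities: the left eigenvector of P for eigenvalue 1, scaled to sum to 1
eigvals, eigvecs = np.linalg.eig(P.T)
stationary = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
stationary = stationary / stationary.sum()

print(np.round(expected_duration, 2))
print(np.round(stationary, 3))
```

With these numbers, state 1 lasts about 20 periods on average and state 2 about 10, and the system spends roughly two-thirds of its time in state 1. Nudging the diagonal entries changes both figures, which is why the transition matrix drives the stability of the regimes.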
Kalman Filter Regimes
Kalman Filter: These States Can Describe Regimes
Unstable
More Complexity Adds Difficulty
Adding more states only makes it less stable
Using Theory With Regimes
Instead of allowing the model to drive theory, we should use theory to guide the model
Comparison With Neural Nets
A simple visual comparison
Hidden Markov Model
Inputs
Hidden States Selected on Probability of Fit According to Features
States Affect Output
Neural Net
Select Features That Describe Hidden Layer
Hidden Layers Chosen
Hidden Layers Affect Output
Comparison Of Output
This example comparison highlights the importance of the transition matrix
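A sketch of why the transition matrix matters, assuming a two-state Gaussian hidden Markov model with hypothetical regime means and volatilities. The same forward (filtering) recursion is run twice: once with the persistent 95%/5%, 10%/90% matrix and once with a "memoryless" matrix in which every entry is 50%. With no persistence, each period is classified on its own likelihood alone, so the most likely regime typically flips far more often than the true regime does.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def filtered_probs(y, P, mus, sigmas):
    """Forward filter for a Gaussian HMM; P[i, j] = probability of moving from state i to j.

    Returns P(state_t | y_1..y_t) for every t.
    """
    prob = np.full(len(mus), 1.0 / len(mus))
    out = np.empty((len(y), len(mus)))
    for t, obs in enumerate(y):
        prior = prob @ P                              # predict: apply the transition matrix
        post = prior * normal_pdf(obs, mus, sigmas)   # update: weight by each state's likelihood
        prob = post / post.sum()
        out[t] = prob
    return out

def n_switches(probs):
    labels = probs.argmax(axis=1)  # most likely state in each period
    return int(np.sum(labels[1:] != labels[:-1]))

# Hypothetical regimes: calm (state 0) and volatile (state 1) monthly returns
rng = np.random.default_rng(5)
P_sticky = np.array([[0.95, 0.05], [0.10, 0.90]])
P_flat = np.full((2, 2), 0.5)  # no regime persistence at all
mus, sigmas = np.array([0.01, -0.01]), np.array([0.03, 0.06])

states = [0]
for _ in range(999):
    states.append(rng.choice(2, p=P_sticky[states[-1]]))
states = np.array(states)
y = rng.normal(mus[states], sigmas[states])

sticky = filtered_probs(y, P_sticky, mus, sigmas)
flat = filtered_probs(y, P_flat, mus, sigmas)
print("true switches:  ", int(np.sum(states[1:] != states[:-1])))
print("sticky switches:", n_switches(sticky))
print("flat switches:  ", n_switches(flat))
```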
Implementation
Let's take a step back and discuss implementation
 One of the most famous models in finance is the Shiller PE 10 (the cyclically adjusted price-to-earnings ratio, or CAPE)
*Data from EDS & Robert Shiller's website
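As a sketch of how the measure is built, assuming monthly data: the Shiller PE 10 divides the real (inflation-adjusted) price by the average of real earnings over the previous ten years. The function names and the flat test series are hypothetical, and the sketch ignores the fact that earnings are published with a delay.

```python
import numpy as np

def to_real(nominal, cpi):
    """Deflate a nominal series into latest-period money using a CPI index."""
    cpi = np.asarray(cpi, dtype=float)
    return np.asarray(nominal, dtype=float) * cpi[-1] / cpi

def cape(price, earnings, cpi, window=120):
    """Real price divided by the trailing `window`-month average of real earnings.

    NaN until a full window of earnings is available.
    """
    real_price, real_earnings = to_real(price, cpi), to_real(earnings, cpi)
    out = np.full(len(real_price), np.nan)
    for t in range(window - 1, len(real_price)):
        out[t] = real_price[t] / real_earnings[t - window + 1 : t + 1].mean()
    return out

# Hypothetical check: flat prices, earnings and CPI give a CAPE equal to P / E
months = 150
c = cape(np.full(months, 100.0), np.full(months, 5.0), np.full(months, 250.0))
print(c[118:121])
```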
A Simple Implementation
Implementation of this signal, on its face, should be easy
*Data from EDS & Robert Shiller's website
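Even a "simple" signal hides choices. The sketch below assumes a hypothetical rule (hold equities when the CAPE is below its own expanding-window median, otherwise hold the alternative asset) and delays the signal so that each position only uses data that would have been available at the time. Both the rule and the one-period lag are illustrative assumptions, not the rule tested in these slides.

```python
import numpy as np

def cape_signal(cape_values, lag=1):
    """Return 1 (hold equities) or 0 (hold the alternative) for each period.

    Assumes cape_values has no missing values. The median at time t uses only
    observations up to and including t, so the rule itself has no look-ahead.
    """
    cape_values = np.asarray(cape_values, dtype=float)
    raw = np.zeros(len(cape_values))
    for t in range(1, len(cape_values)):
        raw[t] = 1.0 if cape_values[t] < np.median(cape_values[: t + 1]) else 0.0
    if lag == 0:
        return raw
    # The position held in period t is decided with data up to period t - lag
    signal = np.zeros_like(raw)
    signal[lag:] = raw[:-lag]
    return signal

print(cape_signal([10, 20, 5, 30, 1], lag=0))
print(cape_signal([10, 20, 5, 30, 1], lag=1))
```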
Quick Test
However, when we implement a quick test, we don't see any gains. Why not?
 This test also assumes that trading is free (zero transaction costs)
*Data from EDS, Shiller & Bloomberg, courtesy of London Business School
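A quick test like this implicitly sets trading costs to zero and leaves open what is held when out of equities. A minimal sketch that makes both choices explicit, assuming a hypothetical flat cost charged whenever the position switches and a user-supplied return series for the alternative asset (e.g. bonds or cash):

```python
import numpy as np

def backtest(signal, equity_ret, alt_ret, cost_per_switch=0.0):
    """Per-period returns of switching between equities (signal = 1) and an alternative (signal = 0).

    The signal must already be lagged. cost_per_switch is a hypothetical flat cost,
    as a fraction of portfolio value, charged in any period where the position changes.
    """
    signal = np.asarray(signal, dtype=float)
    gross = signal * np.asarray(equity_ret, dtype=float) + (1.0 - signal) * np.asarray(alt_ret, dtype=float)
    switches = np.abs(np.diff(signal, prepend=signal[0]))  # 1 where the position changed, else 0
    return gross - cost_per_switch * switches

signal = [1, 1, 0, 0, 1]
equities = [0.02, 0.01, -0.03, 0.01, 0.02]
bonds = [0.003] * 5  # hypothetical alternative asset
free = backtest(signal, equities, bonds)
costly = backtest(signal, equities, bonds, cost_per_switch=0.001)
print(np.prod(1 + free) - 1, np.prod(1 + costly) - 1)
```

Because the alternative asset is an explicit input, the same function can be rerun with bonds, cash, or anything else, which makes the "what do you buy?" assumption visible rather than implied.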
So What Went Wrong Part 1
Implied Assumptions: If You Sell Equities, What Do You Buy?
 Bonds share the same relationship with the measure (the Shiller PE 10)
Interesting Side Note: These Are Driven by Inflationary Periods
*Data from EDS, Shiller & Bloomberg, courtesy of London Business School
So What Went Wrong Part 2
Didn't Check Assumptions
 Strong autocorrelation breaks an underlying assumption of OLS (uncorrelated errors)
 Interpretation: does each monthly observation really represent a new portfolio?
*Data from EDS & Shiller
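Why overlapping windows cause trouble can be seen without any market data. A sketch assuming hypothetical i.i.d. monthly returns: the monthly returns are uncorrelated by construction, but the overlapping 10-year (120-month) sums that a forward-return regression would use share 119 of every 120 months with their neighbours, so their lag-1 autocorrelation is close to 119/120.

```python
import numpy as np

def lag1_autocorr(x):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))

rng = np.random.default_rng(3)
monthly = rng.normal(0.005, 0.04, size=2000)  # hypothetical i.i.d. monthly log returns
window = 120                                  # 10-year horizon in months

# Rolling 120-month sums: neighbouring values share 119 of their 120 months
overlapping = np.convolve(monthly, np.ones(window), mode="valid")
non_overlapping = overlapping[::window]

print(round(lag1_autocorr(monthly), 3), round(lag1_autocorr(overlapping), 3), len(non_overlapping))
```

Sampling every 120th observation removes the overlap but leaves only 16 observations out of 2,000 months here; adjusting the standard errors (e.g. Newey-West) is the usual alternative.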
More Than One Way To Scramble An Egg
Focus on Features, Intuition, & Implementation
Different Models Same Insight
Both MVO & OLS Lead to the Same Output
 Let's go through the spreadsheet here
Unfortunately, due to data restrictions I am unable to make this spreadsheet public. Please contact me for more details.
*Data from EDS & Bloomberg, courtesy of London Business School
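Since the spreadsheet itself is not public, the following is a separate illustration of one well-known version of this equivalence (usually attributed to Britten-Jones, 1999), using hypothetical simulated excess returns. Regressing a column of ones on the excess returns, with no intercept, gives coefficients proportional to the mean-variance (tangency) weights $\Sigma^{-1}\mu$, where $\Sigma$ is the covariance estimated with divisor $T$.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_assets = 240, 4
# Hypothetical monthly excess returns with a common component
excess = rng.normal(0.005, 0.04, size=(T, n_assets)) + rng.normal(0, 0.02, size=(T, 1))

# Mean-variance: tangency weights are proportional to Sigma^{-1} mu
mu = excess.mean(axis=0)
sigma = np.cov(excess, rowvar=False, bias=True)  # divide by T, not T - 1
raw = np.linalg.solve(sigma, mu)
w_mvo = raw / raw.sum()

# OLS: regress a vector of ones on the excess returns, no intercept
b, *_ = np.linalg.lstsq(excess, np.ones(T), rcond=None)
w_ols = b / b.sum()

print(np.round(w_mvo, 4))
print(np.round(w_ols, 4))
```

The link follows from $R^\top R = T(\Sigma + \mu\mu^\top)$, which gives $b = \Sigma^{-1}\mu / (1 + \mu^\top\Sigma^{-1}\mu)$: the same direction, only rescaled.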
From Slack to L1 & L2
Generalized Additive Models: Mixing Elastic Net with Cross-Validation
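 As with the Slack variable, we have a clear intuition
 L1-Norm: Which factors should we choose? (the L1 penalty sets some coefficients exactly to zero)
 L2-Norm: Make sure no single factor drives the model (the L2 penalty shrinks all coefficients)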
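The sketch below isolates the penalty part of this set-up (not a full GAM), assuming NumPy and hypothetical simulated factors of which only the first two matter. It minimises the same elastic-net objective that scikit-learn's `ElasticNet` documents, $\frac{1}{2n}\lVert y - Xb\rVert^2 + \alpha\rho\lVert b\rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert b\rVert_2^2$, by coordinate descent. With a pure L1 penalty ($\rho = 1$) the irrelevant factors are set exactly to zero; with a pure L2 penalty ($\rho = 0$) every coefficient is shrunk but none is removed.

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net(X, y, alpha, l1_ratio, n_sweeps=200):
    """Coordinate descent for the elastic-net objective; X and y are centred, so no intercept."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # y - X b, with b = 0 to start
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual, adding back its own contribution
            z = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(z, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            resid -= X[:, j] * (new_bj - b[j])
            b[j] = new_bj
    return b

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, 500)  # hypothetical factors

lasso_like = elastic_net(X, y, alpha=0.2, l1_ratio=1.0)
ridge_like = elastic_net(X, y, alpha=0.2, l1_ratio=0.0)
print(np.round(lasso_like, 3))
print(np.round(ridge_like, 3))
```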
Using Cross-Validation
Cross-Validation is used to choose tuning parameters such as the penalty strength
 The intuition behind this approach is sacrificing the 'best fit' for stability
 A New Goal: The least wrong, the most often
[Figure: 1st, 2nd, ..., 10th Iteration of cross-validation over the Full Training Data]
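A sketch of 10-fold cross-validation choosing a penalty, assuming NumPy, hypothetical data, and closed-form ridge regression as a simple stand-in for the elastic net. The folds are contiguous blocks rather than shuffled rows; that is an assumption that suits time-ordered data better, though earlier folds still train on later data.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on centred data: b = (X'X + n*alpha*I)^{-1} X'y, plus an intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n, p = Xc.shape
    b = np.linalg.solve(Xc.T @ Xc + n * alpha * np.eye(p), Xc.T @ yc)
    return y_mean - x_mean @ b, b

def cv_choose_alpha(X, y, alphas, k=10):
    """Return the alpha with the lowest mean held-out MSE, and the MSE for each alpha."""
    folds = np.array_split(np.arange(len(y)), k)  # contiguous, unshuffled folds
    errors = []
    for alpha in alphas:
        fold_mse = []
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            c, b = ridge_fit(X[train_idx], y[train_idx], alpha)
            pred = c + X[test_idx] @ b
            fold_mse.append(np.mean((y[test_idx] - pred) ** 2))
        errors.append(np.mean(fold_mse))
    errors = np.array(errors)
    return alphas[int(np.argmin(errors))], errors

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, 200)  # only 2 of 20 factors matter
alphas = [0.0, 0.01, 0.1, 1.0, 10.0]
best_alpha, cv_errors = cv_choose_alpha(X, y, alphas, k=10)
print(best_alpha, np.round(cv_errors, 3))
```

The chosen penalty is the one that is "least wrong" on data the model did not see, rather than the one with the best in-sample fit.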
Why Are GAMs Useful?
Rather than focusing on a new model/approach that is harder to interpret, GAMs provide a framework for robust traditional modelling
 At the end of the day, GAMs are just a special case of OLS
 But then, so are most models
 GAMs are a machine learning framework to automate the model-building process
 'Features are King', but which ones should we choose?
 These features should be interpretable, robust, and consistent
 Easier to understand the connection between theory and output
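To illustrate the "special case of OLS" point, a sketch assuming NumPy and hypothetical data with a non-linear but additive truth: each feature is expanded into simple piecewise-linear (hinge) basis functions, and the additive model is then fitted by ordinary least squares. Production GAM tools (e.g. R's `mgcv`) use penalised splines and choose the smoothness automatically; this unpenalised basis is a simplification.

```python
import numpy as np

def hinge_basis(x, knots):
    """Piecewise-linear basis: x itself plus max(0, x - k) for each knot."""
    return np.column_stack([x] + [np.maximum(0.0, x - k) for k in knots])

def fit_additive(features, y, n_knots=8):
    """Additive model y ~ c + f_1(x_1) + ... + f_p(x_p), fitted by ordinary least squares."""
    blocks = []
    for x in features.T:
        knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])  # interior quantiles
        blocks.append(hinge_basis(x, knots))
    design = np.column_stack([np.ones(len(y))] + blocks)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef

def r_squared(y, fitted):
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(8)
n = 1000
x1, x2 = rng.uniform(-3, 3, n), rng.uniform(-2, 2, n)
y = np.sin(x1) + x2 ** 2 + rng.normal(0, 0.1, n)  # hypothetical additive, non-linear truth
features = np.column_stack([x1, x2])

additive_fit = fit_additive(features, y)
design_lin = np.column_stack([np.ones(n), features])
coef_lin, *_ = np.linalg.lstsq(design_lin, y, rcond=None)
linear_fit = design_lin @ coef_lin
print(round(r_squared(y, linear_fit), 3), round(r_squared(y, additive_fit), 3))
```

Each fitted $f_j$ can be plotted on its own, which keeps the link between a feature and the output easy to inspect.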
Conclusion
Main Takeaways
Pitfalls are easy to fall into and 'insights' can be deceptive
Focus on details & assumptions, interpretation, implementation, and consistency
 Avoid pitfalls through reproducibility, patience, and care
 Learn your tools and systems, check the details and data types
 By focusing on interpretation, all other good habits follow
 What does this mean in terms of implementation?
 What are the underlying & implicit assumptions?
 Do I understand and can I explain the output?
 Is it consistent? Am I achieving my goals?
Frameworks Not Models
Progress in Machine Learning & A.I. has enabled many advances
Thinking of these in terms of frameworks will bring more benefits
 Building a model is great, but it isn't really a solution
 Successful value add will come by combining many tools and insights into a robust framework for decision making
 These tools are about automation & enablement
 Focus more on finding implementable & interpretable insights
 Look more broadly, more deeply than before
 This is fundamental
 Leads to more informed decision making
Why Bother With All This? Part 1
More informed decisions, in a more robust process, leads to better results
 Better outcomes: Hedge Funds are one example of many
*Data from EDS & Bloomberg, courtesy of London Business School
Why Bother With All This? Part 2
Using A.I. in financial services is becoming a 'must-have'
 Aug 2017: 20% usage of A.I. or Machine Learning
 Aug 2018: 56% usage
BlackRock Data Science Core
Exploratory programs on machine learning
 A.I. based risk management
 Dynamic factor analysis using A.I.
 A.I. reconciling investment decisions
Concluding Remarks
More informed decisions, in a more robust process, leads to better results
 ML and A.I. aren't necessarily a value-add
 When combined in a strong framework the value adds are fundamental:
 Enabling you to work with a more holistic perspective
 Increasing efficiency and therefore productivity
 Allowing you to focus more on insights & implementation
 Know your systems: big problems require an understanding of your tools
 Interpretation is the realm of the human
 These tools are powerful, but if you can't understand the 'why', then they are of little use to anyone
Disclaimers
