Building the New York Times 4th Down Bot
whoami
Trey Causey
Data Scientist, ChefSteps
NFL Consultant
CSSS Certificate Holder
@treycausey
trey.causey@gmail.com
Kevin Quealy (@kevinq)
Deputy Editor
The Upshot / The New York Times
Josh Katz (@jshkatz)
Graphics Editor
The Upshot / The New York Times
Brian Burke (@bburkeESPN)
Senior Analytics Specialist
ESPN / Advanced Football Analytics
A Simple Statistical Model...
... deployed to production
... ready for any edge case
... fully automated
... a lesson in statistical decision-making
... "data journalism?"
The model is the easy part.
What is a fourth down?
- Four attempts (downs) to advance
the ball 10 yards (for a new set of downs)
or score ("convert")
"4th and 2": 4th down, 2 yards remaining
for a new set of downs
- If the offense fails to advance, the ball "turns over
on downs" to the opponent
- Options
- "Go for it"
- Punt
- Attempt a field goal
Do Firms Maximize? Evidence from Professional Football
Romer, David. 2006. Journal of Political Economy 114(2), pp. 340-365.
- If coaches are rational actors, they should maximize their win
expectancy.
- Relatively simple decisions
- Very high compensation for coaches
- Extremely competitive labor market
- Learning is possible
... they don't.
- "Conservative" playcalling is the overwhelming norm
- Possible explanations
- Risk aversion over in-game win probability
- Principal-agent problems
- Systematic imperfect maximizers
- Complex objective
functions
Romer's Approach
- Play-by-play data
- Dynamic programming
- Expected payoff to punting / kicking / going for it
- Conditioned on game situation
- Produces an expected points model
- Demonstrates coaches significantly and
substantively fail to maximize their win
expectancy on 4th down.
Coaches lose their minds on 4th down.
Supporting work by Burke
Broader observations
- Coaches don't maximize wins; they delay losing as long as possible (Barnwell)
- If this is true, behavior will not change.
- If coaches are simply imperfect maximizers, data and learning via observation and trial & error should change behavior.
The 4th Down Bot Model(s)
Needed to make a decision
P(winning | pre-snap situation)
P(winning | successful conversion)
P(winning | failed conversion)
P(winning | successful field goal)
P(winning | failed field goal)
P(winning | punt)
Win Probability
Logistic regression (L2 penalty determined via cross-validation)
Dependent variable:
- Did team in this situation win the game? (not
expected points)
Inputs:
- Down, Distance, Yards to Go, Seconds Remaining
- Vegas line as a linear function of seconds remaining
- Score Difference, Offensive & Defensive Timeouts
- Quarter, (Score Difference * Quarter)
Training / Validation
- Trained on 475,000+ plays
- 90% of 2001 - 2014 regular season plays
- No overtime plays
- Kickoffs, extra points, penalties removed
- Five-fold cross-validation used for parameter estimation, selected model then fit to all 475,000+ plays
- 10% random holdout used for evaluation of predictive accuracy
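The training setup above can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the bot's actual pipeline: the feature values, the made-up outcome process, and the choice of `LogisticRegressionCV` with a standard scaler are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 5000

# Synthetic stand-ins for play-by-play features (not real NFL data).
score_diff = rng.normal(0, 10, n)
quarter = rng.integers(1, 5, n)          # quarters 1 through 4
secs_left = rng.uniform(0, 3600, n)
X = np.column_stack([score_diff, quarter, secs_left, score_diff * quarter])

# Made-up outcome process: a lead matters more late in the game.
logit = 0.05 * score_diff * quarter
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Hold out a random 10% of plays for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Five-fold cross-validation over a grid of regularization strengths.
# The default penalty is L2; the best setting is refit on all training rows.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, cv=5, max_iter=1000),
)
model.fit(X_train, y_train)

holdout_accuracy = model.score(X_test, y_test)
win_prob = model.predict_proba(X_test)[:, 1]   # P(win) for each held-out play
```

The dependent variable here is a 0/1 win indicator, so `predict_proba` returns a win probability directly rather than expected points.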
Performance
Precision: 0.77
Recall: 0.78
F1 Score: 0.77
AUC: 0.86
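For readers unfamiliar with these metrics, here is a small pure-Python sketch of how each is computed. The labels and scores are toy values, and the 0.5 classification threshold is an assumption; the slides do not say which threshold was used.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = team won)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """Chance that a random winner outscores a random loser (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: six plays with predicted win probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]   # assumed threshold of 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auc(y_true, scores)
```

Precision, recall and F1 depend on the chosen threshold; AUC does not, because it only uses the ranking of the scores.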
Kicking / Punting
Wrinkle: kickers growing more accurate over time
Logistic regression controlling for
distance, kicker, weather, and stadium
(Katz)
Punting: expected net return from current
field position, not punter-specific (doesn't
allow for muffed punts)
Decision-making
For each game situation, estimate win
probability for each possible outcome.
Determine breakeven point -- probability
of successful 4th down conversion at which
coach should be indifferent between options.
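The breakeven point follows from setting the expected win probability of going for it equal to that of the alternative: p * WP(success) + (1 - p) * WP(fail) = WP(alternative). A minimal sketch, with hypothetical function names and made-up win probabilities:

```python
def breakeven_conversion_prob(wp_success, wp_fail, wp_alternative):
    """Conversion probability at which going for it ties the alternative.

    Solves p * wp_success + (1 - p) * wp_fail == wp_alternative for p,
    clipped to [0, 1].
    """
    if wp_success <= wp_fail:
        raise ValueError("a successful conversion must beat a failed one")
    p = (wp_alternative - wp_fail) / (wp_success - wp_fail)
    return min(max(p, 0.0), 1.0)


def recommend(p_convert, wp_success, wp_fail, wp_punt,
              p_fg, wp_fg_good, wp_fg_miss):
    """Expected win probability of each option and the best of the three."""
    options = {
        "go for it": p_convert * wp_success + (1 - p_convert) * wp_fail,
        "punt": wp_punt,
        "field goal": p_fg * wp_fg_good + (1 - p_fg) * wp_fg_miss,
    }
    best = max(options, key=options.get)
    return best, options


# Made-up numbers for illustration only.
p_star = breakeven_conversion_prob(wp_success=0.60, wp_fail=0.40,
                                   wp_alternative=0.47)
best, options = recommend(p_convert=0.50, wp_success=0.60, wp_fail=0.40,
                          wp_punt=0.47, p_fg=0.30,
                          wp_fg_good=0.55, wp_fg_miss=0.38)
```

With these numbers the breakeven is about 0.35, so a team expected to convert half the time should go for it.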
Unsolved (statistical) problems
- Estimation of uncertainty is problematic
- Which makes adjudication along the
decision frontier difficult to automate
- Obvious IID issues
- Non-linear models (random forests,
gradient boosted trees) perform only slightly
better, which is odd given the many known
interactions & non-linearities
Complications! (Overfitting?)
Lots of manual smoothing required for
situations that 'look' wrong.
End of game situations have
extremely high leverage &
visibility
Examples:
1) In fourth quarter, weight win probabilities of not going for it by the probability of getting the ball back.
2) With < 40 seconds left in game within FG range,
set WP to P(successful FG).
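One way to read the two example rules is as overrides applied on top of the model's output. The sketch below is an interpretation, not the bot's code: the function name, the option labels, the way "FG range" is passed in as a flag, and the choice to let rule 2 take precedence for the field goal option are all assumptions.

```python
def apply_endgame_rules(wp, quarter, secs_left, in_fg_range,
                        p_ball_back, p_fg_good):
    """Return a copy of `wp` (option -> model win probability) with overrides."""
    adjusted = dict(wp)
    if quarter == 4:
        # Rule 1: options that give the ball away only pay off
        # if the team gets the ball back.
        for option in ("punt", "field goal"):
            adjusted[option] = wp[option] * p_ball_back
    if secs_left < 40 and in_fg_range:
        # Rule 2: with the clock nearly out, the kick decides the game.
        adjusted["field goal"] = p_fg_good
    return adjusted


# Made-up model output for illustration only.
model_wp = {"go for it": 0.50, "punt": 0.40, "field goal": 0.45}
late = apply_endgame_rules(model_wp, quarter=4, secs_left=30,
                           in_fg_range=True, p_ball_back=0.5, p_fg_good=0.8)
early = apply_endgame_rules(model_wp, quarter=2, secs_left=2000,
                            in_fg_range=False, p_ball_back=0.5, p_fg_good=0.8)
```

Keeping the rules in one function separate from the model makes each manual adjustment easy to log and to remove later.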
Model -> Code -> Product
nyt4thdownbot.com
Every 4th down is analyzed
and explained with the
rationale for the call and
the breakeven point.
These charts ended up being
very popular!
The bot (@NYT4thDownBot)
has over 26,000 Twitter followers.
... but wait, there's more.
- Fast, live data == Making calls before ball is snapped
- For the 4th Down Bot to be a bot:
- Needs to be very fast
- Receive data, process, make prediction, make decision, send decision, tweet decision in one second or so.
- Needs to be fully automated & always on
- Load models & data into memory if not found
- Logging
- Graceful failure with fallback cases
API
- Application Programming Interface
- Stateless, accepts all arguments as
parameters in URL
- Returns information needed to
tweet and render decision page as
JSON (JavaScript Object Notation)
http://someurl.com/predict?
dwn=4&ytg=27&yfog=73
&secs_left=3005&score_diff=7
&timo=3&timd=0&spread=0
&ou_offense=40
&key=APIKEY
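A stateless endpoint like this only needs to parse the query string, run the models, and return JSON. The sketch below shows the parsing and response shape using the standard library only. The handler names, the response fields and the error format are assumptions, the parameter types are guessed from the example URL, and `ou_offense` and API-key checking are left out.

```python
import json
from urllib.parse import parse_qs, urlparse

# Parameters from the example URL that look like integers.
INT_PARAMS = ("dwn", "ytg", "yfog", "secs_left", "score_diff", "timo", "timd")


def parse_situation(url):
    """Turn the query string into a dict describing the game situation."""
    query = parse_qs(urlparse(url).query)
    situation = {name: int(query[name][0]) for name in INT_PARAMS}
    situation["spread"] = float(query["spread"][0])
    return situation


def handle_predict(url):
    """Return a JSON string; every request carries all the state it needs."""
    try:
        situation = parse_situation(url)
    except (KeyError, ValueError) as exc:
        return json.dumps({"error": f"bad or missing parameter: {exc}"})
    # Placeholder: a real handler would call the models here.
    return json.dumps({"situation": situation, "decision": None})


example = (
    "http://someurl.com/predict?dwn=4&ytg=27&yfog=73"
    "&secs_left=3005&score_diff=7&timo=3&timd=0&spread=0"
    "&ou_offense=40&key=APIKEY"
)
response = json.loads(handle_predict(example))
```

Because nothing is stored between calls, the same function could sit behind a Flask route or any other web framework without changes.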
Open source!
github.com/TheUpshot/4thdownbot-model
(API code is not open source for security purposes)
Lessons from building a data product
- Modeling is a very small component
- Data munging and packaging the model are the majority of the work
- Software development isn't statistics, but the intersection is powerful
- http://uwseds.github.io/
The Stack
- Python
- Data munging: pandas
- Modeling: scikit-learn
- Plotting: matplotlib
- API: Flask
- JavaScript (Node.js)
- Kicking model
Questions?