Building the New York Times 4th Down Bot

whoami

Trey Causey
Data Scientist, ChefSteps

NFL Consultant

CSSS Certificate Holder

@treycausey

trey.causey@gmail.com

Kevin Quealy (@kevinq)
Deputy Editor
The Upshot / The New York Times

Josh Katz (@jshkatz)
Graphics Editor
The Upshot / The New York Times

Brian Burke (@bburkeESPN)
Senior Analytics Specialist
ESPN / Advanced Football Analytics

A Simple Statistical Model...

... deployed to production

... ready for any edge case

... fully automated

... a lesson in statistical decision-making

... "data journalism?"

The model is the easy part.

What is a fourth down?

Four attempts (downs) to advance
the ball 10 yards (for a new set of downs)
or score ("convert")

"4th and 2": 4th down, 2 yards remaining
for a new set of downs
If the offense fails to advance, the ball "turns over
on downs" to the opponent
Options
- "Go for it"
- Punt
- Attempt a field goal

Do Firms Maximize? Evidence from Professional Football

Romer, David. 2006. Journal of Political Economy 114(2), pp. 340-365.

If coaches are rational actors, they should maximize their win
expectancy.
Relatively simple decisions
Very high compensation for coaches
Extremely competitive labor market
Learning is possible

... they don't.

"Conservative" playcalling is the overwhelming norm
Possible explanations
- Risk aversion over in-game win probability
- Principal-agent problems
- Systematic imperfect maximizers
- Complex objective
  functions

Romer's Approach

Play-by-play data
Dynamic programming
Expected payoff to punting / kicking / going for it
- Conditioned on game situation
Produces an expected points model
Demonstrates coaches significantly and
substantively fail to maximize their win
expectancy on 4th down.

Coaches lose their minds on 4th down.

Supporting work by Burke

Broader observations

Coaches don't maximize wins; they delay losing as long as possible (Barnwell)
If this is true, behavior will not change.
If coaches are simply imperfect maximizers, data and learning via observation and trial & error should change behavior.

The 4th Down Bot Model(s)

Needed to make a decision

P(winning | pre-snap situation)

P(winning | successful conversion)

P(winning | failed conversion)

P(winning | successful field goal)

P(winning | failed field goal)

P(winning | punt)

Win Probability

Logistic regression (L2 penalty determined via cross-validation)

Dependent variable:

- Did team in this situation win the game? (not
expected points)

Inputs:

- Down, Distance, Yards to Go, Seconds Remaining

- Vegas line as a linear function of seconds remaining

- Score Difference, Offensive & Defensive Timeouts

- Quarter, (Score Difference * Quarter)
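
A minimal sketch of this kind of model, assuming scikit-learn's LogisticRegressionCV as the way to pick the L2 penalty by cross-validation. The data below is synthetic and only a subset of the listed inputs is used; the feature layout, coefficients, and the win_probability helper are illustrative assumptions, not the bot's actual code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for play-by-play features (NOT real NFL data).
score_diff = rng.integers(-21, 22, size=n)
secs_left = rng.integers(0, 3600, size=n)
quarter = 4 - np.minimum(secs_left // 900, 3)
X = np.column_stack([score_diff, secs_left, quarter, score_diff * quarter])

# Synthetic outcome: a lead matters more as the game gets later.
logit = 0.05 * score_diff * quarter
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# LogisticRegressionCV uses an L2 penalty by default and chooses its
# strength (C) by cross-validation, here with five folds.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, cv=5, max_iter=1000),
)
model.fit(X, y)


def win_probability(score_diff, secs_left):
    """Hypothetical helper: P(win) for a single game situation."""
    quarter = 4 - min(secs_left // 900, 3)
    row = np.array([[score_diff, secs_left, quarter, score_diff * quarter]])
    return float(model.predict_proba(row)[0, 1])
```

Calling win_probability(7, 300) returns the fitted probability for a team leading by 7 with five minutes left; on this toy data a lead should always score higher than an equal deficit.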

Training / Validation

Trained on 475,000+ plays
- 90% of 2001 - 2014 regular season plays
- No overtime plays
- Kickoffs, extra points, penalties removed
Five-fold cross-validation used for parameter estimation; selected model then fit to all 475,000+ plays
10% random holdout used for evaluation of predictive accuracy
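
The split described above can be sketched roughly as follows, assuming scikit-learn's train_test_split for the 10% holdout and GridSearchCV for the five-fold search (its default refit=True refits the selected model on all of the training data). The data is synthetic and the grid of C values is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)
n = 1000

# Synthetic features and outcome (NOT real play-by-play data).
X = rng.normal(size=(n, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Hold out a random 10% of plays for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0
)

# Five-fold cross-validation over the penalty strength; the best
# setting is then refit on the full training set (refit=True default).
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Accuracy on plays the model never saw during fitting or selection.
holdout_accuracy = search.score(X_test, y_test)
```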

Performance

Precision: 0.77

Recall: 0.78

F1 Score: 0.77

AUC: 0.86
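
For reference, a small sketch of how these four metrics are computed with scikit-learn. The labels and win probabilities are made up for illustration, and the 0.5 threshold used to turn probabilities into win/loss predictions is an assumption.

```python
from sklearn.metrics import (
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Toy holdout set: 1 = team won, 0 = team lost (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
wp = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3]

# Precision, recall, and F1 need hard predictions; AUC uses the
# probabilities directly.
y_pred = [int(p >= 0.5) for p in wp]

precision = precision_score(y_true, y_pred)  # 3 of 4 predicted wins are wins
recall = recall_score(y_true, y_pred)        # 3 of 4 actual wins are found
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, wp)              # 15 of 16 win/loss pairs ranked correctly
```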

Kicking / Punting

Wrinkle: kickers growing more accurate over time

Logistic regression controlling for
distance, kicker, weather, and stadium
(Katz)

Punting: expected net return from current
field position, non-punter specific (doesn't
allow for muffed punt)

Decision-making

For each game situation, estimate win
probability for each possible outcome.

Determine breakeven point -- probability
of successful 4th down conversion at which
coach should be indifferent between options.

be_{punt} = \frac{P(win | punt) - P(win | fail)}{P(win | success) - P(win | fail)}

be_{kick} = \frac{P(win | FG\ attempt) - P(win | fail)}{P(win | success) - P(win | fail)}
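
The breakeven formulas translate directly into code. The sketch below is illustrative: the function names and the example win probabilities are assumptions, and P(win | FG attempt) is taken to be the make/miss-weighted average of the two field goal outcomes.

```python
def breakeven(wp_alternative, wp_success, wp_fail):
    """Conversion probability at which going for it ties the alternative."""
    denom = wp_success - wp_fail
    if denom <= 0:
        raise ValueError("converting must be better than failing")
    return (wp_alternative - wp_fail) / denom


def wp_fg_attempt(p_make, wp_fg_good, wp_fg_miss):
    """P(win | FG attempt), weighting the make and miss outcomes."""
    return p_make * wp_fg_good + (1 - p_make) * wp_fg_miss


def recommend(p_convert, wp_success, wp_fail, wp_punt, wp_kick):
    """Pick the option with the highest expected win probability."""
    options = {
        "go for it": p_convert * wp_success + (1 - p_convert) * wp_fail,
        "punt": wp_punt,
        "field goal": wp_kick,
    }
    return max(options, key=options.get)


# Made-up example: converting is worth 0.60, failing 0.40, punting 0.45.
be_punt = breakeven(0.45, wp_success=0.60, wp_fail=0.40)  # about 0.25
```

With these numbers a coach should go for it whenever the chance of converting exceeds roughly 25%, and punt otherwise.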

Unsolved (statistical) problems

Estimation of uncertainty is problematic
- Which makes adjudication along the
  decision frontier difficult to automate
Obvious IID issues
Non-linear models (random forests,
gradient boosted trees) perform little
better, which is odd given the many known
interactions & non-linearities

Complications! (Overfitting?)

Lots of manual smoothing required for
situations that 'look' wrong.

End of game situations have
extremely high leverage &
visibility

Examples:

1) In fourth quarter, weight win probabilities of not going for it by the probability of getting the ball back.

2) With < 40 seconds left in game within FG range,
set WP to P(successful FG).
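
One possible reading of the two example rules, sketched as a post-processing step on the model's win probabilities. Everything here is an assumption made for illustration: the function name, the dictionary keys, treating "weight by" as multiplication, and passing field goal range in as a flag rather than deriving it from field position.

```python
def adjust_late_game(wp, quarter, secs_left, in_fg_range, p_fg_good, p_ball_back):
    """Apply hand-written end-of-game overrides to raw win probabilities.

    wp is a dict with hypothetical keys 'go', 'punt', and 'kick'.
    """
    adjusted = dict(wp)

    if quarter == 4:
        # Example 1 (assumed interpretation): giving the ball away only
        # helps if the team gets it back, so scale the punt and kick
        # options by the probability of another possession.
        adjusted["punt"] = wp["punt"] * p_ball_back
        adjusted["kick"] = wp["kick"] * p_ball_back

    if secs_left < 40 and in_fg_range:
        # Example 2: with under 40 seconds left and a makeable kick,
        # the kick's win probability is just the chance it is good.
        adjusted["kick"] = p_fg_good

    return adjusted
```

Keeping the overrides in one function separate from the fitted model makes each manual rule easy to audit when a call "looks" wrong.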

Model -> Code -> Product

nyt4thdownbot.com

Every 4th down is analyzed
and explained with the
rationale for the call and
the breakeven point.

These charts ended up being
very popular!

The bot (@NYT4thDownBot)
has over 26,000 Twitter followers.

... but wait, there's more.

Fast, live data == Making calls before ball is snapped
For the 4th Down Bot to be a bot:
- Needs to be very fast
  - Receive data, process, make prediction, make decision, send decision, tweet decision in one second or so.
- Needs to be fully automated & always on
  - Load models & data into memory if not found
  - Logging
  - Graceful failure with fallback cases
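
A rough sketch of the "always on" requirements using only the standard library: models are loaded into memory the first time they are needed, every step is logged, and any failure falls back to a safe default. The file layout, logger name, function names, and the "no call" fallback are all hypothetical.

```python
import logging
import pickle
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("fourth_down_bot")  # hypothetical logger name

_MODELS = {}  # in-memory cache, filled on first use


def get_model(name, model_dir="models"):
    """Return a cached model, loading it from disk if not found in memory."""
    if name not in _MODELS:
        path = Path(model_dir) / f"{name}.pkl"
        log.info("loading model %s from %s", name, path)
        with path.open("rb") as fh:
            _MODELS[name] = pickle.load(fh)
    return _MODELS[name]


def safe_decision(situation, decide, fallback="no call"):
    """Run decide(situation); log and return the fallback on any error."""
    try:
        return decide(situation)
    except Exception:
        log.exception("decision failed for situation %r", situation)
        return fallback
```

Here safe_decision wraps whatever function produces the call, so a bad input or a missing model file results in a logged error and a fallback value instead of a crashed process.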

API

Application Programming Interface
Stateless, accepts all arguments as
parameters in URL
Returns information needed to
tweet and render decision page as
JSON (JavaScript Object Notation)

http://someurl.com/predict?dwn=4&ytg=27&yfog=73&secs_left=3005&score_diff=7&timo=3&timd=0&spread=0&ou_offense=40&key=APIKEY
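
To illustrate the stateless design, here is a standard-library sketch that parses a request URL like the one above and returns JSON. The parameter names come from the example URL; the numeric types, the response fields, and the parse_situation and handle_request helpers are assumptions, and API key checking is left out. In the real stack a Flask route would play this role.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Parameter names taken from the example URL; the types are assumed.
INT_PARAMS = ("dwn", "ytg", "yfog", "secs_left", "score_diff", "timo", "timd")
FLOAT_PARAMS = ("spread", "ou_offense")


def parse_situation(url):
    """Extract the game situation from the query string of a request URL."""
    query = parse_qs(urlsplit(url).query)
    missing = [k for k in INT_PARAMS + FLOAT_PARAMS if k not in query]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    situation = {k: int(query[k][0]) for k in INT_PARAMS}
    situation.update({k: float(query[k][0]) for k in FLOAT_PARAMS})
    return situation


def handle_request(url, decide):
    """Stateless handler: everything needed arrives in the URL itself."""
    situation = parse_situation(url)
    # Hypothetical response shape; the real API's fields are not public.
    return json.dumps({"situation": situation, "decision": decide(situation)})
```

Because no state is kept between requests, any worker process can answer any request, which makes it simpler to keep the service fast and always on.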

Open source!

github.com/TheUpshot/4thdownbot-model

(API code is not open source for security purposes)

Lessons from building a data product

Modeling is a very small component
Data munging and packaging the model are the majority of the work
Software development isn't statistics, but the intersection is powerful
http://uwseds.github.io/

The Stack

Python
- Data munging: pandas
- Modeling: scikit-learn
- Plotting: matplotlib
- API: Flask
JavaScript (Node.js)
- Kicking model