Building the New York Times 4th Down Bot
 

whoami

Trey Causey
Data Scientist, ChefSteps

NFL Consultant

CSSS Certificate Holder

 

@treycausey

trey.causey@gmail.com

Kevin Quealy (@kevinq)
Deputy Editor
The Upshot  / The New York Times

Josh Katz (@jshkatz)
Graphics Editor
The Upshot  / The New York Times

Brian Burke (@bburkeESPN)
Senior Analytics Specialist
ESPN / Advanced Football Analytics

A Simple Statistical Model...

 

... deployed to production

... ready for any edge case

... fully automated

... a lesson in statistical decision-making

... "data journalism?"

 

The model is the easy part.

What is a fourth down?
 

  • Four attempts (downs) to advance
    the ball 10 yards (for a new set of downs)
    or score ("convert")

    "4th and 2": 4th down, 2 yards remaining
    for a new set of downs
     
  • If the offense fails to advance, the ball "turns over
    on downs" to the opponent
     
  • Options
    • "Go for it"
    • Punt
    • Attempt a field goal

Do Firms Maximize? Evidence from Professional Football

Romer, David. 2006. Journal of Political Economy 114(2), pp. 340-365.

 

  • If coaches are rational actors, they should maximize their win
    expectancy.
     
  • Relatively simple decisions
  • Very high compensation for coaches
  • Extremely competitive labor market
  • Learning is possible

... they don't.

  • "Conservative" playcalling is the overwhelming norm
     
  • Possible explanations
    • Risk aversion over in-game win probability
    • Principal-agent problems
    • Systematically imperfect maximizers
    • Complex objective
      functions

Romer's Approach

  • Play-by-play data
  • Dynamic programming
  • Expected payoff to punting  / kicking / going for it
    • Conditioned on game situation
  • Produces an expected points model
     
  • Demonstrates coaches significantly and
    substantively fail to maximize their win
    expectancy on 4th down.

Coaches lose their minds on 4th down.

Supporting work by Burke

Broader observations
 

  • Coaches don't maximize wins; they delay losing as long as possible (Barnwell)
     
  • If this is true, behavior will not change.
     
  • If coaches are simply imperfect maximizers, data and learning via observation and trial & error should change behavior.

The 4th Down Bot Model(s)

 

Needed to make a decision

 

P(winning | pre-snap situation)
 

P(winning | successful conversion)

P(winning | failed conversion)
 

P(winning | successful field goal)

P(winning | failed field goal)
 

P(winning | punt)


 

Win Probability

Logistic regression (L2 penalty determined via cross-validation)

 

Dependent variable:

- Did the team in this situation win the game? (not
expected points)
 

Inputs:

- Down, Distance, Yards to Go, Seconds Remaining

- Vegas line as a linear function of seconds remaining

- Score Difference, Offensive & Defensive Timeouts

- Quarter, (Score Difference * Quarter)
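
A minimal sketch of how the inputs above might be assembled for one play. The short keys (dwn, ytg, yfog, secs_left, score_diff, timo, timd, spread) are borrowed from the example API URL in this deck; qtr and the two derived names are hypothetical. Scaling the spread by the fraction of regulation time remaining is one assumed reading of "linear function of seconds remaining", not necessarily the bot's exact transform.

```python
REGULATION_SECONDS = 3600  # four 15-minute quarters


def build_features(play):
    """Map one play (a dict) to model inputs.

    Assumption: the Vegas spread matters less as the clock runs down, so it
    is scaled by the fraction of regulation time remaining.
    """
    secs_left = play["secs_left"]
    return {
        # Raw situation variables, passed through unchanged.
        "dwn": play["dwn"],
        "ytg": play["ytg"],
        "yfog": play["yfog"],
        "secs_left": secs_left,
        "score_diff": play["score_diff"],
        "timo": play["timo"],
        "timd": play["timd"],
        "qtr": play["qtr"],
        # Derived terms (names are hypothetical).
        "spread_x_time": play["spread"] * secs_left / REGULATION_SECONDS,
        "score_diff_x_qtr": play["score_diff"] * play["qtr"],
    }


# Made-up example: 4th and 2 at the start of the third quarter, down by 3.
example_play = {
    "dwn": 4, "ytg": 2, "yfog": 55, "secs_left": 1800,
    "score_diff": -3, "timo": 3, "timd": 2, "qtr": 3, "spread": -3.0,
}
features = build_features(example_play)
```

The interaction term lets the same score difference count for more late in the game than early, which a plain linear term cannot express.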

Training / Validation

  • Trained on 475,000+ plays
    • 90% of 2001 - 2014 regular season plays
    • No overtime plays
    • Kickoffs, extra points, penalties removed
       
  • Five-fold cross-validation used for parameter estimation, selected model then fit to all 475,000+ plays
     
  • 10% random holdout used for evaluation of predictive accuracy
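
One way the split described above could look in scikit-learn, shown on synthetic data rather than real play-by-play. LogisticRegressionCV is a swapped-in convenience here (it chooses the penalty strength by cross-validation and then refits on the full training portion); the deck does not say which estimator class was used, and the feature matrix, grid size, and seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for play-by-play data (assumption: 5 numeric features).
rng = np.random.default_rng(0)
n_plays = 2000
X = rng.normal(size=(n_plays, 5))
true_coefs = np.array([1.5, -1.0, 0.5, 0.0, 0.0])
p_win = 1.0 / (1.0 + np.exp(-(X @ true_coefs)))
y = (rng.random(n_plays) < p_win).astype(int)  # 1 = team went on to win

# Hold out a random 10% of plays for evaluation only.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.10, random_state=0
)

# Five-fold CV over a grid of 10 penalty strengths; the default penalty is L2.
# With the default refit=True, the selected model is refit on all of X_train.
model = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000)
model.fit(X_train, y_train)

# Estimated win probabilities and accuracy on plays the model never saw.
holdout_win_prob = model.predict_proba(X_holdout)[:, 1]
holdout_accuracy = model.score(X_holdout, y_holdout)
```

Because plays from the same game are correlated, a random play-level holdout like this one can look better than a split by game or season would.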

Performance

Precision: 0.77

 

Recall: 0.78

 

F1 Score: 0.77
 

AUC: 0.86
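
For reference, a small sketch of how these four numbers can be computed with scikit-learn. The labels and probabilities below are toy values, and the 0.5 cutoff for turning a probability into a predicted win is an assumption; the deck does not state the threshold that was used.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Toy holdout: 1 = the team in that situation went on to win.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
win_prob = np.array([0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2])

# Assumed threshold: predict "win" when the estimated probability is >= 0.5.
y_pred = (win_prob >= 0.5).astype(int)

precision = precision_score(y_true, y_pred)  # of predicted wins, share that were wins
recall = recall_score(y_true, y_pred)        # of actual wins, share that were predicted
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
auc = roc_auc_score(y_true, win_prob)        # ranks probabilities; no threshold needed
```

Precision, recall, and F1 depend on the chosen cutoff; AUC does not, which makes it the more natural summary for a model whose output is used as a probability.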

Kicking / Punting

Wrinkle: kickers growing more accurate over time

 

Logistic regression controlling for
distance, kicker, weather, and stadium
(Katz)

 

Punting: expected net return from current
field position, non-punter specific (doesn't
allow for muffed punt)

Decision-making

For each game situation, estimate win
probability for each possible outcome.

 

Determine breakeven point -- probability
of successful 4th down conversion at which
coach should be indifferent between options.

be_{punt} = \frac{P(win | punt) - P(win | fail)}{P(win | success) - P(win | fail)}

be_{kick} = \frac{P(win | FG\ attempt) - P(win | fail)}{P(win | success) - P(win | fail)}
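
The breakeven follows from setting the two options equal: going for it with conversion probability p is worth p * P(win | success) + (1 - p) * P(win | fail), and solving for the p that matches P(win | punt) or P(win | FG attempt) gives the ratios above. A minimal sketch, with hypothetical function names and made-up win probabilities; blending the two field goal outcomes by the make probability is an assumed way to obtain P(win | FG attempt).

```python
def breakeven(wp_alternative, wp_success, wp_fail):
    """Conversion probability at which going for it ties the alternative."""
    gain = wp_success - wp_fail
    if gain <= 0:
        raise ValueError("converting must improve win probability")
    return (wp_alternative - wp_fail) / gain


def wp_fg_attempt(p_make, wp_fg_good, wp_fg_missed):
    """Blend the two field goal outcomes into P(win | FG attempt)."""
    return p_make * wp_fg_good + (1 - p_make) * wp_fg_missed


def best_call(p_convert, wp_success, wp_fail, wp_punt, wp_kick):
    """Pick the option with the highest expected win probability."""
    wp_go = p_convert * wp_success + (1 - p_convert) * wp_fail
    options = {"go for it": wp_go, "punt": wp_punt, "field goal": wp_kick}
    return max(options, key=options.get)


# Made-up numbers for illustration only.
be_punt = breakeven(wp_alternative=0.45, wp_success=0.60, wp_fail=0.40)
```

With these inputs the breakeven is 0.25: go for it if the estimated chance of converting is above 25%, punt if it is below.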

Unsolved (statistical) problems

  • Estimation of uncertainty is problematic
    • Which makes adjudication along the
      decision frontier difficult to automate
  • Obvious IID issues
  • Non-linear models (random forests,
    gradient boosted trees) perform little
    better, which is odd given the many known
    interactions & non-linearities

Complications! (Overfitting?)

Lots of manual smoothing required for
situations that 'look' wrong.
 

End of game situations have
extremely high leverage & 
visibility


Examples:

1) In fourth quarter, weight win probabilities of not going for it by the probability of getting the ball back.


2) With < 40 seconds left in game within FG range,
set WP to P(successful FG).
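
A rough sketch of how the two example rules might be layered on top of the model's output. This is one possible reading, not the bot's code: the function and argument names are hypothetical, and the slide does not say under which score conditions the rules apply, so the sketch applies them unconditionally.

```python
def adjust_win_probs(wp_punt, wp_kick, *, quarter, secs_left,
                     in_fg_range, p_fg_good, p_ball_back):
    """Apply hand-written overrides to the model's win probabilities.

    Assumptions: "not going for it" covers both punting and kicking, and
    both rules apply regardless of the score.
    """
    # Rule 1: in the fourth quarter, giving the ball away only helps if
    # the team gets it back, so weight those options by that probability.
    if quarter == 4:
        wp_punt *= p_ball_back
        wp_kick *= p_ball_back

    # Rule 2: under 40 seconds and within field goal range, the game is
    # treated as hinging on the kick itself.
    if secs_left < 40 and in_fg_range:
        wp_kick = p_fg_good

    return wp_punt, wp_kick


# Made-up late-game situation.
late = adjust_win_probs(0.30, 0.50, quarter=4, secs_left=30,
                        in_fg_range=True, p_fg_good=0.8, p_ball_back=0.5)
```

Rules like these are easy to write and hard to validate, which is why the slide title asks whether they amount to overfitting.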

Model -> Code -> Product


nyt4thdownbot.com

Every 4th down is analyzed
and explained with the
rationale for the call and
the breakeven point.

 

These charts ended up being
very popular!

 

The bot (@NYT4thDownBot) 
has over 26,000 Twitter followers.

... but wait, there's more.

  • Fast, live data == Making calls before ball is snapped
  • For the 4th Down Bot to be a bot:
    • Needs to be very fast
      • Receive data, process, make prediction, make decision, send decision, tweet decision in one second or so.
         
    • Needs to be fully automated & always on
      • Load models & data into memory if not found
      • Logging
      • Graceful failure with fallback cases
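
The "always on" requirements above can be sketched as a small cache plus a guarded call. Everything here is hypothetical (the names, the cache layout, the fallback value); it only illustrates the pattern of loading once, logging, and failing gracefully, using the standard library.

```python
import logging

logger = logging.getLogger("fourth_down_bot")  # hypothetical logger name
_MODEL_CACHE = {}


def get_model(name, loader):
    """Return the cached model, calling loader() only if it is not in memory."""
    if name not in _MODEL_CACHE:
        logger.info("Model %r not in memory; loading", name)
        _MODEL_CACHE[name] = loader()
    return _MODEL_CACHE[name]


def safe_decision(decide, situation, fallback=None):
    """Run decide(situation); on any error, log it and return the fallback."""
    try:
        return decide(situation)
    except Exception:
        logger.exception("Decision failed for %r; using fallback", situation)
        return fallback


# Usage with stand-ins: the loader would normally read a fitted model from disk.
load_calls = []
model = get_model("win_prob", lambda: load_calls.append(1) or {"fitted": True})
model_again = get_model("win_prob", lambda: load_calls.append(1) or {"fitted": True})
```

Catching every exception is deliberate here: for a live bot, a logged "no call" is preferable to a crash in the middle of a game.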

API

  • Application Programming Interface
     
  • Stateless, accepts all arguments as
    parameters in URL
     
  • Returns information needed to
    tweet and render decision page as
    JSON (JavaScript Object Notation)

http://someurl.com/predict?
dwn=4&ytg=27&yfog=73
&secs_left=3005&score_diff=7
&timo=3&timd=0&spread=0
&ou_offense=40
&key=APIKEY
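
A framework-agnostic sketch of the stateless idea: everything needed for a prediction is read out of the query string, and the answer goes back as JSON. The parameter names are copied from the example URL; their types, the response fields, and the function names are assumptions. The real API was built with Flask and is not public, so this uses only the standard library.

```python
import json
from urllib.parse import parse_qs, urlparse

# Assumed types for the parameters in the example URL.
INT_PARAMS = ("dwn", "ytg", "yfog", "secs_left", "score_diff", "timo", "timd")
FLOAT_PARAMS = ("spread", "ou_offense")


def parse_situation(url):
    """Extract the game situation from a request URL (raises KeyError if a
    parameter is missing). The API key is deliberately left out of the result."""
    query = parse_qs(urlparse(url).query)
    situation = {name: int(query[name][0]) for name in INT_PARAMS}
    situation.update({name: float(query[name][0]) for name in FLOAT_PARAMS})
    return situation


def render_response(situation, decision):
    """Serialize the situation and the call as JSON (field names are hypothetical)."""
    return json.dumps({"situation": situation, "decision": decision}, sort_keys=True)


example_url = (
    "http://someurl.com/predict?dwn=4&ytg=27&yfog=73"
    "&secs_left=3005&score_diff=7&timo=3&timd=0&spread=0"
    "&ou_offense=40&key=APIKEY"
)
situation = parse_situation(example_url)
response_body = render_response(situation, decision="punt")
```

Because no state is kept between requests, any request can be replayed or served by any worker, which helps with both speed and recovery from failures.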

Open source!

github.com/TheUpshot/4thdownbot-model

 

(API code is not open source for security purposes)

Lessons from building a data product

  • Modeling is a very small component
     
  • Data munging and packaging the model are the majority of the work
     
  • Software development isn't statistics, but the intersection is powerful
     
  • http://uwseds.github.io/

The Stack

  • Python 
    • Data munging: pandas
    • Modeling: scikit-learn
    • Plotting: matplotlib
    • API: Flask
       
  • JavaScript (Node.js)
    • Kicking model

Questions?
