Decision-Making Beyond Prediction

Dhrumil Mehta

Associate Prof @ Columbia U. Graduate School of Journalism

Visiting Prof @ Harvard Kennedy School of Government

dhrumil.mehta@columbia.edu

@datadhrumil

@dmil

Guest Lecture @ Cornell Tech

"Off The Record" Please

Highlights:

Currently

Associate Prof. @ Columbia Graduate School of Journalism
Visiting Prof. @ Harvard Kennedy School

Previously

Database Journalist, Politics @ FiveThirtyEight
Software Development Engineer @ Amazon
Northwestern:
- BA in Philosophy + Minor in Cognitive Science
- MS in Computer Science

Database Journalist, Politics

http://fivethirtyeight.com/features/job-opening-database-journalist-politics/

data.fivethirtyeight.com

Themes

Identifying editorial decisions when doing data analysis

Communicating those decisions transparently and efficiently to readers
Communicating uncertainty to readers

Data Collection

Datasets in the classroom

Datasets at work

Build our own dataset

With Code / Scrapers
By Hand
By Survey Tool
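
As an illustration of the "With Code / Scrapers" route, here is a minimal sketch that pulls rows out of an HTML table using only Python's standard library. The HTML snippet, the pollster names, and the numbers are made up for the example. A real scraper would fetch the page over the network, respect the site's terms, and cope with messier markup.

```python
from html.parser import HTMLParser

# Hypothetical HTML standing in for a page that lists polls.
SAMPLE_HTML = """
<table>
  <tr><th>Pollster</th><th>Candidate A</th><th>Candidate B</th></tr>
  <tr><td>Pollster X</td><td>48</td><td>45</td></tr>
  <tr><td>Pollster Y</td><td>44</td><td>47</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def parse_polls(html):
    """Return one dict per table row, keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


polls = parse_polls(SAMPLE_HTML)
for poll in polls:
    print(poll)
```

Every value comes back as a string. Converting types is itself a treatment of the data, so it is worth recording as one.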

Know your dataset

Find the point at which the data was collected
- Can you see the actual forms or artifacts from the point of data collection?
- Read any documentation available about the data.
- People to consider contacting:
  - Who collected the data?
  - Who is the data collected from/about?
  - Who are other users of this data?
  - Who is impacted by this data?
- What are the limitations of this data collection?
Be able to trace every statistical treatment applied between the raw form of the data and its current form, and be ready to answer your editors' questions about the nature of the data.
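
One way to keep every statistical treatment traceable is to record each step as it is applied. The sketch below is an assumption about how that could look, not a description of any newsroom's tooling. The `Pipeline` class, the step descriptions, and the sample rows are all hypothetical.

```python
class Pipeline:
    """Apply transformations to rows while logging what each one did."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.log = []  # one entry per treatment, in the order applied

    def apply(self, description, func):
        before = len(self.rows)
        self.rows = func(self.rows)
        self.log.append(
            {
                "step": description,
                "rows_before": before,
                "rows_after": len(self.rows),
            }
        )
        return self


# Hypothetical raw data, as it might arrive from the point of collection.
raw = [
    {"state": "NY", "pct": "48"},
    {"state": "NY", "pct": ""},  # missing value in the raw data
    {"state": "CA", "pct": "52"},
]

p = Pipeline(raw)
p.apply("drop rows with missing pct", lambda rows: [r for r in rows if r["pct"]])
p.apply(
    "convert pct from text to float",
    lambda rows: [{**r, "pct": float(r["pct"])} for r in rows],
)

for entry in p.log:
    print(entry)
```

The log gives a plain-language answer when an editor asks why the published table has fewer rows than the raw file.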

Collecting Polling Data

Bots

Internal Workflows

An Individual Poll

Editorial Decision-Making

Do you keep the poll in the polling average, or do you remove it?

https://www.surveyusa.com/client/PollReport.aspx?g=f054d152-ceac-48dc-a422-2f22c7a00521

https://twitter.com/baseballot/status/1437264848274997256
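
To see what is at stake in the keep-or-remove decision above, the sketch below computes a plain average with and without one outlying poll. The poll names and margins are invented for illustration and are not taken from the linked poll.

```python
def simple_average(values):
    """Unweighted mean of a list of numbers."""
    return sum(values) / len(values)


# Hypothetical margins (candidate A minus candidate B, in points).
margins = {
    "Poll 1": 3.0,
    "Poll 2": 5.0,
    "Poll 3": 4.0,
    "Outlier poll": -8.0,
}

with_outlier = simple_average(list(margins.values()))
without_outlier = simple_average(
    [m for name, m in margins.items() if name != "Outlier poll"]
)

print(f"Average with the poll:    {with_outlier:+.1f}")
print(f"Average without the poll: {without_outlier:+.1f}")
```

With only a few polls in the average, one editorial call moves the headline number by several points, which is why the decision needs to be made by a consistent, disclosed rule.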

Aggregating Polls

Weighting polls by the historical accuracy of their pollster
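
The mechanics of weighting can be sketched as a weighted mean. The weights below are placeholders chosen for the example. How a real weight is derived from a pollster's track record is exactly the open question, and this sketch does not answer it.

```python
def weighted_average(polls):
    """Weighted mean of poll margins, using each poll's 'weight' field."""
    total_weight = sum(p["weight"] for p in polls)
    return sum(p["margin"] * p["weight"] for p in polls) / total_weight


# Hypothetical polls; 'weight' stands in for some measure of pollster accuracy.
polls = [
    {"pollster": "Pollster X", "margin": 4.0, "weight": 2.0},
    {"pollster": "Pollster Y", "margin": 1.0, "weight": 1.0},
    {"pollster": "Pollster Z", "margin": -2.0, "weight": 1.0},
]

weighted = weighted_average(polls)
unweighted = sum(p["margin"] for p in polls) / len(polls)

print(f"Weighted average:   {weighted:+.2f}")
print(f"Unweighted average: {unweighted:+.2f}")
```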

But...How do we define historical accuracy?

Methodology

Polls from the last 21 days before any election* since 1998
What should we take into account?

* Senate, House, gubernatorial, presidential and presidential primary elections

Methodology

Step 1: Collect and classify polls
Step 2: Calculate simple average error
Step 3: Calculate Simple Plus-Minus
Step 4: Calculate Advanced Plus-Minus
Step 5: Calculate Predictive Plus-Minus
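
As a rough illustration of Step 2, the sketch below treats a poll's error as the absolute difference between its polled margin and the actual election margin, then averages that error per pollster. Both the definition of error and the sample records are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical records: polled margin vs. actual election margin, in points.
polls = [
    {"pollster": "Pollster X", "polled_margin": 5.0, "actual_margin": 2.0},
    {"pollster": "Pollster X", "polled_margin": -1.0, "actual_margin": 1.0},
    {"pollster": "Pollster Y", "polled_margin": 8.0, "actual_margin": 2.0},
    {"pollster": "Pollster Y", "polled_margin": 7.0, "actual_margin": 1.0},
]


def simple_average_error(polls):
    """Mean absolute error of the polled margin, grouped by pollster."""
    errors = defaultdict(list)
    for p in polls:
        errors[p["pollster"]].append(abs(p["polled_margin"] - p["actual_margin"]))
    return {name: sum(errs) / len(errs) for name, errs in errors.items()}


avg_error = simple_average_error(polls)
for name, err in avg_error.items():
    print(f"{name}: {err:.1f} points")
```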

Step 3: Calculate Simple Plus-Minus
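
The general idea of a plus-minus score is to compare a pollster's error against some benchmark of expected error. The sketch below is a deliberately crude stand-in, not the calculation from the talk: it assumes the benchmark is simply the average error across all polls in the sample, whereas a real benchmark would likely account for factors such as sample size and type of race. All names and numbers are hypothetical.

```python
# Hypothetical per-poll absolute errors, in points.
errors = [
    ("Pollster X", 3.0),
    ("Pollster X", 2.0),
    ("Pollster Y", 6.0),
    ("Pollster Y", 5.0),
]


def plus_minus(errors):
    """Each pollster's mean error minus the mean error of all polls.

    In this sketch a negative score means the pollster's error was
    smaller than the (assumed) baseline.
    """
    baseline = sum(err for _, err in errors) / len(errors)
    by_pollster = {}
    for name, err in errors:
        by_pollster.setdefault(name, []).append(err)
    return {
        name: sum(errs) / len(errs) - baseline
        for name, errs in by_pollster.items()
    }


scores = plus_minus(errors)
for name, score in scores.items():
    print(f"{name}: {score:+.1f}")
```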

Step 4: Calculate Advanced Plus-Minus

PLUS a few other things...

Step 5: Calculate Predictive Plus-Minus

Accounts for other markers of quality, such as methodological standards (NCPP/AAPOR/Roper membership) and whether the pollster calls cell phones

But if the end goal is to know as much as we can about the state of an election, polls don't tell us everything...

Forecasting Elections

Communicating Probabilistic Data

Communicating Uncertainty

2020

2018

2016

2014

Communicating Uncertainty

Whether we show the chances in percentages or odds, this is the portion of an election forecast that is most anticipated — and has the most potential to be misunderstood. In 2016, we aimed for simplicity, both visually and conceptually. In 2018, we leaned into the complexity of the forecast. For 2020, we wanted to land somewhere in between.

https://fivethirtyeight.com/features/how-we-designed-the-look-of-our-2020-forecast/
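
The percentages-versus-odds choice described above can be made concrete with a small formatting sketch. The function names and the example probability are assumptions for illustration. They show two ways of presenting the same number, not how any published forecast renders it.

```python
def as_percentage(probability):
    """Format a probability between 0 and 1 as a whole-number percentage."""
    return f"{round(probability * 100)}%"


def as_chances(probability, out_of=100):
    """Format a probability as 'N in M' chances."""
    wins = round(probability * out_of)
    return f"{wins} in {out_of}"


prob = 0.89  # hypothetical win probability

print(as_percentage(prob))
print(as_chances(prob))
print(as_chances(prob, out_of=10))
```

The three strings describe the same estimate. Which one a reader interprets most accurately is a design and editorial question, not a statistical one.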

https://fivethirtyeight.com/features/how-fivethirtyeights-2020-forecasts-did-and-what-well-be-thinking-about-for-2022/

Bots

Lets readers see results that FiveThirtyEight deems unexpected

Expectations are calibrated before results ever start coming in.
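
A minimal sketch of that idea: fix an expected range for each race before any results arrive, then flag results that fall outside it. The district names, ranges, and results are invented, and the simple interval check is an assumption about how "unexpected" could be defined rather than a description of the actual bot.

```python
def is_unexpected(actual, expected_low, expected_high):
    """True when a result falls outside the range set before results came in."""
    return actual < expected_low or actual > expected_high


# Hypothetical expected ranges for candidate A's margin, in points.
# These are fixed in advance and not adjusted once results start arriving.
expectations = {
    "District 1": (-2.0, 8.0),
    "District 2": (5.0, 15.0),
}

# Hypothetical results as they come in.
results = {"District 1": 3.5, "District 2": -1.0}

flagged = [
    district
    for district, margin in results.items()
    if is_unexpected(margin, *expectations[district])
]

print("Unexpected results:", flagged)
```

Setting the ranges in advance keeps the definition of "unexpected" from drifting to fit whatever the results turn out to be.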

People, Data, and Systems
