(1) Join this Slack:


Join and star the following channels:

#data-ms-2023, #reporting-ii-2023


(2) Fill out the Polly survey

In the #reporting-ii-2023 Slack channel


If you have any questions, just raise your hand 🖐!

Welcome! Let's get rolling!



Hello, my name is...


Dhrumil Mehta (he/him)

Associate Prof. of Journalism @ Columbia U.

Deputy Director of Tow Center

Visiting Associate Prof of Public Policy @ Harvard Kennedy School







You will meet Prof. Denise Ajiri next week!





- Meet Dhrumil

- Meet each other! (Introductions & Survey Responses)

- Syllabus Overview


- Data Journalism: Possibilities And Limitations

- Intro to Descriptive Statistics


- Homework Overview




  • Associate Prof. @ Columbia Graduate School of Journalism

  • Visiting Associate Prof. @ Harvard Kennedy School



  • Database Journalist, Politics @ FiveThirtyEight

  • Software Development Engineer @ Amazon
  • Northwestern:
    • BA in Philosophy + Minor in Cognitive Science
    • MS in Computer Science

Database Journalist, Politics

Data-Driven Storytelling




Data Scraping / Cleaning


Internal Workflows



Lets readers see results that FiveThirtyEight deems unexpected

Expectations are calibrated before results ever start coming in.

Open Data


Quantitative Editing


Computationally analyzing text to better understand media and political environments.

I have a research interest in text analysis

Let me tell you what kind of editor I am...


- what I'm good at

- what I'm working to improve

That's me!
But who are you?

Survey Responses


Live Coding

(unless I get nervous or we are short on time)


Pay special attention to:

- [ ] The questions I ask of the dataset

- [ ] What I do when I don't know some code or forget how to do something

- [ ] What statistical or visual treatments I chose to apply and why

Now your turn!!!

Take a second to write down (digitally) a bit about who you are! (Bullet points are fine; they don't have to be legible to anyone but yourself.)

  • What motivates you to study data journalism?
  • What are your journalistic interests as we start thinking about forming project groups...
    • topics you'd like to work on
    • skills you bring to a group project
    • skills you'd like to build / pick up from your group-mates

But there's a catch!

You have 5 minutes to create 5 questions to get to know one other person in the room, whom you will then introduce to the class.



You will be split into:

1) Survey Makers

2) Interview Takers


Survey Makers (5 question survey)

Multiple Choice

Which of the following reporting topics interests you most? (1) Healthcare (2) Education (3) Agriculture


On a scale of 1-5, with 5 being most sure, how sure are you about what your thesis project will be about?


Are you interested in New York City issues?

1-phrase Answer

Where did you grow up?




Interview Takers


5 open-ended interview questions

Step 1: Send your questions to your partner via Slack








Step 2: Answer the questions that were sent to you via Slack

Tell us a story about your partner:

  • What motivates them to study data journalism...
  • What are their journalistic interests as we start thinking about forming project groups... beats they'd like to work on, skills they'd like to learn, etc.

What have we learned?

Intro to Stats for Journalists

Summary Stats





Ben Orlin — Math with bad drawings


Weighted Average
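A minimal sketch of the idea, using made-up numbers: two hypothetical polls report support for a measure, but one poll interviewed far more people than the other. The poll values and sample sizes below are assumptions for illustration only, not data from the course.

```python
from statistics import mean

# Hypothetical polls: (percent in support, number of respondents).
polls = [
    (40.0, 100),   # small poll
    (60.0, 900),   # large poll
]

# Simple average treats both polls as equally important.
simple_avg = mean(pct for pct, _ in polls)

# Weighted average lets each poll count in proportion to its sample size.
total_weight = sum(n for _, n in polls)
weighted_avg = sum(pct * n for pct, n in polls) / total_weight

print(f"simple average:   {simple_avg:.1f}")
print(f"weighted average: {weighted_avg:.1f}")
```

The weighted figure sits much closer to the large poll, which is the point: the choice of weights is an editorial decision worth explaining to readers.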






Ben Orlin — Math with bad drawings







Ben Orlin — Math with bad drawings





Pearson's Correlation Coefficient

Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.


Ben Orlin — Math with bad drawings


Correlation & Causation


Standard Deviation
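As a rough illustration, two series can share the same mean and still look nothing alike. The numbers below are invented for this sketch; `pstdev` is the population standard deviation from Python's standard library.

```python
from statistics import mean, pstdev

# Two hypothetical series with the same average but different spread.
steady = [48, 50, 52, 50]
swingy = [10, 90, 30, 70]

for name, values in [("steady", steady), ("swingy", swingy)]:
    # The mean hides the difference; the standard deviation reveals it.
    print(f"{name}: mean={mean(values):.1f}, std dev={pstdev(values):.1f}")
```

Reporting only the mean would make these two series look identical, so it is usually worth checking the spread before writing "the average is...".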























Ben Orlin — Math with bad drawings


Mystery Data



What can you tell about these 4 mystery datasets with summary statistics?



- Mean

- Median

- Mode

- Correlation

- Variance


Anscombe's Quartet

Datasaurus Dozen
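Assuming the four mystery datasets are Anscombe's quartet, here is a small sketch that computes the summary statistics listed above for each one. The `pearson` helper is written out by hand for this example rather than taken from a library. The Datasaurus Dozen makes the same point with more datasets.

```python
from math import sqrt
from statistics import mean, variance

def pearson(xs, ys):
    """Pearson's correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Collect mean of x, mean of y, sample variance of x, and correlation.
summary = {
    name: (mean(xs), mean(ys), variance(xs), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mx, my, vx, r) in summary.items():
    print(f"{name:>3}: mean x={mx:.2f}  mean y={my:.2f}  var x={vx:.2f}  r={r:.3f}")
```

The four rows come out nearly identical, even though the scatter plots look completely different. Summary statistics alone cannot tell these datasets apart, so plot the data before drawing conclusions.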



Normal Distribution

Pay Attention To Distribution Of Data


Exploratory Data Visualization

Pearson correlation is ????

Pearson correlation is 0.9909

because there are 40 duplicate data points in the top-right and bottom-left corners
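A sketch of how this can happen, using invented data rather than the dataset from the slide: a cloud of points with no correlation at all, plus 40 duplicated points stacked in two opposite corners. The grid, the corner coordinates, and the `pearson` helper are all assumptions made for illustration, so the result will not match the 0.9909 above.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson's correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical cloud: a 5x5 grid in the middle of the plot (no correlation).
cloud = [(x, y) for x in range(3, 8) for y in range(3, 8)]

# 40 duplicates: 20 copies in the bottom-left corner, 20 in the top-right.
duplicates = [(0, 0)] * 20 + [(10, 10)] * 20

def r_of(points):
    xs, ys = zip(*points)
    return pearson(xs, ys)

r_cloud = r_of(cloud)
r_with_duplicates = r_of(cloud + duplicates)

print(f"cloud only:       r = {r_cloud:.3f}")
print(f"with duplicates:  r = {r_with_duplicates:.3f}")
```

In this toy version the 25 distinct points show no relationship, yet the duplicated corner points push the coefficient to roughly 0.95. A scatter plot hides this because stacked duplicates draw as a single dot, so it helps to count duplicates or use transparency or jitter when exploring.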

Editorial Choices


Homework Review

Does your pitch start with a:

Dataset? or a question?

I have a dataset I'm interested in

What questions will I ask?

I have a journalistic question that I'm interested in trying to answer

Where will I get the data?

Types of Data Stories


Counting Stuff

Answer a question with data

Support/Oppose a hypothesis

Identify a Phenomenon


Debunk or Justify Conventional Wisdom

Data-Driven Profile

Lack of Data

Data-driven investigative work

Dig for Data

Provide relevant context

Build our own dataset

  • With Code / Scrapers
  • By Hand
  • By Survey Tool

Archiving Data

Explain Calculations

Use Innovative Methodology

Use data to inform traditional reporting

The Rare Datapoint

Challenging Official Data

Huge Data Dump

  • Uber
  • Election Results
  • Census