Dhrumil Mehta

Database Journalist, Politics - FiveThirtyEight

Adjunct Lecturer in Public Policy - Harvard Kennedy School

 

dhrumil.mehta@fivethirtyeight.com  

 @datadhrumil

@dmil

 

  1. Text as Data:

    Computational analysis of text...

    and its uses in Journalism!
     
  2. My Work | Career Talk
     
  3. Teaching Technology

 

Computational analysis of text...

and its uses in Journalism!

 

by: Dhrumil Mehta

Text as Data

A Few Examples

The Media Really Has Neglected Puerto Rico

The Media Really Started Paying Attention To Puerto Rico When Trump Did
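As a minimal illustration of the kind of counting that can sit behind coverage headlines like the two Puerto Rico pieces (not necessarily how FiveThirtyEight measured it), this sketch computes, for each day, the share of sentences that mention a keyword. The `coverage_share` function, the (day, sentence) input format and the toy corpus are hypothetical.

```python
from collections import Counter


def coverage_share(sentences, keyword):
    """Return {day: share of that day's sentences mentioning keyword}.

    `sentences` is an iterable of (day, text) pairs. Matching is a
    case-insensitive substring check, so it is deliberately crude.
    """
    totals = Counter()  # all sentences seen per day
    hits = Counter()    # sentences per day that mention the keyword
    needle = keyword.lower()
    for day, text in sentences:
        totals[day] += 1
        if needle in text.lower():
            hits[day] += 1
    # Counter returns 0 for missing keys, so days with no hits get 0.0
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Toy corpus, invented for illustration only.
toy = [
    ("day-1", "Storm makes landfall in Puerto Rico"),
    ("day-1", "Senate schedules a vote"),
    ("day-2", "Aid to puerto rico stalls"),
]
print(coverage_share(toy, "Puerto Rico"))
# {'day-1': 0.5, 'day-2': 1.0}
```

Comparing these daily shares across topics, or against when a public figure starts talking about a topic, is the basic move; real analyses would also need to handle synonyms, misspellings and sentences that mention a place only in passing.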

 

Apple says its App Store is ‘a safe and trusted place.’ We found 1,500 reports of unwanted sexual behavior on six apps, some targeting minors.

He was caught on video, but Georgia doctor kept his medical license

A small group of lawyers and its outsized influence on the U.S. Supreme Court

The Echo Chamber

Why is this interesting?

Because AI?


Because it is another tool for empirically guided inquiry, which is common to Journalism and Social Science.

Data journalism:

 


"Quantitative social science...on deadline."

 

- Andrew Flowers (former 538 quantitative editor)

 

But the "on deadline" part can be tricky...

That is partly why methodological innovation is so hard in newsrooms. New methods are often limited to large newsrooms that have the time and space to experiment, or the power to put resources behind large enterprise projects.

... and even then we're often leaning on the work of our colleagues.

 

Using a new methodology:

  • What has it been used for in Social Science?

  • What is it not good for?

  • What are the pitfalls?

  • What has it been (or could it be) used for in journalism?

  • What are the pitfalls specific to journalistic inquiry?

  • How do we edit a story that uses this methodology?

  • How do we communicate the methodology to readers?

  • How do we communicate the pitfalls to readers?

 

Example 1:

"Research shows that minority candidates can be successful in drawing out co-ethnic minority voters... but it is difficult to draw any conclusions from the research about national elections, in which partisanship is a much stronger force."

Empirically guided inquiry wasn't possible using our go-to methods:

There is only one poll I know of that breaks out a "South Asian American" cross-tab... and it comes out once every few years.

 

 

During Jindal’s first gubernatorial campaign, in 2003, South Asian-Americans donated an estimated $667,000, or 19 percent of the $3.5 million he raised from individual donations.

During his much more expensive 2007 and 2011 campaigns, that figure dropped by about half and made up only about 4 percent of the approximately $8 million that he raked in from individual donors during both of those campaigns.
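The donor figures above are the kind of estimate that text analysis can make possible when polls can't. This sketch assumes a surname-matching approach (classifying donors by last name), which is one common way to produce such estimates but is not confirmed as the method used here. The `share_from_group` function, the surname set and the toy donations are hypothetical; the final line checks the 2003 arithmetic quoted above.

```python
def share_from_group(donations, surnames):
    """Sum donations whose donor surname is in `surnames`.

    Returns (group_total, overall_total, share). Naive on purpose: the
    surname is taken to be the last whitespace-separated token of the name.
    """
    group_total = 0.0
    overall_total = 0.0
    for name, amount in donations:
        overall_total += amount
        parts = name.split()
        if parts and parts[-1].lower() in surnames:
            group_total += amount
    share = group_total / overall_total if overall_total else 0.0
    return group_total, overall_total, share


# Hypothetical surname list and toy donations, for illustration only.
SURNAMES = {"patel", "shah"}
toy = [("Priya Patel", 100), ("John Smith", 300), ("Ravi Shah", 100)]
print(share_from_group(toy, SURNAMES))  # (200.0, 500.0, 0.4)

# Arithmetic check on the 2003 figures: $667,000 of $3.5 million.
print(round(667_000 / 3_500_000 * 100))  # 19
```

By the same arithmetic, 4 percent of roughly $8 million is about $320,000, a little under half of the 2003 total, which is consistent with the "dropped by about half" description. Surname matching misses donors whose names don't fit the list and misclassifies others, which is why such figures are reported as estimates.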

2017

2019

Using a new methodology:

  • What has it been used for in Social Science?

  • What is it not good for?

  • What are the pitfalls?

  • What has it been (or could it be) used for in journalism?

  • What are the pitfalls specific to journalistic inquiry?

  • How do we edit a story that uses this methodology?

  • How do we communicate the methodology to readers?

  • How do we communicate the pitfalls to readers?

 

It's hard for newsrooms to do this kind of work regularly without a template for it.

  • There is only so much methodological innovation our editorial process can handle. Given deadline and bandwidth constraints, the editing burden of introducing a new methodology has to be justified by the importance of the story.
     
  • There is an "uncanny valley" of stories that could push the envelope methodologically but never get done, because (the first time) they require more innovation than can be justified.

Journalism Schools are well-placed to pioneer methodological innovation.

  • Co-location with academics from other disciplines makes it easier to take inspiration from those disciplines.
     
  • It's exciting work for student projects (and makes for great portfolio pieces)!
     
  • Students often spend several weeks on a project or two...there is room for iteration.
     
  • J-schools are well placed to gather knowledge from the newsrooms doing this kind of work and consolidate / explain / democratize / "templatize" it for other newsrooms.

 

And interdisciplinary collaboration is a two-way street...

APSA 2014

Applications And Advances In Text Analysis And Machine Learning

MPSA 2020

Political Communication: Methods

That's one of the things I'd like to do if I were at Columbia...

Work across disciplines to identify methods of empirical inquiry that could be useful to journalists. Work with students on interesting stories using those methods and then make those methods accessible to newsrooms.

 

 

And while it's important for students to participate in innovation... it is equally important for them to be trained in the fundamentals of computational journalism.

Path

  • Northwestern University:
    • BA in Philosophy + Minor in Cognitive Science
    • MS in Computer Science
    • Knight Lab Student Fellow
       
  • Software Development Engineer @ Amazon
     
  • Database Journalist, Politics @ FiveThirtyEight
     
  • Adjunct Lecturer in Public Policy @ Harvard Kennedy School

Database Journalist, Politics

FiveThirtyEight

Databases

Reporting and Writing

Public Opinion and Polling

Media

Other stuff...

Data Editing

Open Data

https://www.datajournalismawards.org/project-listing/?project_id=2082

Quantitative Editing

Words Editing

I edit POLL-BOT

... which is actually just 538's politics intern

(more about POLL-BOT later)

 

Bots

 

Bots let humans do what they're good at

2016

2018

Bot reports the facts, leaving time for humans to interpret them.

2018

But the bot also helps interpret facts!

2018

Lets readers see results that FiveThirtyEight deems unexpected

Expectations are calibrated before results ever start coming in.
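The idea of calibrating expectations before results come in can be sketched as a simple threshold check. This is an assumed design for illustration, not the bot's actual code: each race carries a pre-set expected margin and a tolerance, and any race whose reported margin falls outside that tolerance is flagged as unexpected. The `Race` class, the `unexpected` function, the margin units and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Race:
    name: str
    expected_margin: float  # set before results arrive (hypothetical units: points)
    tolerance: float        # how far off still counts as "expected"


def unexpected(races, results):
    """Names of races whose reported margin misses expectation by more than tolerance."""
    flagged = []
    for race in races:
        margin = results.get(race.name)
        if margin is None:
            continue  # no results reported yet
        if abs(margin - race.expected_margin) > race.tolerance:
            flagged.append(race.name)
    return flagged


races = [Race("A", 5.0, 3.0), Race("B", -2.0, 3.0), Race("C", 0.0, 3.0)]
print(unexpected(races, {"A": 9.5, "B": -1.0}))  # ['A']
```

Fixing `expected_margin` and `tolerance` before any votes are counted is what keeps the check honest: the definition of "unexpected" can't drift to fit the results as they arrive.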

2020

Bot evolves into a human...

...jk

Internal Bots

Generalized Bot Architecture

C+J Conference @ Stanford (2016)

More Complex Bots

Scraping

Visualization

Teaching Tech

Lessons from Policy School

Harvard Kennedy School

Democracy and Technology Fellow (Ash Center for Democratic Governance and Innovation)

DPI-691M | Programming and data for Policymakers

The Government is just a series of CRUD* applications that interact with each other.

*CRUD = Create, Read, Update, Delete

- David Zvenyach (former Executive Director of 18F)
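To make the CRUD footnote concrete, here is a toy in-memory store exposing the four operations. The `PermitRegistry` class and its fields are hypothetical; a real government system would typically sit on a database with access controls, but the four operations have the same shape.

```python
class PermitRegistry:
    """Toy in-memory store exposing Create, Read, Update, Delete."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def create(self, data):
        """Store a copy of `data` and return its new id."""
        record_id = self._next_id
        self._records[record_id] = dict(data)
        self._next_id += 1
        return record_id

    def read(self, record_id):
        """Return a copy of the record, or None if it does not exist."""
        record = self._records.get(record_id)
        return dict(record) if record is not None else None

    def update(self, record_id, changes):
        """Merge `changes` into an existing record; KeyError if missing."""
        if record_id not in self._records:
            raise KeyError(record_id)
        self._records[record_id].update(changes)

    def delete(self, record_id):
        """Remove the record if present."""
        self._records.pop(record_id, None)


registry = PermitRegistry()
pid = registry.create({"applicant": "Example LLC", "status": "pending"})
registry.update(pid, {"status": "approved"})
print(registry.read(pid))  # {'applicant': 'Example LLC', 'status': 'approved'}
registry.delete(pid)
print(registry.read(pid))  # None
```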

Avoiding Disaster

 

Cutting through red-tape and building stuff that works...

Bringing best-practices from software development into government.

Understanding enough about technology to not be fooled by people selling bad technology or charging too much for too little...

... or perpetuating bad ideas about technology like "security through obscurity"

 

Not deferring technical decisions to "technical people"

Unlocking Quant Skills

Data in the Classroom

Data in Journalism


New research from the University of Washington finds that a natural aptitude for learning languages is a stronger predictor of learning to program than basic math knowledge, or numeracy.

- University of Washington News

March 2, 2020

DPI-691M: Programming and Data for Policymakers

(a.k.a. #code4policy)

 

Data and code are no longer just for programmers. Policymakers in the 21st century, from members of Congress to analysts and executives, need to be equipped with the skills to navigate nuanced issues at the intersection of technology and governance.

 

Those who have firsthand experience with programming, data, software development and management, methods, open source collaboration, and technology innovation are better prepared to navigate these issues competently.

Curriculum Design

Programming and data for Policymakers

(#code4policy)

  • Not just "learning a technology" but "learning how to learn new technologies."

 

  • Project-based learning

 

Learning Objectives:

"Learning how to learn new technologies."

 

  • Having a mental-model of what is going on rather than blindly entering commands
     
  • Reading technical documentation.

 

  • Logically formulating a question when you're stuck based on your mental model of what is happening

Lesson Planning

  • Non-project lessons are strictly tethered to learning objectives.
     
  • Lessons are built to promote conceptual understanding
     
  • ...or to train "muscle memory" for good habits

"Learning how to learn new technologies."


Dhrumil's Classroom

Sticky Notes Everywhere!

Student Feedback

  • Daily Standup Meeting (class & projects)
    • What have you done since the last class?
    • What do you plan to do between now and the next class?
    • Any blockers?
       
  • Sticky Notes for Workshops
    • Blue when you're done, or feeling a sense of mastery
    • Red when you're stuck, or feeling lost
       
  • Parking Lot Questions

Learning Together

  • Coding: "I Do" vs. "You Do"
  • Pair programming
    • Driver - Writing the code
    • Navigator - Figuring out what to do next
  • Project-based learning
  • An active Slack group...

Inclusion

  • Fear of technology
  • Systems like the blue/red sticky notes are in place to promote inclusion: no student should feel this is not for them.
"Dhrumil did a fantastic job of not only lining up great speakers for the class, but achieving gender parity among his guests - an impressive feat in the male-dominated tech sphere!" 
---
This class gave me more confidence to pursue possible jobs in being a bridge between technology teams and social service delivery. It also sparked my interest in mastering some skills in the course we didn't get to practice but seem valuable, like scraping for data or utilizing an API.
---
I gained confidence that I am able to get to grips with new technologies/ software on my own and tools for doing so in a systematic way.

Rigor

This was a great class and I really enjoyed taking it! I had to work surprisingly hard, but felt like that learning was purposeful. I feel proud of my final project and am grateful to have had the opportunity to work on it.

---

 

 

Coding Is Journalism

 


 

http://fivethirtyeight.com/contributors/dhrumil-mehta/

Columbia Job Talk

By Dhrumil Mehta

Telling stories with data.