Dhrumil Mehta
Database Journalist, Politics - FiveThirtyEight
Adjunct Lecturer in Public Policy - Harvard Kennedy School
dhrumil.mehta@fivethirtyeight.com
@datadhrumil
@dmil
Computational analysis of text...
and its uses in Journalism!
by: Dhrumil Mehta
A small group of lawyers and its outsized influence on the U.S. Supreme Court
which is common to Journalism and Social Science.
This is partly why methodological innovation is so hard in newsrooms: new and innovative methodologies are often limited to large newsrooms that have the time and space to experiment, or the power to put resources behind large enterprise projects.
... and even then we're often leaning on the work of our colleagues.
What has it been used for in Social Science?
What is it not good for?
What are the pitfalls?
What are the pitfalls specific to journalistic inquiry?
How do we edit a story that uses this methodology?
How do we communicate the methodology to readers?
How do we communicate the pitfalls to readers?
"Research shows that minority candidates can be successful in drawing out co-ethnic minority voters....
...but it is difficult to draw any conclusions from the research about national elections, in which partisanship is a much stronger force."
There is only one poll I know of that breaks out a "South Asian American" cross-tab...
...and it comes out once every few years.
During Jindal’s first gubernatorial campaign, in 2003, South Asian-Americans donated an estimated $667,000, or 19 percent of the $3.5 million he raised from individual donations.
During his much more expensive 2007 and 2011 campaigns, that dollar figure dropped by about half and made up only about 4 percent of the approximately $8 million that he raked in from individual donors during both of those campaigns.
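A quick way to sanity-check percentages like these before they go into a story is to redo the arithmetic. The sketch below assumes that "dropped by about half" refers to the dollar amount from South Asian-American donors, summed across the 2007 and 2011 campaigns. That reading is an assumption, but it is the one consistent with the 4 percent figure, since halving 19 percent would give roughly 9.5 percent.

```python
# Back-of-the-envelope check of the donation shares quoted above.
# Assumption: "dropped by about half" means the dollar amount from
# South Asian-American donors, summed over the 2007 and 2011 campaigns.

def share(part: float, total: float) -> float:
    """Return part as a percentage of total."""
    return 100 * part / total

SA_2003 = 667_000             # estimated South Asian-American donations, 2003
TOTAL_2003 = 3_500_000        # individual donations, 2003
TOTAL_2007_2011 = 8_000_000   # approx. individual donations, 2007 + 2011 combined

share_2003 = share(SA_2003, TOTAL_2003)
sa_later = SA_2003 / 2        # "about half" (assumed dollar interpretation)
share_later = share(sa_later, TOTAL_2007_2011)

print(f"2003: {share_2003:.1f}%")        # 2003: 19.1%
print(f"2007+2011: {share_later:.1f}%")  # 2007+2011: 4.2%
```

Both results round to the 19 percent and "about 4 percent" quoted in the text.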
2017
2019
What has it been used for in Social Science?
What is it not good for?
What are the pitfalls?
What are the pitfalls specific to journalistic inquiry?
How do we edit a story that uses this methodology?
How do we communicate the methodology to readers?
How do we communicate the pitfalls to readers?
Applications And Advances In Text Analysis And Machine Learning
Political Communication: Methods
That's one of the things I'd like to do if I were at Columbia...
Work across disciplines to identify methods of empirical inquiry that could be useful to journalists. Work with students on interesting stories using those methods and then make those methods accessible to newsrooms.
And while it's important for students to participate in innovation...
...it is equally important for them to be trained in the fundamentals of computational journalism.
Database Journalist, Politics
https://www.datajournalismawards.org/project-listing/?project_id=2082
I edit POLL-BOT
... which is actually just 538's politics intern
(more about POLL-BOT later)
Bots let humans do what they're good at
Bot reports the facts, leaving time for humans to interpret them.
But the bot also helps interpret facts!
Lets readers see results that FiveThirtyEight deems unexpected
Expectations are calibrated before results ever start coming in.
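The idea of a bot that both reports a fact and flags it as unexpected can be illustrated with a minimal sketch. This is not FiveThirtyEight's POLL-BOT code. It assumes a hypothetical setup where each race gets a plausible margin range fixed before any results arrive. The race names and numbers are illustrative placeholders, not real forecasts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expectation:
    """Hypothetical pre-election expectation for one race:
    a range of plausible margins (in percentage points)."""
    low: float
    high: float

# Calibrated *before* any results come in (illustrative placeholders only).
EXPECTATIONS = {
    "Race A": Expectation(low=-5.0, high=3.0),
    "Race B": Expectation(low=8.0, high=20.0),
}

def report(race: str, margin: float) -> str:
    """Report the fact, and flag it if it falls outside the expected range."""
    exp = EXPECTATIONS[race]
    fact = f"{race}: margin {margin:+.1f}"
    if exp.low <= margin <= exp.high:
        return fact
    return fact + " (UNEXPECTED)"

if __name__ == "__main__":
    print(report("Race A", 1.5))  # Race A: margin +1.5
    print(report("Race B", 2.0))  # Race B: margin +2.0 (UNEXPECTED)
```

Fixing the ranges ahead of time is what makes the flag meaningful: the bot cannot move the goalposts once results start coming in.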
Bot evolves into a human...
...jk
C+J Conference @ Stanford (2016)
Lessons from Policy School
Democracy and Technology Fellow (Ash Center for Democratic Governance and Innovation)
DPI-691M | Programming and Data for Policymakers
The Government is just a series of CRUD* applications that interact with each other.
*CRUD = Create, Read, Update, Delete
- David Zvenyach (former Executive Director of 18F)
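To make the acronym concrete, here is a minimal sketch of the four CRUD operations on a toy in-memory store. The `Registry` class and the "permit" record are hypothetical examples chosen for illustration. A real government system would sit on a database, but it would expose the same four verbs.

```python
from typing import Dict, Optional

class Registry:
    """A toy in-memory store exposing the four CRUD operations.
    The 'permit' record in the usage below is hypothetical."""

    def __init__(self) -> None:
        self._records: Dict[int, dict] = {}
        self._next_id = 1

    def create(self, record: dict) -> int:
        """Create: store a copy of the record and return its new id."""
        rid = self._next_id
        self._records[rid] = dict(record)
        self._next_id += 1
        return rid

    def read(self, rid: int) -> Optional[dict]:
        """Read: return a copy of the record, or None if it doesn't exist."""
        rec = self._records.get(rid)
        return dict(rec) if rec is not None else None

    def update(self, rid: int, **changes) -> bool:
        """Update: apply field changes; return False if the id is unknown."""
        if rid not in self._records:
            return False
        self._records[rid].update(changes)
        return True

    def delete(self, rid: int) -> bool:
        """Delete: remove the record; return False if it was already gone."""
        return self._records.pop(rid, None) is not None

permits = Registry()
pid = permits.create({"holder": "Jane Doe", "status": "pending"})
permits.update(pid, status="approved")
print(permits.read(pid))  # {'holder': 'Jane Doe', 'status': 'approved'}
permits.delete(pid)
print(permits.read(pid))  # None
```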
Avoiding Disaster
Cutting through red-tape and building stuff that works...
Bringing best-practices from software development into government.
Understanding enough about technology to not be fooled by people selling bad technology or charging too much for too little...
... or perpetuating bad ideas about technology like "security through obscurity"
Not deferring technical decisions to "technical people"
Unlocking Quant Skills
Data in the Classroom
Data in Journalism
New research from the University of Washington finds that a natural aptitude for learning languages is a stronger predictor of learning to program than basic math knowledge, or numeracy.
- University of Washington News
March 2, 2020
DPI-691M: Programming and Data for Policymakers
(a.k.a. #code4policy)
Data and code are no longer just for programmers. Policymakers in the 21st century, from members of Congress to analysts and executives, need to be equipped with the skills to navigate nuanced issues at the intersection of technology and governance.
Those who have first-hand experience with programming, data, software development and management, methods, open source collaboration, and technology innovation are better prepared to navigate these issues competently.
(#code4policy)
"Dhrumil did a fantastic job of not only lining up great speakers for the class, but achieving gender parity among his guests - an impressive feat in the male-dominated tech sphere!"
---
This class gave me more confidence to pursue possible jobs in being a bridge between technology teams and social service delivery. It also sparked my interest in mastering some skills in the course we didn't get to practice but seem valuable, like scraping for data or utilizing an API.
---
I gained confidence that I am able to get to grips with new technologies/software on my own and tools for doing so in a systematic way.
This was a great class and I really enjoyed taking it! I had to work surprisingly hard, but felt like that learning was purposeful. I feel proud of my final project and am grateful to have had the opportunity to work on it.
---