The Data Journalism Taxonomy
Sarah Cohen
February 2015 / Columbia University
"Statistics are people with the tears washed off." - Paul Brodeur
"He uses statistics as a drunken man uses lamp-posts: for support rather than illumination." - Origin uncertain; widely attributed to Andrew Lang (b. 1844)
"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." - John Tukey, statistician
Data in service of story
- Classic CAR (computer-assisted reporting): Harvesting Cash / Rural development
- Recent work from the NYT CAR desk:
  - On the income cliff in the new health care law
  - On guns traded in the Wild West of the Internet
- Recent Pulitzers:
  - Sun-Sentinel on speeding off-duty cops
  - Sarasota Herald-Tribune on weak insurance regulations for homeowners
  - Bristol, Va. Herald Courier on the state's failure to return money to landowners (see Dan Gilbert's account of the lengths the paper went to in order to get him the training he needed)
Timing: Part of the reporting process.
Qualities: Original research never before attempted or achieved, usually using public records.
Skills / schools
- Reporting to find and understand sources (journalism)
- News judgment (journalism)
- Data / document acquisition and cleaning (anywhere: coding expected)
- Qualitative and quantitative analysis (social sciences and statistics; more recently data science; law)
- Exploratory visualizations (data science, statistics, design)
- Reporting out the data and anecdotes (journalism)
- Storifying and writing about the data / documents (journalism)
Journalism about data
- Guardian Data Blog (the granddaddy)
- FiveThirtyEight.com (Fox or hedgehog? You decide.)
- The Upshot at the New York Times
- Vox
Timing: Any time data becomes available; usually not original reporting, but based on others' original research.
Qualities: Insightful comment and analysis of others' work. Engaging and informal writing. Little or no programming or original reporting.
Skills / schools
- Reporting to find the data* (journalism)
- News judgment (journalism)
- Visualization for exploration and publication (coding or tool use; design)
- Writing fast (journalism)
- Reporting to explain the data (journalism, also sometimes to storify)
- Domain knowledge (social sciences, medical school, e.g., a science writer)
* In many instances, not original research.
Data as journalism
(News Applications and Interactive Graphics)
- Pick any ProPublica project, such as this one on internships
- New York Times' famed interactives, including this brilliant one from Amanda Cox
- Campaign finance, elections and Olympics apps
Timing: After a story is reported and proven, used primarily in presentation and distribution.
Qualities: Presentation-quality design; optimized for mobile and other platforms; able to handle high traffic while still performing well.
Skills / schools
- In-depth programming ability (computer science or a dev shop)
- Scale and stability (computer science)
- News judgment (journalism)
- Acquiring large-scale, streaming data (?)
- Design (journalism, design, geography, art, HCI, computer science)
Research in journalism
- Applied statistics in news
- Document analysis
- Natural language processing
- Machine learning for data and documents