Connecting data with people: http://www.dataninja.it
A European investigation into the migrants who have died since 2000 trying to reach Fortress Europe
CC-BY 4.0 International
dataninja.it / About us
- Dataninja is a data journalism network founded in 2012
- Our aim is to tell stories through data analysis, so-called data-driven storytelling
- We work in journalism and teaching, together with the main Italian publishing groups and the most advanced public administrations, on Open Data topics.
The context: information on the web
From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days… and the pace is accelerating!
Google’s executive chairman | Source: http://goo.gl/pgle3G
Why data visualization?
As never before in our lives, today we have access to huge amounts of information, free of charge. With the right tools we can begin to make sense of these data and see patterns and trends that would otherwise be invisible to us. Transforming numbers into graphic form allows readers to discover the stories those numbers hide.
Alberto Cairo, "The Functional Art" | Source: http://goo.gl/KHm3hT
The investigation of l'Espresso
Publication day: March 31, 2014, simultaneously in 9 newspapers in 6 European countries
From 2000 to 2013 more than 23,000 migrants died trying to reach Europe
Eight routes of entry
It was possible to estimate a mortality rate for each route: the calculation shows that the most dangerous one runs from the Maghreb to Sicily
- There is no official monitoring of victims (neither by Italy nor by the European Union)
- Two EU agencies monitor the borders (Frontex, Eurosur)
- There are datasets built by NGOs from media articles and from posts by observatories and blogs (Fortress Europe, United for Intercultural Action, PULS at the University of Helsinki)
- There are studies and analyses of specific territories (Sicily, the Canary Islands) or time intervals (a single year, a season)
Dataninja scraped data from the Fortress Europe website.
united for intercultural action / puls
Journalism++ scraped and collected data from the United for Intercultural Action and PULS websites
scraping / merging / deduplication
- These activities are all needed to extract data from their original rigid containers (web pages and PDFs), structure them as tables and thus make them machine-readable
- The different original structures are mapped to a single common one (date / location / description / source / ... )
- Finally, we aggregated all the data into a single table and cleaned it, removing duplicate events through a semi-automatic deduplication process
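The merging and deduplication steps above can be sketched as follows. This is only an illustration, not the code used in the investigation: the `Event` fields follow the common structure named above, while `to_event`, `candidate_duplicates`, the field mapping and the thresholds are hypothetical. Pairs of events that are close in time and have similar locations are flagged for a human to review, which is one possible reading of "semi-automatic".

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Event:
    # Common structure: date / location / description / source
    day: date
    location: str
    description: str
    source: str


def to_event(row, field_map, source):
    """Map one scraped row (a dict with source-specific keys) to the common structure.

    `field_map` is a hypothetical mapping from common field names to the
    column names used by one particular source; dates are assumed to be ISO strings.
    """
    return Event(
        day=date.fromisoformat(row[field_map["date"]]),
        location=row[field_map["location"]],
        description=row[field_map["description"]],
        source=source,
    )


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Similarity between two strings, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_duplicates(events, max_days=2, min_similarity=0.6):
    """Return index pairs that look like the same event and need manual review.

    The thresholds are arbitrary assumptions chosen for illustration.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        close_in_time = abs((a.day - b.day).days) <= max_days
        same_place = similarity(a.location, b.location) >= min_similarity
        if close_in_time and same_place:
            pairs.append((i, j))
    return pairs
```

In a workflow like this, the flagged pairs would be shown to a person who decides which record to keep; nothing is deleted automatically.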
The resulting table contains 2,700+ events (in which 23,000+ migrants died between 2000 and 2013) and it is browsable
Individual events (e.g. shipwrecks) are related to one another by extracting entities (e.g. event types, locations, victim types) | Link: http://goo.gl/xy0h0Q
The investigation was carried out by 10 people, including journalists, data scientists and developers, working together on a shared online platform.
Roles and methodology
- A project manager coordinated the whole process and checked progress
- Journalists searched for data and contextual information and fact-checked them, partly by testing a sample of events
- Dataninja tried to estimate the error introduced by the data manipulation process (about 10%)
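A sample-based error estimate of the kind mentioned above could look like the sketch below. It is an assumption about how such a check might be done, not the procedure actually followed: `draw_sample` and `error_rate_interval` are hypothetical helpers, and the Wilson score interval is swapped in here as one common way to attach an uncertainty range to a proportion measured on a sample.

```python
import math
import random


def draw_sample(event_ids, size, seed=0):
    """Pick `size` distinct events at random for manual fact-checking."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    return rng.sample(list(event_ids), size)


def error_rate_interval(n_checked, n_wrong, z=1.96):
    """Return (observed rate, lower bound, upper bound).

    Uses the Wilson score interval; z=1.96 corresponds to roughly 95% confidence.
    """
    p = n_wrong / n_checked
    denom = 1 + z**2 / n_checked
    centre = (p + z**2 / (2 * n_checked)) / denom
    half = z * math.sqrt(p * (1 - p) / n_checked + z**2 / (4 * n_checked**2)) / denom
    return p, centre - half, centre + half


if __name__ == "__main__":
    # Made-up numbers, for illustration only: 100 events checked, 10 found wrong.
    to_check = draw_sample(range(2700), 100)
    rate, low, high = error_rate_interval(n_checked=len(to_check), n_wrong=10)
    print(f"observed error rate {rate:.0%}, plausible range {low:.1%} to {high:.1%}")
```

The larger the checked sample, the narrower the range around the observed error rate.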
Methodology on l'Espresso
Below the article published in l'Espresso there is also a methodological note with contact details | Link: http://goo.gl/9gpvlA
fact-checking at the university
Some digital journalism students at the University of Bologna helped us check individual events (thanks to Prof. Carlo Gubitosa)
March 31, 2014: the publication day
The investigation was published simultaneously in 9 newspapers from 6 different European countries, giving it wide circulation in each country
Within 24 hours, 2,000+ tweets in 11 different languages were sent
Awards: Brussels and Barcelona
The investigation was presented at the DataHarvest+ Conference in Brussels and won the Data Journalism Awards in Barcelona (June 2014)
By Dataninja srls