Diving into data
Skopje, 26.10.2015 | Jacopo Ottaviani Twitter @JacopoOttaviani
A practical workshop on data wrangling
for journalism and activism
Objectives
- Introduce techniques to explore data in depth
- Show how to apply them in practice
- Explore two tools in real-life scenarios:
  - Spreadsheets (Google Drive)
  - OpenRefine (openrefine.org)
It all starts with RESEARCH QUESTIONS
Filtering data
- Select certain categories of data
- Filters can be combined:
  - several filters can be applied together
  - useful to focus on a subset of the data
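A minimal Python sketch of combining filters, assuming a small table of hypothetical rows (the column names `country`, `sector` and `amount` are illustrative, not taken from the workshop datasets). A row is kept only if it passes every filter, much like stacking filters on several spreadsheet columns:

```python
# Hypothetical rows: column names and values are illustrative only
rows = [
    {"country": "Angola", "sector": "Health", "amount": 120},
    {"country": "Angola", "sector": "Energy", "amount": 900},
    {"country": "Kenya", "sector": "Health", "amount": 300},
    {"country": "Kenya", "sector": "Transport", "amount": 450},
]

def by_country(name):
    """Return a filter that keeps rows from one country."""
    return lambda row: row["country"] == name

def min_amount(threshold):
    """Return a filter that keeps rows whose amount is at least `threshold`."""
    return lambda row: row["amount"] >= threshold

def apply_filters(data, *filters):
    """Keep only the rows that pass every filter (filters combine with AND)."""
    return [row for row in data if all(f(row) for f in filters)]

selected = apply_filters(rows, by_country("Kenya"), min_amount(400))
print(selected)
# [{'country': 'Kenya', 'sector': 'Transport', 'amount': 450}]
```

Each extra filter narrows the selection further; with no filters, every row is returned.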
Pivot tables
- Give a bird's-eye view of the dataset
- Let you approach the data from multiple angles
- Let you break down and re-aggregate the data:
  - sum values
  - sort data
  - calculate averages
  - run operations on selected fields
You can do this in Excel or in a Google Drive spreadsheet
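A rough sketch of what a pivot table computes, assuming hypothetical rows with illustrative column names: group the rows by one field, then count, sum and average another. `pivot` is a made-up helper for illustration, not a spreadsheet function.

```python
from collections import defaultdict

# Hypothetical rows; the column names are illustrative only
rows = [
    {"country": "Angola", "sector": "Health", "amount": 120},
    {"country": "Angola", "sector": "Energy", "amount": 900},
    {"country": "Kenya", "sector": "Health", "amount": 300},
    {"country": "Kenya", "sector": "Transport", "amount": 450},
]

def pivot(data, row_key, value_key):
    """Group rows by `row_key`, then count, sum and average `value_key` per group."""
    groups = defaultdict(list)
    for row in data:
        groups[row[row_key]].append(row[value_key])
    table = {
        key: {"count": len(values), "sum": sum(values), "average": sum(values) / len(values)}
        for key, values in groups.items()
    }
    # Sort groups by descending sum, like sorting a pivot table on its total
    return dict(sorted(table.items(), key=lambda item: item[1]["sum"], reverse=True))

for country, stats in pivot(rows, "country", "amount").items():
    print(country, stats)
# Angola {'count': 2, 'sum': 1020, 'average': 510.0}
# Kenya {'count': 2, 'sum': 750, 'average': 375.0}
```

Calling `pivot(rows, "sector", "amount")` re-aggregates the same rows from another angle, which is the point of a pivot table.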
http://bit.ly/skopje-datasets
Questions?
Exercise #1 - Diving into data
- Choose one of these datasets:
  - China Aid Data
  - The Migrants Files
  - Both are available at http://bit.ly/skopje-datasets
- Make a copy of it in your Google Drive:
  - File > Make a copy
- Work out a list of research questions
- Try to find the answers in the data:
  - Use filters to select categories of data
  - Use pivot tables to explore the data
Crunching data with OpenRefine
- OpenRefine (www.openrefine.org)
- Designed to clean and refine data
- Supports operations such as:
  - data type conversion
  - filtering
  - splitting columns
  - clustering similar values
  - string manipulation
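The Python sketch below imitates three of these operations (conversion, splitting, string manipulation) on hypothetical values. It is plain Python, not OpenRefine's own expression language, and the helper names are made up for illustration.

```python
# Hypothetical raw values, as they might appear in messy spreadsheet columns
raw_counts = ["  12 ", "7", " 3", "n/a"]
raw_places = ["Baghdad; Karrada", "Mosul;Al-Zuhur", "Basra"]
raw_names = ["  al   rashid  street ", "HAIFA street"]

def to_int(value):
    """Data conversion: parse a non-negative integer, or return None if the text is not one."""
    value = value.strip()
    return int(value) if value.isdecimal() else None

def split_place(value, sep=";"):
    """Splitting: break one cell into (city, district); district is None if absent."""
    parts = [part.strip() for part in value.split(sep, 1)]
    return parts[0], (parts[1] if len(parts) > 1 else None)

def tidy(value):
    """String manipulation: collapse runs of whitespace and switch to title case."""
    return " ".join(value.split()).title()

print([to_int(v) for v in raw_counts])       # [12, 7, 3, None]
print([split_place(v) for v in raw_places])  # [('Baghdad', 'Karrada'), ('Mosul', 'Al-Zuhur'), ('Basra', None)]
print([tidy(v) for v in raw_names])          # ['Al Rashid Street', 'Haifa Street']
```

Returning `None` for unparseable values, rather than raising an error, keeps bad cells visible so they can be filtered and fixed afterwards.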
From chaos to order
Clustering data
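One of OpenRefine's clustering methods, key collision with the "fingerprint" keying function, groups values that reduce to the same normalized key. The sketch below is a simplified re-implementation for illustration, not OpenRefine's code; the sample values are hypothetical, and its ASCII folding step drops letters outside the Latin alphabet entirely.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Simplified fingerprint key: similar spellings should map to the same key."""
    value = value.strip().lower()
    # Remove punctuation and control characters (keep letters, digits, underscore, whitespace)
    value = re.sub(r"[^\w\s]", "", value)
    # Fold accented letters to ASCII ("é" -> "e"); non-Latin letters are dropped here
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Split into tokens, remove duplicates, sort, and join back together
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group values by fingerprint; keep only groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]

weapons = ["Gunfire", "gunfire ", "GUNFIRE.", "Car bomb", "bomb, car", "Suicide bomber"]
print(cluster(weapons))
# [['Gunfire', 'gunfire ', 'GUNFIRE.'], ['Car bomb', 'bomb, car']]
```

Sorting the tokens is what makes "Car bomb" and "bomb, car" collide; each cluster can then be merged into a single agreed spelling.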
Questions?
Exercise #2 - OpenRefine
- Install OpenRefine
- Import the Iraq Body Count dataset
- Explore the dataset using text facets
  - e.g. how often does each weapon appear?
- Set the correct data types (e.g. numbers, dates)
- Cluster the values in a column to merge variant spellings
- Clean, enrich and play with the data
- Export the cleaned dataset in Excel format
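A text facet lists every distinct value in a column together with its count. This Python sketch reproduces that idea, assuming a hypothetical weapons column; the real column name and values in the Iraq Body Count data may differ.

```python
from collections import Counter

# Hypothetical values from a weapons column; real values may differ
weapons = ["gunfire", "car bomb", "gunfire", "IED", "car bomb", "gunfire", "executed"]

def text_facet(values):
    """Return (value, count) pairs, most frequent first; ties keep first-seen order."""
    return Counter(values).most_common()

for value, count in text_facet(weapons):
    print(f"{value}: {count}")
# gunfire: 3
# car bomb: 2
# IED: 1
# executed: 1
```

Counting exact strings means "Gunfire" and "gunfire" would be counted separately, which is why clustering before counting matters.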
Thank you!
Find me on Twitter @JacopoOttaviani