Diving into data

Skopje, 26.10.2015 | Jacopo Ottaviani | Twitter: @JacopoOttaviani

A practical workshop on data wrangling for journalism and activism

Objectives

  • Introduce techniques to explore data in depth
  • Show some of these techniques in practice
  • Explore two tools in real-life scenarios: 
    • Spreadsheets (Google Drive)
    • OpenRefine (openrefine.org)

It all starts with RESEARCH QUESTIONS

Filtering data

  • Select certain categories of data
  • Filters can be composed:
    • more filters can be applied together
    • useful for focusing on one part of the data
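
Composed filters can also be sketched outside a spreadsheet. The snippet below is a minimal illustration in plain Python; the rows and column names are invented for the example and do not come from the workshop datasets.

```python
# Hypothetical rows: column names and values are made up for illustration.
rows = [
    {"country": "Kenya", "sector": "Health", "year": 2010, "amount": 120},
    {"country": "Kenya", "sector": "Transport", "year": 2012, "amount": 300},
    {"country": "Ghana", "sector": "Health", "year": 2012, "amount": 80},
    {"country": "Ghana", "sector": "Health", "year": 2009, "amount": 50},
]


def is_health(row):
    """Filter 1: keep rows in the Health sector."""
    return row["sector"] == "Health"


def since_2010(row):
    """Filter 2: keep rows from 2010 onwards."""
    return row["year"] >= 2010


def apply_filters(rows, *filters):
    """Keep only the rows that pass every filter (filters combine with AND)."""
    return [row for row in rows if all(f(row) for f in filters)]


# One filter on its own, then two filters applied together.
health_only = apply_filters(rows, is_health)
health_since_2010 = apply_filters(rows, is_health, since_2010)
```

Each extra filter narrows the selection further, which is the same effect as stacking column filters in a spreadsheet.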

Pivot tables

  • Useful for getting a bird's-eye view of the dataset
  • Useful for approaching the data from multiple angles
  • Let you break down and re-aggregate the data:
    • sum data
    • sort data
    • calculate averages
    • run operations on selected fields 

You can do this in Excel or in Google Drive spreadsheets
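
For readers who want to see what a pivot table does under the hood, here is a rough sketch in plain Python: group rows by one field, then sum, count and average another. The `pivot` helper and the sample rows are assumptions made up for this illustration.

```python
from collections import defaultdict

# Hypothetical rows: column names and values are made up for illustration.
rows = [
    {"country": "Kenya", "sector": "Health", "amount": 120},
    {"country": "Kenya", "sector": "Transport", "amount": 300},
    {"country": "Ghana", "sector": "Health", "amount": 80},
    {"country": "Ghana", "sector": "Health", "amount": 50},
]


def pivot(rows, key, value):
    """Break rows down by `key`, then re-aggregate the `value` field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        group: {
            "sum": sum(values),
            "count": len(values),
            "average": sum(values) / len(values),
        }
        for group, values in groups.items()
    }


by_sector = pivot(rows, "sector", "amount")

# Sort the groups by their total, largest first.
ranked = sorted(by_sector.items(), key=lambda item: item[1]["sum"], reverse=True)
```

Changing the `key` argument (for example to `"country"`) gives a different angle on the same rows.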

http://bit.ly/skopje-datasets

Questions?

Exercise #1 - Diving into data

  • Choose one of the datasets at http://bit.ly/skopje-datasets:
    • China Aid Data
    • The Migrants Files
  • Make a copy of it on your Google Drive:
    • To do this: File > Make a copy
  • Work out a list of research questions
  • Try to find answers from the data:
    • Use filters to select categories of data
    • Run pivot tables to explore the data

Crunching data with OpenRefine

  • OpenRefine (www.openrefine.org)
  • Made to clean and refine data
  • Allows operations such as:
    • data conversion 
    • filtering
    • splitting
    • clustering
    • string manipulation
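
To give a feel for data conversion and splitting, here is a small sketch in plain Python. It is not how OpenRefine works internally; the record, its field names and the `clean` helper are all hypothetical.

```python
from datetime import datetime

# Hypothetical raw record: every value arrives as text.
raw = {"date": "26.10.2015", "location": " Skopje, Macedonia ", "count": "12"}


def clean(record):
    """Convert text fields to proper types and split one column into two."""
    # Splitting: "city, country" becomes two separate values, trimmed of spaces.
    city, country = [part.strip() for part in record["location"].split(",", 1)]
    return {
        # Conversion: text to a date object (assumes day.month.year format).
        "date": datetime.strptime(record["date"], "%d.%m.%Y").date(),
        "city": city,
        "country": country,
        # Conversion: text to a number.
        "count": int(record["count"]),
    }


cleaned = clean(raw)
```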

From chaos to order

Clustering data
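
Clustering groups together values that are probably the same thing written in different ways. The sketch below is a simplified take on the "fingerprint" key-collision idea that OpenRefine offers; it is an approximation for illustration, not OpenRefine's actual code, and the sample names are made up.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a value to a normalised key so that near-duplicates collide."""
    # Strip accents by decomposing characters and dropping non-ASCII parts.
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    # Lowercase and remove punctuation.
    no_punct = ascii_text.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop repeated tokens, sort, and join back together.
    return " ".join(sorted(set(no_punct.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical messy column.
names = ["New York", "new york ", "York, New", "Skopje", "Skopjé", "Rome"]
clusters = cluster(names)
```

Each cluster is a candidate for merging into a single spelling; a person should still review the groups before merging.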

Questions?

Exercise #2 - OpenRefine

  • Install OpenRefine
  • Import the Iraq Body Count dataset
  • Explore the dataset using text facets
    • e.g. how often does each weapon appear?
  • Set the right data types
  • Cluster the values in columns
  • Clean, enrich and play with the data
  • Download the new version of the dataset in Excel format
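
A text facet lists each distinct value in a column together with how many rows contain it. The idea can be sketched in plain Python as below; the values are invented for illustration and are not taken from the Iraq Body Count dataset.

```python
from collections import Counter

# Hypothetical column of values, invented for this example.
weapons = ["gunfire", "car bomb", "gunfire", "roadside bomb", "gunfire", "car bomb"]

# Count how many rows carry each distinct value, most frequent first.
facet = Counter(weapons).most_common()
```

Scanning the counts is also a quick way to spot spelling variants that should be clustered.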

Thank you!

Find me on Twitter @JacopoOttaviani

By jottaviani