Intro

Experience:

  • Gather & use environmental data (5 yrs)
  • Disseminate water data @ UN-FAO (7 yrs)
  • Structure & use varied Int dev data @ ACDI/VOCA (2 yrs)

Haters gonna hate!

Points to be covered

  • Inconsistent (in 3 ways)

  • Incomplete

  • Other R-wins!

"Why don't the db and the report match!?"

Internally Inconsistent

Type A

http://www.fao.org/nr/water/aquastat/countries_regions/IND/index.stm

http://bit.ly/2tNngZl accessed 2017.07.04

One is live, the other is a written analysis!

Meanwhile, from our secret analyses...

SUBJECT EXPERTS

COUNTRY EXPERTS

TARGETED QUERIES

http://www.fao.org/nr/water/aquastat/countries_regions/IND/index.stm

http://www.fao.org/nr/water/aquastat/data/query/results.html?regionQuery=false&showCodes=true&yearRange.fromYear=1960&yearRange.toYear=2015&varGrpIds=4150,4151,...,4456,4471,4472,4509&cntIds=100&newestOnly=true...

Result?

1) Observe constantly

2) Add transparency and convenience

Also db traffic doubled!!

New clusters

Country Experts

"Why is your data inconsistent from country to country!?"

Internally Inconsistent

Type B

  • Countries do things differently
  • Data comes in all shapes and sizes

To Upload or Not to Upload?

?

Role-Play: Upload or Not

  • Freshwater Withdrawal for City A is 10 km3/yr, City B is 15 km3/yr and City C is 5 km3/yr (3 biggest cities).
  • Total Freshwater Withdrawal for 2015 is 30 km3/yr.
  • Total Water Withdrawal for 2017 is 27.3 km3/yr.

You are the analyst entering national-level Freshwater Withdrawal data for 2017. Do you upload these entries yes or no?

No right answers!

HAVE TO upload something, NOTHING is ever perfect
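The role-play numbers can be screened mechanically before anyone decides. The sketch below is only an illustration in base R: `check_candidate` is a hypothetical helper, and it assumes the analyst is considering the 2017 Total Water Withdrawal figure as a stand-in for national Freshwater Withdrawal. It does not settle the question (the variable differs, and the city figures carry no year), it just makes the conflicts visible.

```r
# Values copied from the role-play slide (km3/yr)
city_withdrawal <- c(A = 10, B = 15, C = 5)
total_freshwater_2015 <- 30
total_water_2017 <- 27.3   # different variable: total water, not freshwater

# Hypothetical helper: compare a candidate national value against
# the sum of known parts and against the previous national total
check_candidate <- function(candidate, parts, previous_total) {
  list(
    sum_of_parts = sum(parts),
    below_sum_of_parts = candidate < sum(parts),
    pct_change_vs_previous =
      round(100 * (candidate - previous_total) / previous_total, 1)
  )
}

flags <- check_candidate(total_water_2017, city_withdrawal, total_freshwater_2015)
str(flags)
```

Here the three cities alone already exceed the 2017 candidate, which is exactly the kind of conflict that should travel with the number as a note rather than block the upload.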

Anti-Solution 1:

MORE VARIABLES!

yeah, ok but:

SPARSITY!

COMPLICATED!

Slippery Slope?

Solution 2: Disclaimers!

Symbols!

Structured Contextual Metadata!

 

Also available in csv files through top download buttons (tidy format :) )

(Visibility, stickiness, category)
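As a rough sketch of what using symbols and metadata from a tidy csv can look like: the column names, symbol codes, country names and values below are all made up for illustration and are not the actual AQUASTAT export layout. The point is only that a symbol column makes it cheap to separate qualified values from unqualified ones.

```r
# Hypothetical tidy export: one row per country-variable-year observation.
# Empty symbol/metadata fields mean "no qualifier attached".
csv_text <- "country,variable,year,value,symbol,metadata
Country A,Total freshwater withdrawal,2015,30.0,,
Country A,Total freshwater withdrawal,2017,27.3,E,Estimate; includes desalinated water
Country B,Total freshwater withdrawal,2016,12.1,,
Country B,Total freshwater withdrawal,2017,12.4,M,Modelled from 2016 value"

obs <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")

# Mark every observation that carries a symbol
obs$flagged <- !is.na(obs$symbol)

# Analysts can then choose: drop flagged rows, or keep them and model the flag
clean <- obs[!obs$flagged, ]
table(obs$symbol, useNA = "ifany")
```

Keeping the flag as a column, instead of deleting flagged rows up front, leaves that choice to whoever does the analysis.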

HEED THE SYMBOLS AND METADATA!!!!!!!!!

Viz

Modelling

  • Dummy variables
  • Inherit metadata to residuals
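One possible reading of the two modelling bullets, sketched in base R on simulated data. Everything here is an assumption for illustration: the `flagged` column stands in for any symbol or metadata category, and the offset of 3 is invented so that the dummy has something to pick up.

```r
set.seed(1)
n <- 40
dat <- data.frame(
  x = runif(n, 0, 10),
  flagged = rep(c(FALSE, TRUE), each = n / 2)  # e.g. "value is an estimate"
)
# Simulated response: flagged observations get a made-up offset of 3
dat$y <- 2 + 0.5 * dat$x + 3 * dat$flagged + rnorm(n, sd = 0.5)

# Dummy variable: the logical flag enters the model as a 0/1 term
fit <- lm(y ~ x + flagged, data = dat)
coef(fit)

# Inherit metadata to residuals: keep the flag next to each residual
# so diagnostics can be split by metadata category
res <- data.frame(residual = resid(fit), flagged = dat$flagged)
aggregate(residual ~ flagged, data = res, FUN = function(r) sqrt(mean(r^2)))
```

If one metadata category shows clearly larger residuals, that is a hint the qualifier matters and should not be ignored downstream.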

"Why do different agencies have different data for the same country!?"

Externally Inconsistent

Separate worlds

Different ministries, different mandates, different definitions

... but we do work together by standardizing and harmonizing!

Let's harmonize!

Propose a project to:

combine all data (infrastructure) 

+

Show-off joint data in a pretty portal

UNDERFUNDED!

Reduce scope?

Focus on portal?

Next time:

  • Focus on infrastructure
  • Use Shiny for quick prototyping

"Why so little data!?"

"It's Jan 1, so where is ALL the data?"

VERY sparse data

We can create data

Example: Dam Evaporation

  • Combining AQUASTAT data in R with:
    • Scraped Wikipedia
    • Open Street Maps (OSM) API
  • Biggest (dirty) dataset on global dams
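A minimal sketch of the combining step, assuming the sources have already been downloaded or scraped into data frames. The dam names, values, column names and the `norm_name` helper are all hypothetical, and the real work (scraping Wikipedia, querying the OSM API) is not shown. It only illustrates why the merged result is big and dirty: a full outer join keeps every dam from every source.

```r
# Hypothetical helper: normalise dam names so sources can be matched
norm_name <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Made-up stand-ins for an AQUASTAT extract and a scraped table
aquastat <- data.frame(dam = c("Example Dam", "Sample-Dam"),
                       capacity_mcm = c(100, 250),
                       stringsAsFactors = FALSE)
wiki <- data.frame(dam = c("example dam", "Other Dam"),
                   surface_km2 = c(12, 3),
                   stringsAsFactors = FALSE)

aquastat$key <- norm_name(aquastat$dam)
wiki$key <- norm_name(wiki$dam)

# Full outer join: keep dams found in either source
dams <- merge(aquastat, wiki, by = "key", all = TRUE,
              suffixes = c("_aquastat", "_wiki"))

# Record provenance so users can judge how well supported each row is
dams$n_sources <- (!is.na(dams$dam_aquastat)) + (!is.na(dams$dam_wiki))
dams
```

Name matching this naive will miss spelling variants and merge unrelated dams that share a name, so a real pipeline would need coordinates or manual review on top.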

But bread-and-butter data is from questionnaires

Incoming Data Quality @AQUASTAT

Data Deprivation at World Bank

http://blogs.worldbank.org/opendata/much-world-deprived-poverty-data-let-s-fix

How can R help with sparsity?

can't

:(

but it can help in many other ways...

"How to generate interest ($$$$)"

limited funding

Push data further!

 

 

 

 

Huge amount of boring work

(emails, permission, munging, quality control, revisions, emails, ...)

 

 

 

Quick-ish fun work

(reporting, modelling &| viz)

more stuff!

more stuff!

more stuff!

more stuff!

Squeeze more juice out of those lemons!

Reproducible

Objective

Analysis

story vs data viz

Start from research

solid

architecture

Validate approach @ field-level

Analysis & viz

Document wins & lessons learned

Implement & scale

"Crunching data into aggregates is too time-consuming"

Mismatch in skillset

R automation to the rescue!

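library(dplyr); library(purrr)

# safely() makes source() return list(result, error) instead of stopping
safeSource <- safely(source)

# Run each completed project's script, then group all results and all errors
a <- ProjectDF %>% filter(completed == TRUE) %>%
  pull(filePathFileName) %>%
  map(safeSource) %>% transpose()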

What does automation buy?

  • Command-center run status: Emails you on error!
  • Overviews:
    • Senior Mgmt
    • Subject Expert Staff
    • Project Staff
  • Mega Shiny Dashboard!
  • Staff can actually look at the data!
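A sketch of how a run-status overview could be built from that kind of safe run. To stay dependency-free it swaps `purrr::safely(source)` for a base R `tryCatch` wrapper; `safe_source`, the two throwaway scripts and the status table layout are assumptions for illustration, and the emailing step is only indicated in a comment.

```r
# Hypothetical base R stand-in for purrr::safely(source)
safe_source <- function(path) {
  tryCatch(
    list(result = source(path, local = new.env()), error = NULL),
    error = function(e) list(result = NULL, error = conditionMessage(e))
  )
}

# Two throwaway project scripts: one succeeds, one fails
ok_script <- tempfile(fileext = ".R")
bad_script <- tempfile(fileext = ".R")
writeLines("x <- 1 + 1", ok_script)
writeLines("stop('missing input file')", bad_script)

scripts <- c(ok_script, bad_script)
runs <- lapply(scripts, safe_source)

# One row per script: did it run, and if not, why
status <- data.frame(
  script = basename(scripts),
  ok = vapply(runs, function(r) is.null(r$error), logical(1)),
  message = vapply(runs, function(r) if (is.null(r$error)) "" else r$error,
                   character(1)),
  stringsAsFactors = FALSE
)

failed <- status[!status$ok, ]
# A real command center would email `failed` here; this sketch only prints it
print(failed)
```

Because one failing script no longer stops the batch, the same `status` table can feed both the error notification and the management overviews.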

"The ultimate reason why int dev data is useless..."

???

WHAT HAVE YOU DONE FOR INT DEV?

#dataForGood

Thank you! + Summary

Amit Kohli

@vizmonkey

If you are an int dev worker

  • Analyze data usage to allocate resources efficiently and resolve user bottlenecks!

  • Disseminate structured contextual metadata!

  • Defend back-end and prototype front-end in Shiny!

  • Get more juice out of your data lemons!

If you are an int dev manager

  • Hire a data specialist

  • Field-validate theoretical approaches

If you are a data scientist

  • Don't be so mean!

  • Use symbols & metadata!

  • Get involved!

EARL

By Amit Kohli

EARL Presentation: Making International Development Data Not-Useless
