- Gather & use environmental data (5 yrs)
- Disseminate water data @ UN-FAO (7 yrs)
- Structure & use varied Int dev data @ ACDI/VOCA (2 yrs)
Haters gonna hate!
Points to be covered
Inconsistent (in 3 ways)
"Why don't the db and the report match!?"
http://bit.ly/2tNngZl accessed 2017.07.04
One is live, the other is a written analysis!
Meanwhile, from our secret analyses...
1) Observe constantly
2) Add transparency
Also db traffic doubled!!
"Why is your data inconsistent from country to country!?"
- Countries do things differently
- Data comes in all shapes and sizes
To Upload or Not to Upload?
Role-Play: Upload or Not
- Freshwater Withdrawal for City A is 10 km3/yr, City B is 15 km3/yr and City C is 5 km3/yr (3 biggest cities).
- Total Freshwater Withdrawal for 2015 is 30 km3/yr.
- Total Water Withdrawal for 2017 is 27.3 km3/yr.
You are the analyst entering national-level Freshwater Withdrawal data for 2017. Do you upload these entries yes or no?
No right answers!
HAVE TO upload something, NOTHING is ever perfect
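The tension in the role-play entries can be made explicit with a few lines of R. This is only a sketch: the variable names are made up for illustration, and it assumes that freshwater withdrawal is a component of total water withdrawal, so it should not exceed the total. It flags the entries for a human to review and does not decide whether to upload.

```r
# Figures from the role-play, all in km3/yr.
city_freshwater <- c(A = 10, B = 15, C = 5)  # three biggest cities
total_freshwater_2015 <- 30
total_water_2017 <- 27.3

cities_sum <- sum(city_freshwater)

# Assumption for this sketch: freshwater withdrawal is one component of
# total water withdrawal, so it should not be larger than the total.
flags <- c(
  # Three cities alone already account for the whole 2015 national figure.
  cities_equal_2015_total = cities_sum == total_freshwater_2015,
  # The same three cities exceed the 2017 total for all water sources.
  cities_exceed_2017_total = cities_sum > total_water_2017
)

print(flags)
```

Both flags come out TRUE, which is exactly the kind of context a disclaimer or metadata note needs to carry if the entry is uploaded anyway.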
yeah, ok but:
Solution 2: Disclaimers!
Structured Contextual Metadata!
Also available in csv files through top download buttons (tidy format :) )
(Visibility, stickiness, category)
- Dummy variables
- Inherit metadata to residuals
"Why do different agencies have different data for the same country!?"
... but we do work together by standardizing and harmonizing!
Propose a project to:
Combine all data (infrastructure)
Show off joint data in a pretty portal
- Focus on infrastructure
- Use Shiny for quick prototyping
"Why so little data!?"
"It's Jan 1, so where is ALL the data?"
VERY sparse data
We can create data
Example: Dam Evaporation
- Combining AQUASTAT data in R with:
- Scraped Wikipedia
- Open Street Maps (OSM) API
- Biggest (dirty) dataset on global dams
But bread-and-butter data is from questionnaires
Incoming Data Quality @AQUASTAT
Data Deprivation at World Bank
How can R help with sparsity?
but it can help
in many other ways...
"How to generate interest ($$$$)"
Push data further!
Huge amount of boring work
(emails, permission, munging, quality control,
revisions, emails, ...)
Quick-ish fun work
(reporting, modelling &| viz)
Squeeze more juice out of those lemons!
story vs data viz
Start from research
Analysis & viz
Document wins & lessons learned
"Crunching data into aggregates is too time-consuming"
Mismatch in skillset
R automation to the rescue!
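library(dplyr)
library(purrr)

# Wrap source() so one failing script does not stop the whole run
safeSource <- safely(source)
a <- ProjectDF %>%
  filter(completed == TRUE) %>%
  pull(filePathFileName) %>%
  map(safeSource) %>%
  transpose()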
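A minimal, self-contained sketch of the pattern above, to show what comes back from the run. The two throwaway scripts and the contents of `ProjectDF` are hypothetical stand-ins; only `safely()`, `map()` and `transpose()` are taken from the slide, and base subsetting replaces the dplyr steps to keep the example short.

```r
library(purrr)

# Hypothetical stand-ins for real project scripts: one runs, one fails.
good_script <- tempfile(fileext = ".R")
bad_script  <- tempfile(fileext = ".R")
writeLines("x <- 1 + 1", good_script)
writeLines("stop('missing input file')", bad_script)

# Hypothetical project registry; the column names follow the slide.
ProjectDF <- data.frame(
  filePathFileName = c(good_script, bad_script),
  completed = c(TRUE, TRUE),
  stringsAsFactors = FALSE
)

# safely() wraps source() so a failing script returns an error object
# instead of stopping the whole batch.
safeSource <- safely(source)

scripts <- ProjectDF$filePathFileName[ProjectDF$completed]
runs <- transpose(map(scripts, safeSource))

# After transpose(), runs$result and runs$error are parallel lists,
# one element per script; error is NULL where the script succeeded.
failed <- !map_lgl(runs$error, is.null)
failed_scripts <- scripts[failed]
```

Here `failed_scripts` holds only the path of the script that threw, and `runs$error` keeps the matching error objects for later reporting.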
What does automation buy?
- Command-center run status: Emails you on error!
- Senior Mgmt
- Subject Expert Staff
- Project Staff
- Mega Shiny Dashboard!
- Staff can actually look at the data!
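The "emails you on error" idea can be sketched as a small function that turns a run log into alert text. Everything here is an assumption for illustration: the `run_log` layout, the project names and `build_alert()` are hypothetical, and the actual sending step is left out because it depends on the mail tool in use.

```r
# Hypothetical run log: one row per project script, error is NA on success.
run_log <- data.frame(
  project = c("irrigation", "dams", "withdrawals"),
  error = c(NA, "missing input file", NA),
  stringsAsFactors = FALSE
)

# Build the alert text; returns NULL when every script succeeded.
build_alert <- function(log) {
  failed <- log[!is.na(log$error), ]
  if (nrow(failed) == 0) return(NULL)
  paste0(
    nrow(failed), " of ", nrow(log), " scripts failed:\n",
    paste0("- ", failed$project, ": ", failed$error, collapse = "\n")
  )
}

alert <- build_alert(run_log)

# A real pipeline would hand `alert` to its mail tool at this point.
if (!is.null(alert)) message(alert)
```

Returning NULL on a clean run keeps the inbox quiet, so a message only arrives when someone has to act.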
"The ultimate reason why int dev data is useless..."
WHAT HAVE YOU DONE FOR INT DEV?
Thank you! + Summary
If you are an int dev worker
Analyze data usage to allocate resources efficiently and resolve user bottlenecks!
Disseminate structured contextual metadata!
Defend back-end and prototype front-end in Shiny!
Get more juice out of your data lemons!
If you are an int dev manager
Hire a data specialist
Field-validate theoretical approaches
If you are a data scientist
Don't be so mean!
Use symbols & metadata!
By Amit Kohli
EARL Presentation: Making International Development Data Not-Useless