Intro
Experience:
- Gather & use environmental data (5 yrs)
- Disseminate water data @ UN-FAO (7 yrs)
- Structure & use varied Int dev data @ ACDI/VOCA (2 yrs)
Haters gonna hate!
Points to be covered
- Inconsistent (in 3 ways)
- Incomplete
- Other R-wins!
"Why don't the db and the report match!?"
Internally Inconsistent
Type A
http://www.fao.org/nr/water/aquastat/countries_regions/IND/index.stm
http://bit.ly/2tNngZl accessed 2017.07.04
One is live, the other is a written analysis!
Meanwhile, from our secret analyses...
SUBJECT EXPERTS
COUNTRY EXPERTS
TARGETED QUERIES
http://www.fao.org/nr/water/aquastat/countries_regions/IND/index.stm
http://www.fao.org/nr/water/aquastat/data/query/results.html?regionQuery=false&showCodes=true&yearRange.fromYear=1960&yearRange.toYear=2015&varGrpIds=4150,4151,...,4456,4471,4472,4509&cntIds=100&newestOnly=true...
Result?
1) Observe constantly
2) Add transparency and convenience
Also db traffic doubled!!
New clusters
Country Experts
"Why is your data inconsistent from country to country!?"
Internally Inconsistent
Type B
- Countries do things differently
- Data comes in all shapes and sizes
To Upload or Not to Upload?
?
Role-Play: Upload or Not
- Freshwater Withdrawal for City A is 10 km3/yr, City B is 15 km3/yr and City C is 5 km3/yr (3 biggest cities).
- Total Freshwater Withdrawal for 2015 is 30 km3/yr.
- Total Water Withdrawal for 2017 is 27.3 km3/yr.
You are the analyst entering national-level Freshwater Withdrawal data for 2017. Do you upload these entries yes or no?
No right answers!
HAVE TO upload something, NOTHING is ever perfect
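A minimal R sketch of the sanity checks an analyst might run on the role-play entries before deciding. The variable names are hypothetical; the numbers are the ones given above.

```r
# Role-play figures, all in km3/yr
cities <- c(A = 10, B = 15, C = 5)   # Freshwater Withdrawal, 3 biggest cities (no year given)
total_freshwater_2015 <- 30          # Total Freshwater Withdrawal, 2015
total_water_2017 <- 27.3             # Total Water Withdrawal, 2017

sum_cities <- sum(cities)

# Check 1: do the three cities alone already add up to the 2015 national total?
cities_equal_2015_total <- isTRUE(all.equal(sum_cities, total_freshwater_2015))

# Check 2: do the three cities alone exceed the 2017 total?
cities_exceed_2017_total <- sum_cities > total_water_2017
```

Both checks come back TRUE, which is what makes the call hard: the city figures carry no year, and the 2017 entry is for a differently named variable than the one being uploaded.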
Anti-Solution 1:
MORE VARIABLES!
yeah, ok but:
SPARSITY!
COMPLICATED!
Slippery Slope?
Solution 2: Disclaimers!
Symbols!
Structured Contextual Metadata!
Also available in csv files through top download buttons (tidy format :) )
(Visibility, stickiness, category)
HEED THE SYMBOLS AND METADATA!!!
Viz
Modelling
- Dummy variables
- Inherit metadata to residuals
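A rough sketch of both modelling points in base R, assuming a tidy table with one value per country and a symbol column. The data, the column names and the "E" (estimate) symbol are invented for illustration and are not taken from AQUASTAT.

```r
# Hypothetical tidy data: "E" marks an estimated value, "" a reported one
df <- data.frame(
  country    = c("A", "B", "C", "D", "E", "F"),
  gdp        = c(1, 2, 3, 4, 5, 6),
  withdrawal = c(2.1, 3.9, 6.2, 9.5, 11.4, 13.6),
  symbol     = c("", "", "", "E", "E", "E")
)

# Dummy variable: 1 when the value carries the estimate symbol, 0 otherwise
df$is_estimate <- as.integer(df$symbol == "E")

# The dummy lets the model absorb a systematic shift in flagged values
fit <- lm(withdrawal ~ gdp + is_estimate, data = df)

# Inherit metadata to residuals: keep each symbol next to its residual
res <- data.frame(
  country  = df$country,
  symbol   = df$symbol,
  residual = unname(resid(fit))
)
```

With the symbol carried along, a large residual can be traced back to a flagged value instead of being read as a real signal.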
"Why do different agencies have different data for the same country!?"
Externally Inconsistent
Separate worlds
Different ministries,
different mandates,
different definitions
... but we do work together by standardizing and harmonizing!
Let's harmonize!
Propose a project to:
Combine all data (infrastructure)
+
Show off joint data in a pretty portal
UNDERFUNDED!
Reduce scope?
Focus on portal?
Next time:
- Focus on infrastructure
- Use Shiny for quick prototyping
"Why so little data!?"
&
"It's Jan 1, so where is ALL the data?"
VERY sparse data
We can create data
Example: Dam Evaporation
- Combining AQUASTAT data in R with:
- Scraped Wikipedia
- Open Street Maps (OSM) API
- Biggest (dirty) dataset on global dams
But bread-and-butter data is from questionnaires
Incoming Data Quality @AQUASTAT
Data Deprivation at World Bank
http://blogs.worldbank.org/opendata/much-world-deprived-poverty-data-let-s-fix
How can R help with sparsity?
It can't :(
but it can help in many other ways...
"How to generate interest ($$$$)"
limited funding
Push data further!
Huge amount of boring work
(emails, permission, munging, quality control, revisions, emails, ...)
Quick-ish fun work
(reporting, modelling &| viz)
more stuff!
more stuff!
more stuff!
more stuff!
Squeeze more juice out of those lemons!
Reproducible
Objective
Analysis
story vs data viz
Start from research
Solid architecture
Validate approach @ field-level
Analysis & viz
Document wins & lessons learned
Implement & scale
"Crunching data into aggregates is too time-consuming"
Mismatch in skillset
R automation to the rescue!
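library(dplyr)  # filter(), pull(), %>%
library(purrr)  # safely(), map(), transpose()
# Wrap source() so one failing script cannot stop the whole run
safeSource <- safely(source)
# Run each completed project's script; split the output into results and errors
a <- ProjectDF %>% filter(completed == TRUE) %>%
  pull(filePathFileName) %>%
  map(safeSource) %>% transpose()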
What does automation buy?
- Command-center run status: Emails you on error!
- Overviews:
- Senior Mgmt
- Subject Expert Staff
- Project Staff
- Mega Shiny Dashboard!
- Staff can actually look at the data!
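A sketch of how the run status could be derived from the safely() output, assuming purrr is installed. The two temporary scripts stand in for real project files and are invented for illustration; actually sending the email is left out, since the deck does not say which mail package is used.

```r
library(purrr)

# Two hypothetical project scripts: one runs cleanly, one fails
good <- tempfile(fileext = ".R")
bad  <- tempfile(fileext = ".R")
writeLines("x <- 1 + 1", good)
writeLines("stop('missing input file')", bad)
scripts <- c(good, bad)

# Same pattern as the automation snippet: run everything, never abort
safeSource <- safely(source)
a <- transpose(map(scripts, safeSource))

# An entry of a$error is NULL when the script ran without error
failed <- !map_lgl(a$error, is.null)

# One alert line per failed script; this text would form the email body
alert <- paste0(basename(scripts[failed]), ": ",
                map_chr(a$error[failed], conditionMessage))
```

The `failed` vector is the command-center view (which scripts broke), and `alert` is the message a scheduler could forward to whoever owns the project.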
"The ultimate reason why int dev data is useless..."
???
WHAT HAVE YOU DONE FOR INT DEV?
#dataForGood
Thank you! + Summary
Amit Kohli
If you are an int dev worker
- Analyze data usage to allocate resources efficiently and resolve user bottlenecks!
- Disseminate structured contextual metadata!
- Defend back-end and prototype front-end in Shiny!
- Get more juice out of your data lemons!
If you are an int dev manager
- Hire a data specialist
- Field-validate theoretical approaches
If you are a data scientist
- Don't be so mean!
- Use symbols & metadata!
- Get involved!
EARL
By Amit Kohli
EARL Presentation: Making International Development Data Not-Useless