Using COVID-19 data
with EdelweissData
Overview
- Case study 1 : Dashboard for Slovenia
- Case study 2: Estimating cases from deaths
- What is Edelweiss Data?
- Using COVID-19 data from ED with Jupyter
- How to do nightly data imports with Github Actions
- Using the power of metadata
Dashboad for Slovenia
Dashboad for Slovenia
- Grassroots effort to collect and communicate data
- CSV files on Github worked fine initially
- But country comparision started to strain browser resources
Dashboad for Slovenia
Estimating case numbers from deaths
Estimating case numbers from deaths
- Limited testing capacity means case numbers are unreliable
- Deaths are harder to miss
- Given the IFR (~ 1 in 140), estimate true cases from deaths
10 total deaths
Estimating case numbers from deaths
10 total deaths
1400 infected
23 days earlier
- Limited testing capacity means case numbers are unreliable
- Deaths are harder to miss
- Given the IFR (~ 1 in 140), estimate true cases from deaths
Estimating case numbers from deaths
What is EdelweissData?
A platform for managing tabular datasets
- with rich metadata support
- strong versioning
- interactive web UIs
- APIs that work with all sorts of tools (Python, R, Excel, KNIME, ...)
4 parts of every dataset
date | location | cases |
---|---|---|
Metadata
JSON
📄
README
Schema
date: datetime
location: string
cases: int
How did ED help with the Slovenia Dashboard?
- Easy importing of data
- ED provides JSON APIs with powerful filtering
- Automatic updates every night + full version history
Â
How did ED help with the case estimation notebook?
- Metadata made it possible to easily switch between worldwide and country level datasets
- Metadata describes columns, enumerates regions, provides links to extended documentation
Â
Consume data with your favourite tool
How to unify data sources
location | date | new_cases | total_cases | new_deaths | total_deaths |
---|---|---|---|---|---|
state | date | cases | deaths |
---|---|---|---|
Bundesland | Meldedatum | AnzahlFall | AnzahlTodesfall |
---|---|---|---|
{ "columnNames": {
"region": "location",
"date": "date",
"total-cases": "total_cases",
"new-cases": "new_cases",
"total-deaths": "total_deaths",
"new-deaths": "new_deaths"
}
}
{ "columnNames": {
"region": "state",
"date": "date",
"total-cases": "cases",
"total-deaths": "deaths",
}
}
{ "columnNames": {
"region": "Bundesland",
"date": "Meldedatum",
"total-cases": "AnzahlFall",
"total-deaths": "AnzahlTodesfall",
}
}
Our world in data Dataset
metadata.json
New York Times Dataset
metadata.json
RKI Dataset
metadata.json
Periodic data imports
One neat way: Github Actions
# This workflow runs as a cron job to download the current version of the New York Times
# covid 19 dataset for the US and publishes a new version of this dataset into edelweiss
# data
name: Update New York Times dataset
on:
schedule:
- cron: '15 15 * * *'
jobs:
test:
- name: run update
working-directory: data-import-scripts
env:
REFRESH_TOKEN: ${{ secrets.REFRESH_TOKEN }}
run: python new-york-times.py
Public beta soon ...
Thank you!
These slides:
slides.com/danielbachler/covid19-edelweiss-data
Using COVID-19 data with EdelweissData
By Daniel Bachler
Using COVID-19 data with EdelweissData
- 1,122