Karl Ho
Data Generation datageneration.io
School of Economic, Political and Policy Sciences
University of Texas at Dallas
Presentation prepared for UT Dallas Computer Science Spring Break Online Conference
(Series of Tech-Talks and Tutorials), March 24, 2020
US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html
National Health Commission of the People's Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
Macau Government: https://www.ssm.gov.mo/portal/
Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0
Government of Canada: https://www.canada.ca/en/public-health/services/diseases/coronavirus.html
Australia Government Department of Health: https://www.health.gov.au/news/coronavirus-update-at-a-glance
European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19
Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus
Worldometer: https://www.worldometers.info/coronavirus/
The COVID Tracking Project: https://covidtracking.com/
1Point3Acres: https://coronavirus.1point3acres.com/#stat
Underreporting
Useable
Most cited/used: Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)
https://github.com/CSSEGISandData/COVID-19
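As a minimal sketch of working with this kind of data, the snippet below sums per-region rows into country totals. It assumes the wide layout used by the repository's time-series CSV files (four identifier columns followed by one column per date). The inline sample and the helper name `country_totals` are illustrative, and the numbers are invented.

```python
import csv
import io

# Tiny inline sample that mimics the wide layout of the time-series files
# (one row per region, one column per date). The numbers are invented.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Italy,43.0,12.0,0,0,2
Hubei,China,30.9,112.2,10,15,25
Beijing,China,40.1,116.4,1,2,4
"""

# Columns that identify a region rather than hold a daily count (assumed layout).
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def country_totals(csv_text):
    """Sum the per-region rows into {country: {date: cumulative_count}}."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        series = totals.setdefault(row["Country/Region"], {})
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            series[column] = series.get(column, 0) + int(value)
    return totals


totals = country_totals(SAMPLE_CSV)
print(totals["China"])
```

With a downloaded file, the same function could be called on the file's text in place of `SAMPLE_CSV`.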
Method | Advantages | Disadvantages |
---|---|---|
REST API | API regulated | |
GOT | More time and data control (12 variables) | Needs Python customizations |
Scraper | Older data | Limited N; needs Python customizations |
rtweet | More variables (92) | <7 days of data |
The non-API methods can collect Twitter data by keyword or username search. The former approach is good for studying public responses to a policy or an issue such as the coronavirus.
The latter approach allows studying the social network, influence, and behavior of a particular user or group (e.g., Trump).
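The two selection strategies can be illustrated on tweets that have already been collected. This is only a sketch: the records, the field names `username` and `text`, and both helper functions are hypothetical, and a real collector returns many more fields.

```python
# Hypothetical records standing in for collected tweets.
TWEETS = [
    {"username": "user_a", "text": "New coronavirus guidance released today"},
    {"username": "user_b", "text": "Traffic is terrible this morning"},
    {"username": "user_a", "text": "Heading to the conference"},
    {"username": "user_c", "text": "Stay home: CORONAVIRUS cases are rising"},
]


def by_keyword(tweets, keyword):
    """Issue-centered selection: keep tweets whose text contains the keyword."""
    needle = keyword.lower()
    return [t for t in tweets if needle in t["text"].lower()]


def by_username(tweets, username):
    """User-centered selection: keep tweets posted by one account."""
    return [t for t in tweets if t["username"] == username]


issue_sample = by_keyword(TWEETS, "coronavirus")
user_sample = by_username(TWEETS, "user_a")
print(len(issue_sample), len(user_sample))
```

The keyword sample cuts across accounts, while the username sample follows one account across topics, which is why the two suit different research questions.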
January 23 to March 24, 2020
July 27, 2018-September 8, 2018
Focus on the network structure
Build samples over time
Collect data by influential networks and nodes
State-space models
Twitter data are time series data
Twitter data are social network data
Community identification
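To make the network view concrete, the sketch below builds a mention network from tweet text, ranks accounts by how often they are mentioned, and groups accounts into connected components. Connected components are used here only as a crude stand-in for community identification, not as the method behind these slides. The records and helper names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical records; the usernames and texts are invented.
TWEETS = [
    {"username": "ann", "text": "Agree with @bob on this"},
    {"username": "bob", "text": "Thanks @ann and @cat"},
    {"username": "cat", "text": "Replying to @bob"},
    {"username": "dan", "text": "Good point @eve"},
    {"username": "eve", "text": "No mentions here"},
]

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Return directed (author, mentioned_user) pairs found in tweet text."""
    return [
        (t["username"], target.lower())
        for t in tweets
        for target in MENTION.findall(t["text"])
    ]


def in_degree(edges):
    """Count how often each account is mentioned (a simple influence proxy)."""
    return Counter(target for _, target in edges)


def components(edges):
    """Group accounts that are linked by any chain of mentions."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


edges = mention_edges(TWEETS)
print(in_degree(edges).most_common(1))
print(components(edges))
```

A modularity-based or other dedicated community detection algorithm would normally replace `components` in a real analysis.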
Future developments:
Network-driven sampling vs. Respondent-driven sampling (Heckathorn 1997, 2009)
Exponential random graph models (ERGM)
Markov chain models
Writing on the wall:
Within a week or so, we will be the hardest-hit country, even compared with China and Italy.
Caveat:
Data are on a log scale, but trajectories are comparable!
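A small sketch of why log-transformed trajectories can be compared: two series that differ tenfold in size but double at the same pace have the same slope on a log scale. The counts and the helper names are illustrative assumptions, not data from the talk.

```python
import math


def log10_series(counts):
    """Log-transform a series of strictly positive cumulative counts."""
    return [math.log10(c) for c in counts]


def doubling_time(counts):
    """Average number of days per doubling between first and last observation."""
    logs = log10_series(counts)
    slope = (logs[-1] - logs[0]) / (len(logs) - 1)  # log10 units per day
    return math.log10(2) / slope


# Invented daily cumulative counts: same growth pace, different scale.
small_outbreak = [100, 200, 400, 800]
large_outbreak = [1000, 2000, 4000, 8000]

small_days = doubling_time(small_outbreak)
large_days = doubling_time(large_outbreak)
print(round(small_days, 3), round(large_days, 3))
```

On a linear axis the larger series would dwarf the smaller one, while on the log scale both trace parallel lines.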
average = 0.79, s.d. = 0.13, skewness = -1.12
average = 0.54, s.d. = 0.15, skewness = 0.72
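For reference, summary statistics of this kind can be computed as in the sketch below. It assumes the sample standard deviation and the moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5). Which variants the slides used is not stated, and the scores are invented.

```python
import statistics


def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5, using population moments."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def summarize(values):
    """Return (average, sample standard deviation, skewness)."""
    return statistics.fmean(values), statistics.stdev(values), skewness(values)


# Invented scores: one symmetric set and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 10]

sym_mean, sym_sd, sym_skew = summarize(symmetric)
tail_mean, tail_sd, tail_skew = summarize(right_tailed)
print(sym_mean, round(sym_sd, 3), sym_skew)
```

A negative skewness indicates a longer left tail and a positive value a longer right tail, which is the contrast between the two sets of figures above.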
July 27, 2018-September 8, 2018