Karl Ho
School of Economic, Political and Policy Sciences
University of Texas at Dallas
Presentation prepared for UT Dallas Computer Science Spring Break Online Conference
(Series of Tech-Talks and Tutorials), March 24, 2020
US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html
National Health Commission of the People's Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
Macau Government: https://www.ssm.gov.mo/portal/
Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0
Government of Canada: https://www.canada.ca/en/public-health/services/diseases/coronavirus.html
Australia Government Department of Health: https://www.health.gov.au/news/coronavirus-update-at-a-glance
European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19
Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus
Worldometer: https://www.worldometers.info/coronavirus/
The COVID Tracking Project: https://covidtracking.com/
1Point3Acres: https://coronavirus.1point3acres.com/#stat
Underreporting
Useable
Most cited/used: Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)
https://github.com/CSSEGISandData/COVID-19
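A minimal sketch of how the repository's time-series files might be prepared for analysis. It assumes the wide layout used there (one row per province or country, one column per date in `m/d/yy` form) and sums province rows into per-country cumulative counts. The inline sample, its place names, and its values are made up for illustration; `country_totals` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Tiny inline sample in the assumed wide layout (made-up places and counts).
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Atlantis,0.0,0.0,1,2,4
North,Utopia,10.0,10.0,0,1,1
South,Utopia,12.0,11.0,0,0,3
"""

META_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def country_totals(csv_text):
    """Sum province rows into per-country cumulative counts keyed by date."""
    totals = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        country = row["Country/Region"]
        for column, value in row.items():
            if column in META_COLUMNS:
                continue  # skip identifiers and coordinates
            day = datetime.strptime(column, "%m/%d/%y").date()
            count = int(float(value or 0))  # tolerate blanks and "3.0"
            totals[country][day] = totals[country].get(day, 0) + count
    # Return plain dicts with dates in chronological order.
    return {c: dict(sorted(d.items())) for c, d in totals.items()}


totals = country_totals(SAMPLE)
for country, series in totals.items():
    print(country, list(series.values()))
```

To run this on real data, the text of a downloaded file would be passed in place of `SAMPLE`.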
Method | Advantages | Disadvantages |
---|---|---|
REST API | API regulated | |
GOT | More time and data control (12 variables) | Needs Python customizations |
Scraper | Older data | Limited N; needs Python customizations |
rtweet | More variables (92) | <7 days of data |
The non-API methods can collect Twitter data by keyword search or by username search. Keyword search is good for studying public responses to a policy or an issue, such as the coronavirus.
Username search allows studying the social network, influence and behavior of a particular user or group (e.g. Trump).
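The two sampling approaches can be illustrated with a short sketch. It assumes tweets have already been collected into dictionaries with hypothetical `user` and `text` fields; these field names, the sample tweets, and the helpers `by_keyword` and `by_username` are illustrative and do not reflect the output schema of any particular scraper.

```python
def by_keyword(tweets, keywords):
    """Issue-based sample: keep tweets whose text contains any keyword."""
    wanted = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t["text"].lower() for k in wanted)]


def by_username(tweets, usernames):
    """User-based sample: keep tweets posted by any of the given accounts."""
    wanted = {u.lower().lstrip("@") for u in usernames}
    return [t for t in tweets if t["user"].lower() in wanted]


# Made-up tweets for illustration only.
tweets = [
    {"user": "health_agency", "text": "New coronavirus guidance released today"},
    {"user": "some_politician", "text": "Big rally tonight"},
    {"user": "citizen42", "text": "Staying home because of the Coronavirus"},
]

issue_sample = by_keyword(tweets, ["coronavirus"])
user_sample = by_username(tweets, ["@some_politician"])
print(len(issue_sample), len(user_sample))
```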
January 23 to March 24, 2020
July 27, 2018-September 8, 2018
Focus on the network structure
Build samples over time
Collect data by influential networks and nodes
State-space models
Twitter data are time series data
Twitter data are social network data
Community identification
Future developments:
Network-driven sampling vs. Respondent-driven sampling (Heckathorn 1997, 2009)
Exponential random graph models (ERGM)
Markov chain models
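To make the network view concrete, here is a minimal sketch that turns tweets into a mention network, ranks accounts by how often others mention them, and groups accounts into weakly connected components. The components are only a crude stand-in for community identification, not a community-detection algorithm, and none of this is the method used in the talk. The `user` and `text` fields and the sample tweets are assumptions.

```python
import re
from collections import Counter, defaultdict

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Return (author, mentioned_user) pairs, lowercased, self-mentions dropped."""
    edges = []
    for t in tweets:
        author = t["user"].lower()
        for name in MENTION.findall(t["text"]):
            target = name.lower()
            if target != author:
                edges.append((author, target))
    return edges


def in_degree(edges):
    """Count how often each account is mentioned by others."""
    return Counter(target for _, target in edges)


def components(edges):
    """Weakly connected components, found by depth-first search."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(group)
    return groups


# Made-up tweets for illustration only.
tweets = [
    {"user": "alice", "text": "@bob did you see what @carol posted?"},
    {"user": "bob", "text": "@carol great thread"},
    {"user": "dave", "text": "agree with @erin"},
]

edges = mention_edges(tweets)
ranking = in_degree(edges).most_common()
groups = components(edges)
print(ranking[0], [sorted(g) for g in groups])
```

Highly mentioned accounts are one possible proxy for the influential nodes around which a sample could be built.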
Writing on the wall:
Within a week or so, we will be the hardest-hit country, even compared with China and Italy
Caveat:
Data are log-transformed, but trajectories are comparable!
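A short sketch of why trajectories stay comparable on a log scale, assuming "logged" here means log-transformed case counts. Under exponential growth, log counts rise along a straight line whose slope is the growth rate, so two countries with very different case totals can share the same slope. The counts and the `doubling_time` helper are made up for illustration.

```python
import math


def log_slope(counts):
    """Least-squares slope of ln(count) against the day index."""
    ys = [math.log(c) for c in counts]  # requires strictly positive counts
    xs = range(len(ys))
    x_mean = sum(xs) / len(ys)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def doubling_time(counts):
    """Days needed for counts to double, given the fitted log-scale slope."""
    return math.log(2) / log_slope(counts)


# Two made-up series that differ tenfold in level but double at the same pace.
small = [10, 20, 40, 80]
large = [100, 200, 400, 800]
print(doubling_time(small), doubling_time(large))
```

Both series give a doubling time of about one day, even though their raw counts differ by a factor of ten.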
average = 0.79, s.d. = 0.13, skewness = -1.12
average = 0.54, s.d. = 0.15, skewness = 0.72
July 27, 2018-September 8, 2018