Mercè Crosas, IQSS, Harvard University
@mercecrosas
Imagine that you have data covering all Medicare beneficiaries in the US from 2000 to 2012, including every death (~half a billion person-years), and want to model the effect of air pollution levels on death, controlling for other factors that also affect death (such as smoking and BMI).
The resulting study concludes that PM2.5 levels below the current standard are still harmful.
source: https://www.linkedin.com/pulse/text-mining-its-applications-industry-subhajit-mukherjee
Consilience: a tool that enables you to quickly read, understand, categorize, and derive insights from large quantities of unstructured text.
Computer Science
Software Engineering
Statistics
Machine Learning
Domain Expertise
Ista Zahn's Data Science Tools workshop, IQSS: https://rawgit.com/IQSS/workshops/master/DataScienceTools/DataScienceTools.html
Robert Muenchen: http://r4stats.com/articles/popularity
Top: SQL, Python, Java, Hadoop, R
Top: SPSS, R, SAS, Stata
Python
R
The popularity of SPSS, Stata, and SAS is decreasing relative to Python, R, and Julia.
R Packages developed at Harvard's Institute for Quantitative Social Science (IQSS)
zeligproject.org
Everyone's Statistical Software
Gary King, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science, 44, pp. 341–355. Copy at http://j.mp/2n65duA
"Convert the raw results of any specific statistical procedure into expressions that:
1) convey numerically precise measurements of the quantities of greatest substantive interest,
2) include reasonable measurements of uncertainties about those estimates,
3) require little specialized knowledge to understand"
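A minimal sketch of the simulation idea behind these three goals, in Python with NumPy only: draw many coefficient vectors from their estimated sampling distribution (assumed here to be multivariate normal), compute the quantity of interest for each draw, and summarize the draws with a mean and a percentile interval. The logit model, the estimates `beta_hat` and `vcov`, and the helper names `draw_parameters`, `expected_prob`, and `summarize` are hypothetical, chosen for illustration; they are not output from a real analysis and not Zelig's own API.

```python
import numpy as np


def draw_parameters(beta_hat, vcov, n_sims=1000, seed=0):
    """Draw coefficient vectors from an assumed multivariate normal sampling distribution."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(beta_hat, vcov, size=n_sims)


def expected_prob(betas, x):
    """Pr(Y = 1 | x) under a logit model, computed once per simulated coefficient vector."""
    return 1.0 / (1.0 + np.exp(-(betas @ x)))


def summarize(draws, level=0.95):
    """Mean and central percentile interval of the simulated draws."""
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [alpha, 1.0 - alpha])
    return float(np.mean(draws)), float(lower), float(upper)


# Hypothetical logit estimates (assumed for illustration): intercept, one covariate
beta_hat = np.array([-2.0, 0.5])
vcov = np.array([[0.04, -0.01],
                 [-0.01, 0.01]])

betas = draw_parameters(beta_hat, vcov, n_sims=5000)

# Quantity of interest at two covariate profiles (first entry is the intercept)
p_low = expected_prob(betas, np.array([1.0, 1.0]))
p_high = expected_prob(betas, np.array([1.0, 3.0]))

# First difference computed draw by draw, so coefficient uncertainty carries through
first_diff = p_high - p_low

print("Pr(Y=1 | x=1): mean %.3f, 95%% interval [%.3f, %.3f]" % summarize(p_low))
print("First difference: mean %.3f, 95%% interval [%.3f, %.3f]" % summarize(first_diff))
```

Reusing the same parameter draws for both covariate profiles lets the first difference reflect the joint uncertainty of the coefficients, instead of treating the two predicted probabilities as independent.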
Christopher Gandrud, from the IQSS Software Best Practices workshop
Thanks
@mercecrosas
Harvard's Institute for Quantitative Social Science
@IQSS
iq.harvard.edu