Tutorial: Up and down the PyData Stats Stack
Or: Lies, damned lies, and statistics
Peadar Coyle
I work as a Senior Data Scientist at Channel 4
We're a media company, and we use data for targeted advertising, customer analytics and recommendation engines.
We're hiring, so have a chat with me if interested.
Who am I?
What else do I do?
- Open Source Contributor to PyMC3
- Mathematics and Physics background
- Fellow of the Royal Statistical Society and member of NumFOCUS
- Author of a book of interviews with data scientists :)
Stats is everywhere...
What was wrong with this?
More or Less debunked this - the sample was wrong!
Outcomes of this tutorial
- Have three tools to attack the same problem
- Some tricks and tips like hypothesis testing and feature selection
- Understand how to interpret and debug three versions of logistic regression: machine learning, frequentist statistics and Bayesian
Different schools of data
Firstly, hypothesis testing
Or how to do t-tests and all that stuff
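As a rough illustration of the t-test step, here is a minimal sketch using `scipy.stats.ttest_ind`. The two groups are synthetic, and the group sizes, means and the 0.05 threshold are assumptions chosen for the example, not values from the tutorial.

```python
import numpy as np
from scipy import stats

# Synthetic data (an assumption for illustration): two groups whose
# true means differ by 0.5 standard deviations.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=500)
group_b = rng.normal(loc=0.5, scale=1.0, size=500)

# Welch's t-test: compares the two means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Reject the null hypothesis of equal means at the (assumed) 5% level.
reject_null = p_value < 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.3g}, reject null: {reject_null}")
```

A small p-value only says the observed difference would be surprising if the means were equal; it says nothing about whether the sample itself was drawn sensibly.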
Three schools
- scikit-learn
- Statsmodels
- PyMC3
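To make the machine learning flavour concrete, the sketch below fits a scikit-learn `LogisticRegression` on synthetic data. The features, true coefficients and train/test split are assumptions made up for this example; the point is that this school is usually judged on held-out predictive accuracy rather than on p-values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumed for illustration).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
true_coef = np.array([1.5, -2.0])
logits = X @ true_coef + 0.5
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Note: scikit-learn applies L2 regularization by default (C=1.0),
# so the coefficients are slightly shrunk compared with plain maximum likelihood.
clf = LogisticRegression()
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # fraction of correct held-out predictions
print("coefficients:", clf.coef_[0], "intercept:", clf.intercept_[0])
print(f"held-out accuracy: {accuracy:.3f}")
```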
Bayesian tooling
PyMC3
https://pymc-devs.github.io/pymc3/Bayesian_LogReg/
How likely am I to make more than $50K?
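The following is not the PyMC3 model from the link above. It is a hand-rolled random-walk Metropolis sampler in plain NumPy, swapped in as a sketch of what a Bayesian logistic regression produces: a distribution over coefficients rather than a single estimate. The synthetic data, the Normal(0, 10) prior, the step size and the burn-in length are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data (assumed): one predictor, binary outcome.
rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
true_beta = np.array([-0.5, 1.0])  # intercept, slope
X = np.column_stack([np.ones(n), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))


def log_posterior(beta):
    """Unnormalized log posterior: Bernoulli likelihood plus Normal(0, 10) prior."""
    logits = X @ beta
    # log-likelihood = sum(y * logit - log(1 + exp(logit))), computed stably
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * np.sum(beta ** 2) / 10.0 ** 2
    return log_lik + log_prior


# Random-walk Metropolis: propose a Gaussian step, accept with
# probability min(1, posterior ratio).
n_steps, step = 6000, 0.15
samples = np.empty((n_steps, 2))
beta = np.zeros(2)
current_lp = log_posterior(beta)
for i in range(n_steps):
    proposal = beta + rng.normal(scale=step, size=2)
    proposal_lp = log_posterior(proposal)
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        beta, current_lp = proposal, proposal_lp
    samples[i] = beta

posterior = samples[1000:]  # discard burn-in (length is an assumption)
post_mean = posterior.mean(axis=0)
ci_low, ci_high = np.percentile(posterior, [2.5, 97.5], axis=0)
print("posterior mean:", post_mean)
print("95% credible interval:", ci_low, ci_high)
```

In practice a library such as PyMC3 would replace the sampling loop with a more efficient sampler and add convergence diagnostics; the interpretation of the posterior samples stays the same.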
Statsmodels
Frequentist Logistic Regression
stats_in_python
By springcoil