Tutorial: Up and down the PyData Stats Stack

 

Or: Lies, damned lies and statistics

Peadar Coyle

I work as a Senior Data Scientist at Channel 4

We're a media company, and we leverage data for targeted advertising, customer analytics and recommendation engines

We're hiring, so have a chat with me if you're interested.

Who am I?

What else do I do?

  • Open Source Contributor to PyMC3
  • Mathematics and Physics background
  • Fellow of the Royal Statistical Society and member of NumFOCUS
  • Author of a book of interviews with data scientists :) 

Stats is everywhere...

What was wrong with this?

BBC Radio 4's More or Less debunked this: the sample was wrong!

Outcomes of this tutorial

  • Have three tools to attack the same problem
  • Tips and tricks such as hypothesis testing and feature selection
  • Understand how to interpret and debug three versions of logistic regression: machine learning, frequentist statistics and Bayesian


http://bit.ly/pydataldnstats2

Different schools of data 

HT: Vincent D. Warmerdam

Firstly, hypothesis testing

  • Or how to do t-tests and all that stuff
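Here is a two-sample t-test as a minimal sketch, assuming SciPy is installed. The two groups are simulated, hypothetical data and are not from the talk. The sketch uses Welch's variant (`equal_var=False`), which does not assume the groups share a variance.

```python
# Minimal two-sample t-test sketch with SciPy.
# group_a and group_b are simulated (hypothetical) data, not from the talk.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two groups whose true means differ by 0.5
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=0.5, scale=1.0, size=200)

# Welch's t-test: does not assume equal variances
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05
reject_null = p_value < alpha  # reject H0 "the means are equal"?
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, reject H0 at alpha={alpha}: {reject_null}")
```

A small p-value says the data would be surprising if the means were equal. It does not say how large or how important the difference is.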

Three schools

 

  • scikit-learn
  • Statsmodels
  • PyMC3
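For the machine-learning school, a minimal scikit-learn sketch of logistic regression follows. It is fitted on simulated data. The features, true coefficients and seed are hypothetical. Note that `LogisticRegression` applies L2 regularisation by default (`C=1.0`), so its coefficients are slightly shrunk compared with an unpenalised fit.

```python
# Minimal scikit-learn logistic regression sketch on simulated (hypothetical) data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
true_coef = np.array([1.5, -2.0])  # hypothetical effects
true_intercept = -0.5
p = 1.0 / (1.0 + np.exp(-(true_intercept + X @ true_coef)))
y = rng.binomial(1, p)  # 0/1 outcomes drawn with probability p

# Default settings: L2 penalty with C=1.0
clf = LogisticRegression()
clf.fit(X, y)

print("intercept:", clf.intercept_)
print("coefficients:", clf.coef_)

# Predicted probability of the positive class at the origin
prob_origin = clf.predict_proba(np.array([[0.0, 0.0]]))[0, 1]
print("P(y=1 | x=0):", prob_origin)
```

This school focuses on prediction. It gives you no standard errors or p-values out of the box.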

 

Bayesian tooling

PyMC3

https://pymc-devs.github.io/pymc3/Bayesian_LogReg/ 
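The sketch below shows what a Bayesian logistic regression computes without relying on PyMC3 itself. PyMC3 builds the model for you and, by default, samples the posterior with NUTS. This sketch swaps in a hand-rolled random-walk Metropolis sampler using only NumPy. The Normal(0, 10) priors, step size, burn-in and simulated data are all assumptions chosen for illustration.

```python
# Dependency-free sketch of Bayesian logistic regression.
# Swapped-in technique: random-walk Metropolis (PyMC3 would use NUTS by default).
# Priors, step size, burn-in and data are hypothetical choices.
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: intercept column plus one feature
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))


def log_posterior(beta, prior_sd=10.0):
    """Bernoulli log-likelihood plus independent Normal(0, prior_sd) log-priors, up to a constant."""
    eta = X @ beta
    # y*eta - log(1 + exp(eta)), computed stably with logaddexp
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)
    return log_lik + log_prior


def metropolis(n_samples=5000, step=0.1, burn_in=1000):
    beta = np.zeros(2)
    current = log_posterior(beta)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = beta + step * rng.normal(size=2)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
        if i >= burn_in:
            samples.append(beta.copy())
    return np.array(samples)


samples = metropolis()
posterior_mean = samples.mean(axis=0)
lower, upper = np.percentile(samples, [2.5, 97.5], axis=0)
print("posterior mean:", posterior_mean)
print("95% credible interval:", list(zip(lower, upper)))
```

The output is a whole distribution for each coefficient rather than a point estimate. A credible interval can then be read directly as a probability statement about the parameter, given the model and priors.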

How likely am I to make more than $50K?
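All three tools fit the same underlying model. Here it is written out under the assumption that $y_i = 1$ if person $i$ earns more than $50K and $\mathbf{x}_i$ is a vector of (hypothetical) predictors such as age or hours worked. The model makes the log-odds a linear function of the predictors:

```latex
P(y_i = 1 \mid \mathbf{x}_i) = \sigma\left(\beta_0 + \mathbf{x}_i^\top \boldsymbol{\beta}\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}},
\qquad
\log \frac{P(y_i = 1 \mid \mathbf{x}_i)}{1 - P(y_i = 1 \mid \mathbf{x}_i)} = \beta_0 + \mathbf{x}_i^\top \boldsymbol{\beta}
```

The schools differ in what they do with this model:

  • The machine-learning approach minimises a (usually penalised) log-loss.
  • The frequentist approach maximises the likelihood and reports standard errors.
  • The Bayesian approach places priors on $\beta_0, \boldsymbol{\beta}$ and samples their posterior.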

Statsmodels

Frequentist Logistic Regression

stats_in_python

By springcoil
