Probabilistic Programming


A Brief Introduction to Probabilistic Programming and Python

PyCon Ireland 2015

peadarcoyle@googlemail.com

All opinions my own


Who am I?


Data Scientist/blogger based in Luxembourg





Aim of the talk :) 





Make Bayesian Statistics / Probabilistic Programming not so scary

What is Probabilistic Programming?



  • Basically using random variables instead of variables
  • Allows you to create a generative story rather than a black box
  • A different tool to Machine Learning
  • A different paradigm to frequentist statistics
  • Forces you to be explicit about your 'subjective' assumptions
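
To make "random variables instead of variables" concrete, here is a minimal sketch of a generative story using only the Python standard library. The coin-flip setting, the uniform prior, and the name `generative_story` are illustrative assumptions, not something from the talk.

```python
import random


def generative_story(rng, n_flips=20):
    """One run of a toy generative story for a coin-flipping experiment."""
    # The unknown quantity is itself a random variable: the coin's bias
    # is drawn from a uniform prior on [0, 1] (an assumed prior).
    bias = rng.random()
    # The observed data are then generated from that bias.
    flips = [1 if rng.random() < bias else 0 for _ in range(n_flips)]
    return bias, flips


rng = random.Random(42)
bias, flips = generative_story(rng)
print(f"bias drawn from the prior: {bias:.2f}")
print(f"heads in {len(flips)} simulated flips: {sum(flips)}")
```

Inference runs this story backwards: given the flips, what values of the bias are plausible?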




Bayesian Statistics


  • I studied Mathematics and encountered Bayesian methods in textbooks
  • This is a hard area to do with pen and paper: most of the integrals can't be solved in closed form
  • Thankfully, Monte Carlo simulation was invented
  • These simulations are used to approximate the posterior distribution
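
As a rough illustration of the Monte Carlo idea, the sketch below uses a hand-written Metropolis sampler (one simple kind of Markov chain Monte Carlo) to draw from the posterior of a coin's bias. The data (7 heads in 10 flips), the flat prior, and the step size are all assumptions chosen for the example; this toy case has an exact answer, which makes it easy to check the approximation.

```python
import math
import random

heads, flips = 7, 10  # hypothetical data: 7 heads in 10 flips


def log_posterior(p):
    """Unnormalised log-posterior of the bias p under a flat prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # outside the support of the prior
    return heads * math.log(p) + (flips - heads) * math.log(1.0 - p)


def metropolis(n_samples, step=0.2, seed=0):
    """Random-walk Metropolis sampler started at p = 0.5."""
    rng = random.Random(seed)
    current = 0.5
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        log_ratio = log_posterior(proposal) - log_posterior(current)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposal
        samples.append(current)
    return samples


samples = metropolis(20000)
kept = samples[2000:]  # discard the early samples as burn-in
posterior_mean = sum(kept) / len(kept)
exact_mean = (heads + 1) / (flips + 2)  # mean of the exact Beta(8, 4) posterior
print(f"MCMC estimate: {posterior_mean:.3f}, exact: {exact_mean:.3f}")
```

No integral is solved anywhere: the sampler only ever evaluates the unnormalised posterior at single points.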


Some terminology

(Bayes rule)


Attribution: Quantopian blog
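
For reference, Bayes' rule in its usual form. The symbols are notation chosen here, not taken from the slide image: \theta stands for the unknown parameters and D for the observed data.

```latex
\underbrace{P(\theta \mid D)}_{\text{posterior}}
  = \frac{\overbrace{P(D \mid \theta)}^{\text{likelihood}} \;
          \overbrace{P(\theta)}^{\text{prior}}}
         {\underbrace{P(D)}_{\text{evidence}}}
```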


In the real world?


Aside: How do you pick your prior?




  • This is a bit of an art
  • You generally base the prior on experience 
  • As you add more data this matters less and less
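
A small sketch of why the prior matters less as data accumulates, using the conjugate Beta-binomial update for a coin's bias. The two priors, the "true" rate of 0.6, and the sample sizes are assumptions made up for the example.

```python
def posterior_mean(prior_a, prior_b, heads, flips):
    """Posterior mean of a coin's bias under a Beta(prior_a, prior_b) prior."""
    # Conjugacy: a Beta prior plus binomial data gives
    # a Beta(prior_a + heads, prior_b + tails) posterior.
    return (prior_a + heads) / (prior_a + prior_b + flips)


true_rate = 0.6  # hypothetical long-run frequency of heads
gaps = []
for flips in (10, 100, 10000):
    heads = round(true_rate * flips)
    sceptic = posterior_mean(2, 8, heads, flips)   # prior belief: mostly tails
    optimist = posterior_mean(8, 2, heads, flips)  # prior belief: mostly heads
    gaps.append(optimist - sceptic)
    print(f"{flips:>5} flips: sceptic {sceptic:.3f}, optimist {optimist:.3f}")
```

The two analysts start far apart, and the gap between their posterior means shrinks as the number of flips grows.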





Huh, but isn't Probabilistic Programming just Stan and BUGS?







No, in Python you have PyMC3




  • A complete rewrite of PyMC2, now in 'Beta' status
  • Based on Theano
  • Computational techniques for handling gradients
  • Automatic differentiation and GPU speedup
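
To give a feel for what automatic differentiation means, here is a toy forward-mode version using dual numbers in plain Python. This is not how Theano does it (Theano works on a symbolic computation graph); the `Dual` class, `grad` helper, and the coin example are all hypothetical names and data invented for this sketch.

```python
import math


class Dual:
    """A number that carries its value and its derivative w.r.t. one input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __rsub__(self, other):
        # other - self, written as other + (-1) * self
        return _lift(other) + (-1) * self


def _lift(x):
    """Wrap a plain number as a constant (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def log(x):
    """Natural log with the chain rule: (log u)' = u' / u."""
    return Dual(math.log(x.value), x.deriv / x.value)


def grad(f, at):
    """Derivative of f at a point, obtained by seeding the input derivative with 1."""
    return f(Dual(at, 1.0)).deriv


def log_post(p):
    """Unnormalised log-posterior for 7 heads and 3 tails under a flat prior."""
    return 7 * log(p) + 3 * log(1 - p)


print(grad(log_post, 0.5))  # analytically 7/0.5 - 3/0.5
```

The gradient of the log-posterior comes out of ordinary-looking code without anyone deriving it by hand, which is the property gradient-based samplers rely on.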



What else?



  • Theano is also used in deep learning!
  • Currently there is a project to port 'Bayesian Methods for Hackers' (BMH) from PyMC2 to PyMC3
  • I gave a thorough tutorial on this: see my GitHub
  • Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck 


    Case study: Rugby Analytics


    I wanted to do a model of the Six Nations last year.

    I wanted to build an understandable model to predict the winner

    Key Info: Inferring the 'strength' of each team.

    We only have scoring data, which is noisy, hence Bayesian statistics.





    Hierarchical Model


    What did I do?


    1. I picked a Gamma prior for all teams

    2. I used a hierarchical model because I didn't want the effect of home advantage to be distributed independently of the strength of a team

    3. From this I was able to create a novel model based only on historical results and scoring intensity

    4. I drew samples from the posterior distribution using MCMC
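
The generative story behind a model like this can be sketched as a forward simulation in plain Python. Everything specific here is an assumption for illustration and not necessarily the model used in the talk: the log-linear Poisson scoring rate, the attack/defence split of team strength, the Gamma hyperprior on a shared precision, the fixed home advantage, and all numeric values.

```python
import math
import random

rng = random.Random(2015)
teams = ["England", "France", "Ireland", "Italy", "Scotland", "Wales"]


def poisson(rate):
    """Draw a Poisson count using Knuth's multiplication method."""
    threshold, count, product = math.exp(-rate), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


# Hierarchical part: every team's strength comes from one shared
# distribution whose precision has a Gamma hyperprior (assumed values).
precision = rng.gammavariate(25.0, 1.0)  # shape 25, scale 1
team_sd = 1.0 / math.sqrt(precision)
attack = {team: rng.gauss(0.0, team_sd) for team in teams}
defence = {team: rng.gauss(0.0, team_sd) for team in teams}

intercept = 3.0       # exp(3) is roughly 20 points per team per match
home_advantage = 0.2  # assumed fixed boost on the log scale


def simulate_match(home, away):
    """Simulate one match as two Poisson-distributed scores."""
    home_rate = math.exp(intercept + home_advantage
                         + attack[home] - defence[away])
    away_rate = math.exp(intercept + attack[away] - defence[home])
    return poisson(home_rate), poisson(away_rate)


home_points, away_points = simulate_match("Ireland", "England")
print(f"Ireland {home_points} - {away_points} England")
```

Fitting the model means going the other way: given real scores, MCMC finds the attack, defence and home-advantage values that plausibly produced them.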







    Run the model




    What actually happened

    • The model incorrectly predicted that England would come out on top.
    • Ireland actually won on points difference, by 6 points.
    • It really came down to the wire!
    • "Prediction is difficult, especially about the future"
    • One of the problems is what we call 'over-shrinkage'. You can delve into the results to see what the errors are; my model was within the errors.
    • Hat tip: Thanks to Abraham Flaxman and the PyMC3 team for helping me port this from PyMC2 to PyMC3

    Lessons learned


    • I can build an explainable model using PyMC2 and PyMC3

    • Generative stories help you build up interest with your colleagues

    • Communication is the 'last mile' problem of Data Science

    • PyMC3 is cool: please use it, and please contribute




    Wanna learn more?


    Bayesian Methods for Hackers (BMH)


    Jake VanderPlas

    PyMC3


    peadarcoyle@googlemail.com


