Probabilistic Programming
A Brief Introduction to Probabilistic Programming and Python
PyCon Ireland 2015
peadarcoyle@googlemail.com
All opinions my own
Who am I?
Data Scientist/blogger based in Luxembourg
Aim of the talk :)
Make Bayesian Statistics/Probabilistic Programming not so scary
What is Probabilistic Programming
 Basically, using random variables instead of ordinary (fixed-value) variables
 Allows you to create a generative story rather than a black box
 A different tool to Machine Learning
 A different paradigm to frequentist statistics
 Forces you to be explicit about your 'subjective' assumptions
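To make "random variables instead of variables" concrete, here is a minimal sketch of a generative story in plain Python (not PyMC3). The unknown bias of a coin is itself drawn from a Beta(2, 2) prior, and the observed flips are then generated from that bias. The coin, the Beta(2, 2) prior and the name `generative_story` are illustrative assumptions.

```python
import random

def generative_story(n_flips, rng):
    """Simulate data the way a probabilistic program describes it.

    The coin's bias is not a fixed number: it is a random variable
    drawn from a Beta(2, 2) prior (an assumed, illustrative choice).
    """
    bias = rng.betavariate(2, 2)  # random variable, not a constant
    flips = [1 if rng.random() < bias else 0 for _ in range(n_flips)]
    return bias, flips

rng = random.Random(42)
bias, flips = generative_story(10, rng)
print(f"bias={bias:.2f}, heads={sum(flips)}/{len(flips)}")
```

A probabilistic programming library runs this kind of story in reverse: given the observed flips, it infers which values of `bias` are plausible.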
Bayesian Statistics
 I studied Mathematics and encountered Bayesian statistics in textbooks
 It is hard to do by pen and paper: most of the integrals involved can't be solved in closed form
 Thankfully, Monte Carlo simulation methods were invented
 These simulations are used to approximate the posterior distribution (and the integrals over it)
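As a small illustration of the Monte Carlo idea, the sketch below approximates the evidence integral $P(D) = \int P(D \mid \theta)\,P(\theta)\,d\theta$ by drawing $\theta$ from the prior and averaging the likelihood. It assumes an example of 7 heads in 10 coin flips with a Uniform(0, 1) prior; for that case the exact answer is 1/11, which makes the approximation easy to check. The function names are hypothetical.

```python
import math
import random

def likelihood(theta, heads=7, n=10):
    """Binomial likelihood P(D | theta) of seeing `heads` heads in `n` flips."""
    return math.comb(n, heads) * theta**heads * (1 - theta) ** (n - heads)

def mc_evidence(n_samples, rng):
    """Monte Carlo estimate of P(D) = integral of P(D | theta) P(theta) dtheta.

    With a Uniform(0, 1) prior, drawing theta from the prior and
    averaging the likelihood approximates the integral.
    """
    total = 0.0
    for _ in range(n_samples):
        total += likelihood(rng.random())
    return total / n_samples

rng = random.Random(0)
estimate = mc_evidence(200_000, rng)
print(f"Monte Carlo: {estimate:.4f}, exact: {1 / 11:.4f}")
```

The error of such an estimate shrinks roughly like $1/\sqrt{N}$ in the number of draws, regardless of how awkward the integrand is to integrate by hand.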
Some terminology
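(Bayes' rule) $P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$, i.e. posterior = likelihood × prior / evidence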
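Bayes' rule can be applied numerically by evaluating likelihood × prior on a grid of candidate parameter values and dividing by the sum, which plays the role of the evidence. The sketch assumes a coin example (7 heads in 10 flips) and a flat prior; under those assumptions the exact posterior is Beta(8, 4), whose mean is 8/12, so the grid result can be checked. `grid_posterior` is a hypothetical helper.

```python
import math

def grid_posterior(heads, n, grid_size=1001):
    """Apply Bayes' rule on a grid of candidate theta values.

    posterior is proportional to likelihood * prior; dividing by the
    evidence (the sum over the grid) makes the posterior sum to 1.
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size  # flat prior (an assumption)
    likelihood = [math.comb(n, heads) * t**heads * (1 - t) ** (n - heads) for t in thetas]
    unnormalised = [lik * p for lik, p in zip(likelihood, prior)]
    evidence = sum(unnormalised)
    posterior = [u / evidence for u in unnormalised]
    return thetas, posterior

thetas, posterior = grid_posterior(heads=7, n=10)
post_mean = sum(t * p for t, p in zip(thetas, posterior))
print(f"posterior mean = {post_mean:.3f}")  # close to 8/12 under a flat prior
```

Grid approximation only works for a handful of parameters, since the grid grows exponentially with dimension; that is why sampling methods such as MCMC are used for realistic models.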
Aside: How do you pick your prior?
 This is a bit of an art
 You generally base the prior on experience
 As you add more data this matters less and less
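The claim that the prior matters less as data accumulates can be checked with a conjugate Beta-Binomial model, where the posterior mean has a closed form. The sketch assumes two deliberately opposed priors, a "sceptical" Beta(2, 8) and an "optimistic" Beta(8, 2), and data in which 70% of flips come up heads; these numbers are illustrative only.

```python
def posterior_mean(a, b, heads, n):
    """Posterior mean of a coin's bias under a Beta(a, b) prior.

    The Beta prior is conjugate to the binomial likelihood, so the
    posterior is Beta(a + heads, b + n - heads) in closed form.
    """
    return (a + heads) / (a + b + n)

priors = {"sceptical": (2, 8), "optimistic": (8, 2)}  # illustrative priors
gaps = {}
for n in (10, 100, 1000):
    heads = 7 * n // 10  # same observed 70% rate, more data each time
    means = {name: posterior_mean(a, b, heads, n) for name, (a, b) in priors.items()}
    gaps[n] = abs(means["sceptical"] - means["optimistic"])
    print(f"n={n:5d} sceptical={means['sceptical']:.3f} "
          f"optimistic={means['optimistic']:.3f} gap={gaps[n]:.3f}")
```

With these priors the gap between the two posterior means works out to 6 / (10 + n), so it shrinks towards zero as n grows.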
Huh, but isn't Probabilistic Programming just Stan and BUGS?
No: in Python you have PyMC3
 A complete rewrite of PyMC2, now in 'Beta' status
 Based upon Theano
 Computational techniques for handling gradients
 Automatic Differentiation and GPU speedup
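PyMC3 uses Theano to compute gradients of the log posterior automatically, and gradient-based samplers such as NUTS rely on those gradients. Theano does this by differentiating a symbolic graph; the sketch below uses a different and much simpler technique, forward-mode automatic differentiation with dual numbers, only to illustrate that exact derivatives can be computed mechanically from ordinary code. The `Dual` class and the example density (7 heads in 10 flips, flat prior) are assumptions for illustration, not Theano's API.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A value paired with its derivative (forward-mode autodiff)."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(other.value - self.value, other.deriv - self.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def log(x: Dual) -> Dual:
    # Chain rule: d/dx log(u) = u' / u
    return Dual(math.log(x.value), x.deriv / x.value)

def log_density(theta):
    """Unnormalised log posterior for 7 heads in 10 flips with a flat prior."""
    return 7 * log(theta) + 3 * log(1 - theta)

theta = Dual(0.5, 1.0)  # seed: d(theta)/d(theta) = 1
result = log_density(theta)
print(result.value, result.deriv)  # derivative is 7/0.5 - 3/0.5 = 8
```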
What else?
Theano is also used in deep learning!
Currently there is a project to port 'BMH' (Bayesian Methods for Hackers) from PyMC2 to PyMC3
I wrote a thorough tutorial on this on my GitHub
Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck
Case study: Rugby Analytics
I wanted to do a model of the Six Nations last year.
I wanted to build an understandable model to predict the winner
Key Info: Inferring the 'strength' of each team.
We only have scoring data, which is noisy; hence Bayesian statistics
Hierarchical Model
What did I do?
1. I picked a Gamma distribution as the prior for all teams
2. I used a Hierarchical Model because I didn't want the effect of home advantage to be distributed independently of the strength of a team
3. From this I was able to create a novel model based only on historical results and scoring intensity
4. I sampled from the posterior distribution using MCMC
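Steps 1 and 4 can be sketched for a single team: a Gamma prior on a scoring intensity, a Poisson likelihood for the observed scores, and a random-walk Metropolis sampler (the simplest MCMC algorithm) to draw from the posterior. This is not the talk's hierarchical model and not what PyMC3 does internally (it defaults to NUTS for continuous parameters); the scores, the Gamma(2, 1) prior and all function names are hypothetical. Because the Gamma prior is conjugate to the Poisson likelihood, the exact posterior mean is known and the sampler can be checked against it.

```python
import math
import random

# Hypothetical data: tries scored by one team in five matches.
scores = [3, 5, 2, 4, 6]
ALPHA, BETA = 2.0, 1.0  # assumed Gamma(shape, rate) prior on the scoring intensity

def log_posterior(lam):
    """Unnormalised log posterior: Gamma(ALPHA, BETA) prior times a Poisson likelihood."""
    if lam <= 0:
        return -math.inf  # intensity must be positive
    return (ALPHA - 1 + sum(scores)) * math.log(lam) - (BETA + len(scores)) * lam

def metropolis(n_iter, step, rng, start=1.0):
    """Random-walk Metropolis sampler for a single positive parameter."""
    samples = []
    current, current_lp = start, log_posterior(start)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

rng = random.Random(1)
draws = metropolis(50_000, step=1.0, rng=rng)[5_000:]  # drop burn-in
mcmc_mean = sum(draws) / len(draws)
exact_mean = (ALPHA + sum(scores)) / (BETA + len(scores))  # conjugate Gamma posterior
print(f"MCMC mean {mcmc_mean:.2f} vs exact {exact_mean:.2f}")
```

In a real model there is no closed-form check, which is why diagnostics such as trace plots and multiple chains matter.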
Run the model
What actually happened
 The model incorrectly predicted that England would come out on top.
 Ireland actually won the tournament on points difference, by a margin of 6 points.
 It really came down to the wire!
 "Prediction is difficult, especially about the future"
 One of the problems is what is called 'overshrinkage': the hierarchical model pulls each team's estimated strength too far towards the average of all teams. You can delve into the results to see the errors; the actual outcome was within my model's error bars.
 Hat tip: thanks to Abraham Flaxman and the PyMC3 team for helping me port this from PyMC2 to PyMC3
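Shrinkage can be illustrated with the textbook normal-normal formula, a simpler setting than the rugby model: a team's estimate is a precision-weighted average of its own mean and the average over all teams. When the assumed spread between teams (`tau`) is small, a strong team's estimate is pulled most of the way to the average, which is the overshrinkage effect. All numbers and the name `partial_pool` are hypothetical.

```python
def partial_pool(team_mean, n_games, grand_mean, sigma, tau):
    """Normal-normal shrinkage estimate of one team's mean.

    sigma: match-to-match noise; tau: spread of team means around the
    grand mean. The result is a precision-weighted average.
    """
    w_data = n_games / sigma**2  # precision of the team's own data
    w_prior = 1 / tau**2         # precision of the group-level prior
    return (w_data * team_mean + w_prior * grand_mean) / (w_data + w_prior)

# Hypothetical numbers: a strong team averaging 30 points over 5 games,
# a tournament-wide average of 20, and match-to-match noise sigma = 10.
for tau in (20.0, 5.0, 1.0):
    est = partial_pool(30.0, 5, 20.0, sigma=10.0, tau=tau)
    print(f"tau={tau:4.1f} -> estimated mean {est:.2f}")
```

The smaller `tau` is, the closer every team looks to the average, so a model that underestimates the spread between teams will understate how strong the best team really is.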
Lessons learned
 I can build an explainable model using PyMC2 and PyMC3
 Generative stories help you build up interest with your colleagues
 Communication is the 'last mile' problem of Data Science
 PyMC3 is cool: please use it, and please contribute
Wanna learn more?
peadarcoyle@googlemail.com