Pymc-Learn: Practical Probabilistic Machine Learning in Python

Daniel Emaasit

Data Scientist @ Haystax

TomTom Applied ML Conference, 2019

April 11, 2019

There is a growing need for principled machine learning by non-ML specialists.

This has led to increased adoption of probabilistic modeling.

ML: A Probabilistic Approach

  • A model describes data that one could observe from a system (Ghahramani, 2014)
  • Use the mathematics of probability theory to express all forms of uncertainty

[Figure: diagram with arrows labeled "Generative Process" and "Inference"]
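To make the two arrows in the diagram concrete, here is a minimal sketch using a coin-flip model that is not from the talk: a Beta prior over the coin's bias and Bernoulli observations. The generative process simulates data from a known parameter, and inference runs in the opposite direction, recovering a distribution over that parameter from the data. In this toy model inference happens in closed form because the Beta prior is conjugate to the Bernoulli likelihood. The function names are illustrative only.

```python
import random


def generate(bias, n, seed=0):
    """Generative process: draw n coin flips (1 = heads) with P(heads) = bias."""
    rng = random.Random(seed)
    return [1 if rng.random() < bias else 0 for _ in range(n)]


def infer(flips, alpha=1.0, beta=1.0):
    """Inference: update a Beta(alpha, beta) prior on the bias using the flips.

    Beta is conjugate to the Bernoulli likelihood, so the posterior is
    Beta(alpha + heads, beta + tails).
    """
    heads = sum(flips)
    tails = len(flips) - heads
    return alpha + heads, beta + tails


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


data = generate(bias=0.7, n=1000)
a_post, b_post = infer(data)
print(posterior_mean(a_post, b_post))  # should land near the true bias of 0.7
```

The full posterior Beta(a_post, b_post), not just its mean, is the output of inference. Its spread is the calibrated uncertainty that the probabilistic approach keeps track of.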

Reasons for increased adoption of Probabilistic Modeling (1/5)

  • The need for transparent models

Reasons for increased adoption of Probabilistic Modeling (2/5)

  • The need for models with calibrated quantities of uncertainty.
  • The ever-increasing number of promising results achieved in A.I.

Reasons for increased adoption of Probabilistic Modeling (3/5)

  • Emergence of probabilistic programming languages (PPLs).

[Logos of example PPLs: Probability, Stan, Pyro]

Reasons for increased adoption of Probabilistic Modeling (4/5)

  • Increased media attention.

Reasons for increased adoption of Probabilistic Modeling (5/5)

Case Study

Time Stamp               Energy Consumed (kW)
10/9/2018 12:47:00 PM    120
10/10/2018 12:47:00 PM   100
10/11/2018 12:47:00 PM   105
10/12/2018 12:47:00 PM   100
10/14/2018 12:47:00 PM   119
10/15/2018 12:47:00 PM   110
10/16/2018 12:47:00 PM   105
10/17/2018 12:47:00 PM   100
10/18/2018 12:47:00 PM   118
10/19/2018 12:47:00 PM   104
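The models that follow need a numeric input matrix (named X_train in the code) and a target vector (y_train). The sketch below is one way to build them from the table. It assumes the time stamp is the only input feature and encodes it as days since the first reading. The row list and helper name are illustrative. Because 10/13 has no reading, the inputs come out irregularly spaced, and Gaussian processes handle that without resampling.

```python
from datetime import datetime

# Rows of the case-study table: (time stamp, energy consumed in kW)
ROWS = [
    ("10/9/2018 12:47:00 PM", 120),
    ("10/10/2018 12:47:00 PM", 100),
    ("10/11/2018 12:47:00 PM", 105),
    ("10/12/2018 12:47:00 PM", 100),
    ("10/14/2018 12:47:00 PM", 119),
    ("10/15/2018 12:47:00 PM", 110),
    ("10/16/2018 12:47:00 PM", 105),
    ("10/17/2018 12:47:00 PM", 100),
    ("10/18/2018 12:47:00 PM", 118),
    ("10/19/2018 12:47:00 PM", 104),
]


def to_arrays(rows):
    """Return X as a column of days since the first reading, and y as the readings."""
    fmt = "%m/%d/%Y %I:%M:%S %p"
    times = [datetime.strptime(ts, fmt) for ts, _ in rows]
    t0 = times[0]
    # Each input is a one-element row, i.e. an (n, 1) design matrix
    X = [[(t - t0).total_seconds() / 86400.0] for t in times]
    y = [float(value) for _, value in rows]
    return X, y


X_train, y_train = to_arrays(ROWS)
```

Wrapping X_train in numpy.asarray produces the 2-D (n, 1) shape that both scikit-learn and PyMC3 Gaussian process code expect.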

My Problem: Are there any unusual patterns in my plant's energy consumption that might cause safety incidents?


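Possible Solution (Traditional Approach):

  • Import a Gaussian process model from Scikit-learn

import joblib
from sklearn.gaussian_process import GaussianProcessRegressor

# Build
model = GaussianProcessRegressor()

# Train
model.fit(X_train, y_train)

# Predict (predict takes only the inputs)
y_pred = model.predict(X_test)

# Score (coefficient of determination, R^2)
model.score(X_test, y_test)

# Save and load (scikit-learn estimators have no save() method; joblib is the usual choice)
joblib.dump(model, 'path/to/saved/model')
model = joblib.load('path/to/saved/model')

Few lines of code

  • Build + Train + Predict + Score + Save + Load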
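For intuition about what a Gaussian process regressor computes, here is a minimal NumPy sketch of exact GP regression with a squared-exponential (RBF) kernel, a zero prior mean, and fixed hyperparameters. It is not scikit-learn's code. The function names and the tiny default noise variance are illustrative assumptions. The posterior mean is K_*^T (K + σ²I)^{-1} y and the posterior variance is k_** - K_*^T (K + σ²I)^{-1} K_*, both computed here through a Cholesky factor.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_variance=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    )
    return signal_variance * np.exp(-0.5 * sq_dists / length_scale**2)


def gp_predict(X_train, y_train, X_test, length_scale=1.0,
               signal_variance=1.0, noise_variance=1e-8):
    """Posterior mean and standard deviation of a zero-mean GP at X_test."""
    K = rbf_kernel(X_train, X_train, length_scale, signal_variance)
    K[np.diag_indices_from(K)] += noise_variance
    L = np.linalg.cholesky(K)
    # alpha = (K + noise I)^{-1} y, via two linear solves with the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, signal_variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance at a test point is signal_variance for the RBF kernel
    var = signal_variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

With such a small noise variance, the mean passes almost exactly through the training readings and the predicted standard deviation shrinks toward zero at those inputs. Far from the data, the mean reverts to the zero prior mean and the standard deviation grows back to sqrt(signal_variance). A reading that lands many predicted standard deviations from the mean is one natural notion of "unusual". By default, scikit-learn's GaussianProcessRegressor also tunes the kernel hyperparameters by maximizing the log marginal likelihood, and its normalize_y option addresses the zero-mean reversion.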

Possible Solution (Probabilistic Approach):

  • Build a Gaussian Process model with PyMC3

import pymc3 as pm

# X_train, y_train, X_test are assumed defined; the X arrays have shape (n, 1)

# Build a model
with pm.Model() as latent_gp_model:

    # specify the priors
    length_scale = pm.Gamma("length_scale", alpha=2, beta=1)
    signal_variance = pm.HalfCauchy("signal_variance", beta=5)
    noise_variance = pm.HalfCauchy("noise_variance", beta=5)
    degrees_of_freedom = pm.Gamma("degrees_of_freedom", alpha=2, beta=0.1)

    # specify the kernel function
    cov = signal_variance**2 * pm.gp.cov.ExpQuad(1, length_scale)

    # specify the mean function
    mean_function = pm.gp.mean.Zero()

    # specify the GP, passing both the mean and the covariance function
    gp = pm.gp.Latent(mean_func=mean_function, cov_func=cov)

    # specify the prior over the latent function at the training inputs
    f = gp.prior("f", X=X_train)

    # specify a Student-t likelihood; lam is the precision, i.e. 1 / noise_variance
    obs = pm.StudentT("obs", mu=f, lam=1/noise_variance, nu=degrees_of_freedom, observed=y_train)

# Train a model: perform inference with MCMC (NUTS by default)
with latent_gp_model:
    posterior = pm.sample(draws=100, cores=2)

# Prediction: extend the model with the GP conditional distribution at the test inputs
with latent_gp_model:
    f_pred = gp.conditional("f_pred", X_test)

# sample from the GP conditional posterior (sample_ppc was renamed in PyMC3 3.6)
with latent_gp_model:
    posterior_pred = pm.sample_posterior_predictive(posterior, vars=[f_pred], samples=200)
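The PyMC3 model uses a Student-t likelihood instead of a Gaussian one. A plausible motivation when hunting for unusual readings is robustness. Heavy tails make a single extreme reading much less surprising under the model, so it drags the latent function f less. The standard-library sketch below compares the two log densities for a point 10 scale units from the mean. The helper names are hypothetical. PyMC3's StudentT accepts either a scale sigma or a precision lam = 1/sigma^2, and the model above uses the precision form.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x, mu, sigma, nu):
    """Log density of a location-scale Student-t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1.0) / 2.0 * math.log1p(z * z / nu)
    )


# An outlying reading 10 scale units away from the latent value
print(normal_logpdf(10.0, 0.0, 1.0))          # about -50.92
print(student_t_logpdf(10.0, 0.0, 1.0, 3.0))  # about -8.07
```

The Gaussian log penalty grows quadratically in the distance, while the Student-t penalty grows only logarithmically, which is why one outlier moves a Student-t fit less. As nu grows large the Student-t density approaches the Gaussian, so learning degrees_of_freedom lets the data decide how heavy the tails should be.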

Pymc-learn

  • Inspired by scikit-learn, with a focus on non-ML specialists
  • Mimics Scikit-learn

Possible Solution (Probabilistic Approach):

  • Import a Gaussian process model from Pymc-learn

from pmlearn.gaussian_process import GaussianProcessRegressor

# Instantiate a PyMC3 Gaussian process model
model = GaussianProcessRegressor()

# Fit using MCMC or Variational Inference
model.fit(X_train, y_train)

# Predict (predict takes only the inputs)
y_pred = model.predict(X_test)

# Score
model.score(X_test, y_test)

# Save, then load into a new model instance
model.save('path/to/saved/model')
model_new = GaussianProcessRegressor()
model_new.load('path/to/saved/model')
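Mimicking scikit-learn lets a probabilistic model sit behind the familiar fit / predict / score contract. The sketch below illustrates that contract with a hypothetical NumPy estimator, ConjugateLinearRegressor. It is not pymc-learn's implementation, which relies on PyMC3 for MCMC or variational inference. Here a Bayesian linear regression with a Gaussian prior on the weights and a known noise variance keeps the posterior in closed form. The class follows scikit-learn conventions: hyperparameters are set in __init__, learned attributes end with an underscore, and fit returns self. The return_std flag mirrors the option in scikit-learn's GaussianProcessRegressor.predict.

```python
import numpy as np


class ConjugateLinearRegressor:
    """Hypothetical scikit-learn-style probabilistic estimator (illustration only).

    Prior w ~ N(0, I / alpha) and Gaussian noise with known variance, so the
    posterior over the weights is Gaussian and computed in closed form.
    """

    def __init__(self, alpha=1.0, noise_variance=1.0):
        # Hyperparameters are stored unchanged, as scikit-learn conventions expect
        self.alpha = alpha
        self.noise_variance = noise_variance

    @staticmethod
    def _design(X):
        X = np.asarray(X, dtype=float)
        return np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column

    def fit(self, X, y):
        Phi = self._design(X)
        y = np.asarray(y, dtype=float)
        precision = self.alpha * np.eye(Phi.shape[1]) + Phi.T @ Phi / self.noise_variance
        # Posterior covariance and mean of the weights
        self.coef_cov_ = np.linalg.inv(precision)
        self.coef_mean_ = self.coef_cov_ @ Phi.T @ y / self.noise_variance
        return self

    def predict(self, X, return_std=False):
        Phi = self._design(X)
        mean = Phi @ self.coef_mean_
        if not return_std:
            return mean
        # Predictive variance = observation noise + uncertainty in the weights
        var = self.noise_variance + np.sum((Phi @ self.coef_cov_) * Phi, axis=1)
        return mean, np.sqrt(var)

    def score(self, X, y):
        """Coefficient of determination R^2 of the posterior-mean predictions."""
        y = np.asarray(y, dtype=float)
        residual = np.sum((y - self.predict(X)) ** 2)
        total = np.sum((y - y.mean()) ** 2)
        return 1.0 - residual / total


model = ConjugateLinearRegressor(alpha=1e-6, noise_variance=0.01).fit(
    [[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0]
)
mean, std = model.predict([[4.0]], return_std=True)
```

The caller never touches the prior or the posterior algebra. That encapsulation is the point of the scikit-learn-style interface for non-ML specialists, and the predictive standard deviation comes along for free.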

Thank You!
