Pymc-Learn: Practical Probabilistic Machine Learning in Python

Daniel Emaasit

Data Scientist @ Haystax

PyData Washington DC, 2018

November 17, 2018

There is a growing need for principled machine learning by non-ML specialists

Reasons for increased adoption of Probabilistic Modeling (1/3)

  • the need for transparent models with calibrated quantities of uncertainty.
  • the ever-increasing number of promising results achieved in A.I.

Reasons for increased adoption of Probabilistic Modeling (2/3)

  • the emergence of probabilistic programming languages (PPLs).

Reasons for increased adoption of Probabilistic Modeling (3/3)

Gaussian Process in PyMC3

import pymc3 as pm

# X, y (training data) and X_new (test inputs) are assumed to be defined beforehand

# Instantiate a model
with pm.Model() as latent_gp_model:
    
    # specify the priors
    length_scale = pm.Gamma("length_scale", alpha = 2, beta = 1)
    signal_variance = pm.HalfCauchy("signal_variance", beta = 5)
    noise_variance = pm.HalfCauchy("noise_variance", beta = 5)
    degrees_of_freedom = pm.Gamma("degrees_of_freedom", alpha = 2, beta = 0.1)
    
    # specify the kernel function
    cov = signal_variance**2 * pm.gp.cov.ExpQuad(1, length_scale)
        
    # specify the mean function
    mean_function = pm.gp.mean.Zero()
    
    # specify the gp
    gp = pm.gp.Latent(mean_func = mean_function, cov_func = cov)
    
    # specify the prior over the latent function
    f = gp.prior("f", X = X) 
    
    # specify the likelihood
    obs = pm.StudentT("obs", mu = f, lam = 1/noise_variance, nu = degrees_of_freedom, observed = y)

# Perform Inference
with latent_gp_model:
    posterior = pm.sample(draws = 100, cores = 2)

# extend the model by adding the GP conditional distribution so as to predict at test data
with latent_gp_model:
    f_pred = gp.conditional("f_pred", X_new)

# sample from the GP conditional posterior
with latent_gp_model:
    posterior_pred = pm.sample_ppc(posterior, vars = [f_pred], samples = 200)

Build a model

Train a model

Prediction
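
To make the kernel line in the PyMC3 model more concrete, here is a minimal pure-Python sketch of a scaled exponentiated quadratic covariance for scalar inputs. The function names exp_quad and cov_matrix and the parameter signal_sd are illustrative assumptions, not part of PyMC3; signal_sd plays the role of the quantity that is squared in front of ExpQuad in the slide.

```python
import math

def exp_quad(x1, x2, length_scale, signal_sd=1.0):
    """Scaled exponentiated quadratic covariance between two scalar inputs.

    Hypothetical helper for illustration only; it is not a PyMC3 function.
    """
    sq_dist = (x1 - x2) ** 2
    return signal_sd ** 2 * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cov_matrix(xs, length_scale, signal_sd=1.0):
    """Covariance matrix over a list of scalar inputs, as a list of lists."""
    return [[exp_quad(a, b, length_scale, signal_sd) for b in xs] for a in xs]

# Inputs that are close together get a covariance near signal_sd**2;
# inputs that are far apart get a covariance near zero.
K = cov_matrix([0.0, 0.5, 2.0], length_scale=1.0, signal_sd=2.0)
```

A larger length_scale makes the covariance decay more slowly with distance, which corresponds to smoother latent functions.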

Scikit-learn

from sklearn.gaussian_process import GaussianProcessRegressor

model = GaussianProcessRegressor()

model.fit(X_train, y_train)

model.predict(X_test)

model.score(X_test, y_test)

# scikit-learn estimators have no save() method; persist with joblib instead
import joblib
joblib.dump(model, 'path/to/saved/model')

Few lines of code

  • Build + Train + Predict + Score + Save + Load
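
The six steps can be sketched end to end with a toy estimator written in plain Python. MeanRegressor below is a hypothetical class invented for illustration; it is not part of scikit-learn or pymc-learn, and its pickle-based save and load are an assumption about how persistence might look, not either library's implementation.

```python
import os
import pickle
import tempfile

class MeanRegressor:
    """Hypothetical toy estimator: always predicts the training mean."""

    def fit(self, X, y):
        # "Train": store the only learned quantity, the mean of the targets
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # "Predict": one prediction per input row
        return [self.mean_ for _ in X]

    def score(self, X, y):
        # "Score": coefficient of determination (R^2)
        y_bar = sum(y) / len(y)
        ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, self.predict(X)))
        ss_tot = sum((yi - y_bar) ** 2 for yi in y)
        return 1.0 - ss_res / ss_tot

    def save(self, path):
        # "Save": write the fitted attributes to disk
        with open(path, "wb") as fh:
            pickle.dump(self.__dict__, fh)

    def load(self, path):
        # "Load": restore fitted attributes into a fresh instance
        with open(path, "rb") as fh:
            self.__dict__.update(pickle.load(fh))
        return self

X_train, y_train = [[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0]
model = MeanRegressor().fit(X_train, y_train)    # Build + Train
predictions = model.predict([[3.0], [4.0]])      # Predict
r2 = model.score(X_train, y_train)               # Score

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
model.save(path)                                 # Save
restored = MeanRegressor().load(path)            # Load
```

The point of the sketch is the shape of the interface: every step is a single method call on one object, regardless of how simple or complex the model behind it is.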

Pymc-learn

  • Inspired by scikit-learn. Focus is on non-ML specialists

Pymc-learn

from pmlearn.gaussian_process import GaussianProcessRegressor

# Instantiate a PyMC3 Gaussian process model
model = GaussianProcessRegressor()

# Fit using MCMC or Variational Inference
model.fit(X_train, y_train)

model.predict(X_test)

model.score(X_test, y_test)

model.save('path/to/saved/model')

Mimics Scikit-Learn

Thank You!
