ML Club, Thursday November 7th 2019
Francois Lanusse @EiffL
Follow the slides live at
https://slides.com/eiffl/ml_club/live
From this excellent tutorial:
I have a set of data points {x, y} where I observe x and want to predict y.
There are intrinsic uncertainties in this problem: at each x there is a full distribution of possible y values.
Try it out with this notebook
Step 1: Conditional Neural Density Estimators
We need a parametric conditional distribution q_φ(y|x) for which we can compute the log probability log q_φ(y|x).
Step 2: We need a tool to compare distributions
A distance between distributions: the Kullback-Leibler Divergence
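As a reminder, the standard definition of the KL divergence between the true conditional distribution p(y|x) and the model q_φ(y|x) is:

$$ D_{\mathrm{KL}}\big(p(y|x)\,\|\,q_\phi(y|x)\big) = \mathbb{E}_{p(y|x)}\left[\log \frac{p(y|x)}{q_\phi(y|x)}\right] $$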
Minimizing this KL divergence is equivalent to minimizing the negative log likelihood of the model
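Expanding the definition and averaging over x makes the equivalence explicit (standard derivation, in the notation introduced above):

$$ \mathbb{E}_{p(x)}\left[ D_{\mathrm{KL}}\big(p(y|x)\,\|\,q_\phi(y|x)\big) \right] = -\,\mathbb{E}_{p(x,y)}\left[\log q_\phi(y|x)\right] + \mathrm{const.} $$

where the constant does not depend on φ, so minimizing the KL divergence over φ is the same as minimizing the average negative log likelihood -log q_φ(y|x) over the training data.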
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Build model: the Dense layer outputs the 2 parameters (loc, scale)
# of a Normal distribution over y.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1 + 1),
    tfp.layers.IndependentNormal(1),
])

# Define the loss function: negative log likelihood of y under the
# predicted distribution q.
negloglik = lambda y, q: -q.log_prob(y)

# Do inference.
model.compile(optimizer='adam', loss=negloglik)
model.fit(x, y, epochs=500)

# Make predictions: yhat is a distribution, not a point estimate.
yhat = model(x_tst)
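Since yhat is a distribution object rather than an array of point predictions, its statistics can be queried directly (a small usage sketch; x_tst is assumed to hold the test inputs):

# Point estimate and aleatoric uncertainty from the predicted distribution
mean = yhat.mean()
std = yhat.stddev()
samples = yhat.sample(10)  # 10 draws of y for each test input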
Try it out with this notebook
We want to make dynamical mass measurements using information from the velocity dispersion of member galaxies and their radial distance distribution (see Ho et al. 2019).
from tensorflow import keras

# Baseline: a standard regression network trained with MSE,
# which only yields a point estimate of the mass.
regression_model = keras.Sequential([
    keras.layers.Dense(units=128, activation='relu', input_shape=(14,)),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=64, activation='tanh'),
    keras.layers.Dense(units=1)
])
regression_model.compile(loss='mean_squared_error', optimizer='adam')
num_components = 16   # number of Gaussian components in the mixture
event_shape = [1]     # predicting a single scalar (the mass)

model = keras.Sequential([
    keras.layers.Dense(units=128, activation='relu', input_shape=(14,)),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=64, activation='tanh'),
    # Output all mixture parameters (logits, means, scales)
    keras.layers.Dense(tfp.layers.MixtureNormal.params_size(num_components, event_shape)),
    tfp.layers.MixtureNormal(num_components, event_shape)
])

negloglik = lambda y, p_y: -p_y.log_prob(y)
model.compile(loss=negloglik, optimizer='adam')
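Once trained, the network returns a full mixture distribution for every object (a small usage sketch, assuming x_train, y_train, x_test arrays with the 14 input features):

model.fit(x_train, y_train, epochs=100)

# The output is a mixture distribution per test input, not a point estimate
p_mass = model(x_test)
point_estimate = p_mass.mean()   # predictive mean per object
samples = p_mass.sample(1000)    # draws from the predictive distribution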
credit: Venkatesh Tata
=> This means expressing the posterior as a Bernoulli distribution with parameter predicted by a neural network
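A minimal sketch of what that looks like with TensorFlow Probability (the architecture and input shape are illustrative assumptions, not the ones from the slides):

import tensorflow as tf
import tensorflow_probability as tfp

# The network predicts the single logit of a Bernoulli distribution
# over the label (1 = cat, 0 = dog).
classifier = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(tfp.layers.IndependentBernoulli.params_size(1)),
    tfp.layers.IndependentBernoulli(1),
])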
Step 1: We need some data
x: a cat or dog image
y: a label, 1 for cat, 0 for dog
The training set fixes the probability of including cats and dogs in my dataset, here Google Image search results for cats and dogs.
Minimizing this KL divergence is equivalent to minimizing the negative log likelihood of the model
At the minimum of the negative log likelihood, up to a prior term, the model recovers the Bayesian posterior p(y|x), with the prior given by the label distribution of the training set.
In our case of binary classification:
We recover the binary cross entropy loss function!
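Writing out the Bernoulli negative log likelihood makes this explicit, with p_φ(x) the probability predicted by the network:

$$ -\log q_\phi(y|x) = -\Big[\, y \log p_\phi(x) + (1-y)\,\log\big(1 - p_\phi(x)\big) \Big] $$

which is exactly the binary cross entropy.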
Distribution of masses in our training data
We can reweight the predictions for a desired prior
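Since the network approximates the posterior under the prior implied by the training set, its output can be importance-reweighted to any desired prior π(y) (standard argument; π and p_train are the notation assumed here):

$$ p_{\pi}(y|x) \;\propto\; q_\phi(y|x)\,\frac{\pi(y)}{p_{\mathrm{train}}(y)} $$

with the right-hand side renormalized over y.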
From this excellent tutorial:
Given a training set D = {X,Y}, the predictions from a Neural Network can be expressed as:
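The expression in question is the usual marginalization of the likelihood over the posterior on the weights (standard Bayesian neural network formulation):

$$ p(y \mid x, D) = \int p(y \mid x, w)\, p(w \mid D)\, dw $$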
Weight Estimation by Maximum Likelihood
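In the maximum likelihood approach the weights are a single point estimate:

$$ \hat{w}_{\mathrm{MLE}} = \arg\max_{w}\; \log p(Y \mid X, w) $$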
Weight Estimation by Variational Inference
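In variational inference, an approximate posterior q_θ(w) over the weights is fit by maximizing the evidence lower bound (standard ELBO; the slide's notation may differ):

$$ \mathrm{ELBO}(\theta) = \mathbb{E}_{q_\theta(w)}\left[\log p(Y \mid X, w)\right] - D_{\mathrm{KL}}\big(q_\theta(w)\,\|\,p(w)\big) $$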
TensorFlow Probability implementation
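A minimal sketch of one possible TensorFlow Probability implementation (not necessarily the one from the slides), using tfp.layers.DenseFlipout on the same 14 cluster features; num_train is an assumed training-set size used to scale the KL terms:

import tensorflow as tf
import tensorflow_probability as tfp

num_train = 10000  # assumed size of the training set

# Each DenseFlipout layer keeps a variational posterior over its weights and
# adds its KL(q(w) || p(w)) term to model.losses; we scale it by num_train.
kl_scale = lambda q, p, _: tfp.distributions.kl_divergence(q, p) / num_train

bnn = tf.keras.Sequential([
    tfp.layers.DenseFlipout(128, activation='relu',
                            kernel_divergence_fn=kl_scale, input_shape=(14,)),
    tfp.layers.DenseFlipout(64, activation='relu',
                            kernel_divergence_fn=kl_scale),
    tfp.layers.DenseFlipout(1, kernel_divergence_fn=kl_scale),
])

# Keras adds the KL terms automatically; the remaining loss is the data term,
# here a simple Gaussian (MSE) stand-in for the negative log likelihood.
bnn.compile(optimizer='adam', loss='mean_squared_error')

# Each forward pass samples a new set of weights, so repeated calls give
# samples from the predictive distribution:
# predictions = tf.stack([bnn(x_test) for _ in range(50)], axis=0)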
Quick reminder on dropout
Hinton 2012, Srivastava 2014
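As a quick sketch, dropout randomly zeroes each activation with some probability at training time (the rate and shapes below are illustrative):

import tensorflow as tf

dropout_net = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(14,)),
    tf.keras.layers.Dropout(0.5),  # each unit is dropped with probability 0.5
    tf.keras.layers.Dense(1),
])

# Calling the model with training=True keeps dropout active at prediction time,
# so repeated calls return different stochastic outputs:
# y_samples = [dropout_net(x_test, training=True) for _ in range(50)]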