Francois Lanusse @EiffL
From this excellent tutorial:
Observed y
Unknown x
Ground-based telescope
Hubble Space Telescope
Let's try to understand the neural network output by looking at the loss function
$$ \mathcal{L} = \sum_{(x_i, y_i) \in \mathcal{D}} \parallel x_i - f_\theta(y_i)\parallel^2 \quad \simeq \quad \int \parallel x - f_\theta(y) \parallel^2 \ p(x,y) \ dx dy $$ $$\Longrightarrow \int \left[ \int \parallel x - f_\theta(y) \parallel^2 \ p(x|y) \ dx \right] p(y) dy $$
This is minimized when $$f_{\theta^\star}(y) = \int x \ p(x|y) \ dx $$
i.e. when the network is predicting the mean of p(x|y).
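One way to see this (a short derivation step): for each y, set the derivative of the inner integral with respect to f_θ(y) to zero, using the fact that p(x|y) integrates to 1:
$$ \frac{\partial}{\partial f_\theta(y)} \int \parallel x - f_\theta(y) \parallel^2 \ p(x|y) \ dx = 2 \left( f_\theta(y) - \int x \ p(x|y) \ dx \right) = 0 $$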
credit: Venkatesh Tata
=> This means expressing the posterior as a Bernoulli distribution with parameter predicted by a neural network
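In symbols (a sketch; in this classification example x denotes the input image and y ∈ {0, 1} its label):
$$ p(y \mid x) = \mathrm{Bernoulli}\big(y;\ p_\theta(x)\big), \qquad p_\theta(x) \in [0, 1] \ \text{the network output.} $$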
Step 1: We need some data
cat or dog image
label 1 for cat, 0 for dog
Probability of including cats and dogs in my dataset
Implicit prior
Image search results for cats and dogs
Implicit likelihood
A distance between distributions: the Kullback-Leibler Divergence
Step 2: We need a tool to compare distributions
Minimizing this KL divergence is equivalent to minimizing the negative log likelihood of the model
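As a sketch of why (p is the data distribution, q_θ the model; the entropy of p does not depend on θ):
$$ D_{\mathrm{KL}}\big(p \,\|\, q_\theta\big) = \mathbb{E}_{p}\big[\log p(y|x)\big] - \mathbb{E}_{p}\big[\log q_\theta(y|x)\big] \quad \Longrightarrow \quad \arg\min_\theta D_{\mathrm{KL}}\big(p \,\|\, q_\theta\big) = \arg\min_\theta \ \mathbb{E}_{p}\big[-\log q_\theta(y|x)\big] $$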
In our case of binary classification:
We recover the binary cross-entropy loss function!
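Written out under the Bernoulli model above (same assumed notation), the per-example negative log likelihood is exactly:
$$ -\log q_\theta(y \mid x) = -\Big[\, y \log p_\theta(x) + (1 - y) \log\big(1 - p_\theta(x)\big) \Big] $$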
There are intrinsic uncertainties in this problem: at each x there is a full distribution of possible y values.
I have a set of data points {x, y} where I observe x and want to predict y.
We need a parametric conditional distribution p(y|x) whose log likelihood we can compute.
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
# Build model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1 + 1),     # 2 outputs: the loc and scale parameters of a Normal
    tfp.layers.IndependentNormal(1),  # turns those parameters into a distribution over y
])
# Define the loss function:
negloglik = lambda x, q: - q.log_prob(x)
# Do inference.
model.compile(optimizer='adam', loss=negloglik)
model.fit(x, y, epochs=500)
# Make predictions.
yhat = model(x_tst)
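Since the last layer is a TFP distribution layer, yhat above is a tfd.Distribution object rather than a plain tensor; a minimal sketch of how one might query it:

# yhat is a distribution over y for each test input: query its statistics.
y_mean = yhat.mean()     # conditional mean E[y|x]
y_std  = yhat.stddev()   # predictive standard deviation (aleatoric uncertainty)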
This is our data
Build a regression model for y given x
import tensorflow.keras as keras
import tensorflow_probability as tfp
# Number of components in the Gaussian Mixture
num_components = 16
# Shape of the distribution
event_shape = [1]
# Utility function to compute how many parameters this distribution requires
params_size = tfp.layers.MixtureNormal.params_size(num_components, event_shape)
gmm_model = keras.Sequential([
    keras.layers.Dense(units=128, activation='relu', input_shape=(1,)),
    keras.layers.Dense(units=128, activation='tanh'),
    keras.layers.Dense(params_size),
    tfp.layers.MixtureNormal(num_components, event_shape)
])
negloglik = lambda y, q: -q.log_prob(y)
gmm_model.compile(loss=negloglik, optimizer='adam')
gmm_model.fit(x_train.reshape((-1, 1)), y_train.reshape((-1, 1)),
              batch_size=256, epochs=20)
Try it out at this notebook
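A minimal sketch of how one might inspect the learned conditional density (the grid x_tst and its range are assumptions, not from the original):

import numpy as np
# Evaluate the mixture on a grid of test inputs.
x_tst = np.linspace(0., 1., 200).reshape((-1, 1)).astype('float32')
q = gmm_model(x_tst)        # one mixture distribution per test input
y_samples = q.sample(10)    # 10 possible y values for each input
y_mean = q.mean()           # conditional mean (can fall between modes if p(y|x) is multimodal)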
We want to make dynamical mass measurements using information from the velocity dispersion of member galaxies and from the distribution of their radial distances (see Ho et al. 2019).
regression_model = keras.Sequential([
    keras.layers.Dense(units=128, activation='relu', input_shape=(14,)),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=64, activation='tanh'),
    keras.layers.Dense(units=1)
])
regression_model.compile(loss='mean_squared_error', optimizer='adam')
num_components = 16
event_shape = [1]
model = keras.Sequential([
    keras.layers.Dense(units=128, activation='relu', input_shape=(14,)),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=64, activation='tanh'),
    keras.layers.Dense(tfp.layers.MixtureNormal.params_size(num_components, event_shape)),
    tfp.layers.MixtureNormal(num_components, event_shape)
])
negloglik = lambda y, p_y: -p_y.log_prob(y)
model.compile(loss=negloglik, optimizer='adam')
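A minimal training and prediction sketch, assuming x_train holds the 14 input features per cluster, y_train the corresponding (log-)masses, and x_test a set of new clusters; these names are assumptions, not from the original:

model.fit(x_train, y_train.reshape((-1, 1)), batch_size=256, epochs=20)
# For new clusters, the model returns a full conditional distribution over mass.
p_mass = model(x_test)
mass_mean = p_mass.mean()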
Distribution of masses in our training data
We can reweight the predictions for a desired prior
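One common way to write this reweighting (a sketch; p_train is the implicit prior set by the training-set mass distribution, p_desired the prior one actually wants):
$$ p_{\mathrm{desired}}(M \mid x) \;\propto\; p_{\mathrm{model}}(M \mid x)\, \frac{p_{\mathrm{desired}}(M)}{p_{\mathrm{train}}(M)} $$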
From this excellent tutorial:
Given a training set D = {X,Y}, the predictions from a Neural Network can be expressed as:
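The standard expression is the posterior predictive distribution, marginalizing over the network weights w (a sketch):
$$ p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\ p(w \mid \mathcal{D})\ dw $$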
Weight Estimation by Maximum Likelihood
Weight Estimation by Variational Inference
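As a sketch: variational inference approximates the intractable p(w|D) by a tractable q_φ(w), and minimizing their KL divergence is equivalent to maximizing the evidence lower bound:
$$ \min_\phi D_{\mathrm{KL}}\big(q_\phi(w) \,\|\, p(w \mid \mathcal{D})\big) \quad \Longleftrightarrow \quad \max_\phi\ \mathbb{E}_{q_\phi(w)}\big[\log p(\mathcal{D} \mid w)\big] - D_{\mathrm{KL}}\big(q_\phi(w) \,\|\, p(w)\big) $$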
TensorFlow Probability implementation
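A minimal sketch of what such an implementation could look like (assumptions: DenseFlipout layers with their default mean-field Gaussian posterior over the kernel, a 1D regression setup, and num_train_examples as a stand-in for the real dataset size):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Assumed dataset size; the KL term is scaled by 1/N so that the sum of the
# per-example negative log likelihood and the KL approximates the negative ELBO.
num_train_examples = 1000
scaled_kl = lambda q, p, _: tfd.kl_divergence(q, p) / num_train_examples

bnn_model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(1,)),
    tfp.layers.DenseFlipout(64, activation='relu', kernel_divergence_fn=scaled_kl),
    tfp.layers.DenseFlipout(1 + 1, kernel_divergence_fn=scaled_kl),
    tfp.layers.IndependentNormal(1),
])

# The variational layers register their KL terms as layer losses, so Keras adds
# them to the negative log likelihood below during training.
negloglik = lambda y, q: -q.log_prob(y)
bnn_model.compile(optimizer='adam', loss=negloglik)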
Quick reminder on dropout
Hinton 2012, Srivastava 2014
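As a quick illustration (a sketch; h is an assumed activation tensor):

# During training each unit is zeroed with probability `rate`, and the surviving
# activations are rescaled so their expected value is unchanged.
drop = tf.keras.layers.Dropout(rate=0.5)
h_train = drop(h, training=True)   # random Bernoulli mask applied
h_test  = drop(h, training=False)  # identity at test time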