Neural Ordinary Differential Equations

Presented by Alex Feng

Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud

Outline

  • Related Work/Background
  • Model Structure
  • Experiments
  • Continuous Normalizing Flows
  • A Generative Latent Function Time-Series Model
  • Conclusions

Background

  • Recurrent Networks
  • LSTM/GRU Blocks
  • Residual Blocks

Recurrent Neural Networks

[Figure: an RNN unrolled through time. Source: colah's blog]

Vanishing Gradient Problem

  • Gradients decay exponentially with the number of layers
  • This makes it effectively impossible to learn correlations between temporally distant events

LSTMs

[Figure: LSTM cell diagram. Source: colah's blog]

GRUs

[Figure: GRU cell diagram. Source: colah's blog]

Residual Networks

[Figure: residual (skip) connection. Source: towards data science]

Transformations on a Hidden State

$$\pmb h_{t+1}=\pmb h_t+f(\pmb h_t,\theta_t)$$
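
As a concrete reference, here is a minimal residual block in PyTorch (a sketch; the layer sizes and the choice of f are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual step: h_{t+1} = h_t + f(h_t, theta_t)."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, h):
        # the skip connection adds f's output to the incoming state
        return h + self.f(h)

block = ResidualBlock(dim=8)
h_next = block(torch.randn(2, 8))  # batch of 2 hidden states
```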

Euler's Method

$$f(x+h)\approx f(x)+hf'(x)$$
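
A sketch of how one Euler step with step size 1 recovers the residual update above (plain NumPy; the dynamics f here is just an illustrative stand-in for a learned layer):

```python
import numpy as np

def f(h, t):
    # stand-in for a learned layer; any smooth function works here
    return np.tanh(h)

def euler_step(h, t, dt):
    # h(t + dt) ≈ h(t) + dt * f(h(t), t)
    return h + dt * f(h, t)

h = np.array([0.5, -1.0])
h_next = euler_step(h, t=0.0, dt=1.0)  # dt = 1 gives h_{t+1} = h_t + f(h_t)
```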

Ordinary Differential Equations

  • Describe the derivative of some function
  • Can be solved with a "black box" solver
  • In the limit of many residual layers with small steps, the update becomes a continuous-time dynamics (see the sketch below): $$\pmb h_{t+1}=\pmb h_t+f(\pmb h_t,\theta_t)\rightarrow\frac{d\pmb{h}(t)}{dt}=f(\pmb{h}(t),t,\theta)$$
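
A minimal sketch of handing those dynamics to an off-the-shelf solver, with SciPy's solve_ivp standing in for a generic black-box ODESolve:

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, h):
    # continuous-depth analogue of a layer: dh/dt = f(h(t), t)
    return np.tanh(h)

h0 = np.array([0.5, -1.0])          # "input layer" h(0)
sol = solve_ivp(f, (0.0, 1.0), h0)  # the solver picks its own internal steps
h1 = sol.y[:, -1]                   # "output layer" h(1)
```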

ODE Solvers

  • Runge-Kutta
  • Adams
  • Implicit vs. Explicit
  • Black Box (see the sketch below)
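
To make the list concrete, SciPy exposes several of these families behind one interface (a sketch; the solver names are SciPy's, not the paper's):

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, y):
    return -50.0 * (y - np.cos(t))  # a mildly stiff test problem

y0 = [0.0]
rk = solve_ivp(f, (0.0, 2.0), y0, method="RK45")   # explicit Runge-Kutta
ad = solve_ivp(f, (0.0, 2.0), y0, method="LSODA")  # Adams when non-stiff, BDF when stiff
im = solve_ivp(f, (0.0, 2.0), y0, method="Radau")  # implicit Runge-Kutta, stable on stiff problems

# nfev counts function evaluations: a rough proxy for solver cost
print(rk.nfev, ad.nfev, im.nfev)
```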

Updating the Network

  • RK Backpropagation
  • Adams Method
  • Adjoint Method

The Adjoint Method

$$L(\pmb{z}(t_1))=L\big(\text{ODESolve}(\pmb{z}(t_0),f,t_0,t_1,\theta)\big)$$

$$\pmb{a}(t)=\frac{\partial L}{\partial \pmb{z}(t)}$$

$$\frac{d\pmb{a}(t)}{dt}=-\pmb{a}(t)^T\frac{\partial f(\pmb{z}(t),t,\theta)}{\partial\pmb{z}}$$
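
A toy numerical sketch of the adjoint trick on a problem where the answer is analytic (assumed setup: scalar dynamics \(f(z,t,\theta)=\theta z\) and loss \(L=\tfrac12 z(t_1)^2\); this illustrates the idea, not the paper's implementation):

```python
import numpy as np
from scipy.integrate import solve_ivp

theta, z0, t0, t1 = 0.7, 1.5, 0.0, 2.0

def f(t, z):
    return theta * z  # dynamics: dz/dt = f(z, t, theta)

# Forward pass: a black-box solve from t0 to t1.
z1 = solve_ivp(f, (t0, t1), [z0], rtol=1e-8).y[0, -1]

# Backward pass: integrate the augmented state [z, a, dL/dtheta] from t1 back to t0.
#   dz/dt       = f(z, t, theta)
#   da/dt       = -a * df/dz     = -a * theta
#   d(dLdth)/dt = -a * df/dtheta = -a * z
def augmented(t, state):
    z, a, _ = state
    return [theta * z, -a * theta, -a * z]

a1 = z1  # a(t1) = dL/dz(t1) for L = 0.5 * z(t1)**2
sol = solve_ivp(augmented, (t1, t0), [z1, a1, 0.0], rtol=1e-8)
dL_dz0, dL_dtheta = sol.y[1, -1], sol.y[2, -1]

# Check against the closed form: dL/dtheta = (t1 - t0) * z(t1)**2.
print(dL_dtheta, (t1 - t0) * z1**2)
```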

Results in Supervised Learning

Error Control

Normalizing Flows

Continuous Normalizing Flows
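
The key result the paper introduces here is the instantaneous change of variables: for dynamics \(\frac{d\pmb z}{dt}=f(\pmb z(t),t)\), the log-density evolves as

$$\frac{\partial \log p(\pmb z(t))}{\partial t}=-\mathrm{tr}\left(\frac{\partial f}{\partial \pmb z(t)}\right)$$

so the expensive log-determinant of discrete normalizing flows is replaced by a trace.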

Multiple Hidden Units

$$\frac{d\pmb z}{dt}=\sum_n \sigma_n(t) f_n(\pmb z)$$

Experiments in CNFs

Maximum Likelihood Training

A Generative Latent Function Time-series Model

  • Applying neural networks to irregularly sampled data is difficult
  • Typically, observations are put into bins of fixed duration, but this leads to missing-data issues
  • Each time series is represented by a latent trajectory
  • Each trajectory is determined by a local initial state \(\pmb z_0\) and a global set of dynamics shared across all time series
  • Given a set of observation times, an ODE solver produces a latent state at each of those times (see the sketch below)
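
A sketch of this generative step using the authors' torchdiffeq package (the module sizes, the simple decoder, and the times below are illustrative):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # the authors' ODE solver package

class Dynamics(nn.Module):
    """Global dynamics shared across all time series: dz/dt = f(z)."""
    def __init__(self, latent_dim=4, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, t, z):
        return self.net(z)

func = Dynamics()
z0 = torch.randn(1, 4)                      # local initial state (inferred by an encoder in the full model)
t_obs = torch.tensor([0.0, 0.3, 0.9, 1.4])  # irregular observation times, no binning required
z_traj = odeint(func, z0, t_obs)            # latent state at every requested time
decoder = nn.Linear(4, 2)
x_hat = decoder(z_traj)                     # decoded observation at each time point
```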

Training the Model

Poisson Process Likelihoods

Experiments

Limitations

  • Minibatching is more complicated
  • Requires Lipschitz nonlinearities such as tanh or ReLU
  • User must choose error tolerances

Conclusions

  • Adaptive evaluation
  • Tradeoff between speed and accuracy
  • Applications in time-series, supervised learning, and continuous normalizing flows

Future Work

  • Regularizing ODE nets to be faster to solve
  • Scaling the time-series model up and extending it to stochastic differential equations
  • CNFs as a practical generative density model

Closing Remarks

  • Some figures were hard to understand at first
  • Lacks comparisons to state-of-the-art methods
  • Overall mainly a proof-of-concept paper

Questions?
