End-to-End Differentiable Simulations of Particle Accelerators

Artificial Neural Networks

Example: "Fully connected, feed-forward network"

Input Layer

"Hidden" Layers

Output Layer

\vec{x}_n = \sigma\left(\bm{W}_n\cdot \vec{x}_{n-1} + \vec{b}_n\right)

Transformation per layer

Bias term

Weight matrix

(Non-)linear function ("activation")

Figure: network diagram with layer outputs \vec{x}_0, \vec{x}_1, \vec{x}_2, \vec{x}_3, \vec{x}_4 connected by weight matrices \bm{W}_1, \bm{W}_2, \bm{W}_3, \bm{W}_4

Input data

Prediction

Used for mapping complex, non-explicit relationships
E.g. image classification, translation, text / image / video generation, ...
Data-driven algorithms
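
The per-layer transformation \vec{x}_n = \sigma(\bm{W}_n\cdot \vec{x}_{n-1} + \vec{b}_n) can be sketched in a few lines of plain Python. The network shape (2 → 3 → 1), the weights, the biases and the choice of tanh are arbitrary illustration values, not taken from the slides.

```python
import math

def layer(W, b, x, sigma):
    """One layer: sigma(W . x + b), with W given as a list of rows."""
    return [sigma(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

def forward(layers, x0):
    """Apply all layers in sequence; returns [x_0, x_1, ..., x_n]."""
    xs = [x0]
    for W, b, sigma in layers:
        xs.append(layer(W, b, xs[-1], sigma))
    return xs

# Hypothetical 2 -> 3 -> 1 network (illustration values only).
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], math.tanh),
    ([[1.0, -1.0, 0.5]], [0.2], lambda z: z),  # identity activation on the output
]
xs = forward(layers, [1.0, 2.0])   # input data
prediction = xs[-1]                # output of the last layer
```

Keeping all intermediate \vec{x}_n, as `forward` does here, is what later allows gradients to be propagated backwards through the layers.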

Optimization of network parameters

"Supervised Learning" using samples of input and expected output

Sample 🡒 Forward pass 🡒 Prediction

Quality of prediction is measured by comparing with the (expected) sample output, e.g. via mean squared error (MSE)

Minimization of loss by tuning network parameters 🡒 Optimization problem

Gradient-based parameter updates using (analytically) exact gradients via the backpropagation algorithm
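
As a minimal sketch of this loop (sample, forward pass, loss, gradient-based update), the following fits a single linear "layer" y = w·x + b to toy data. The data, the learning rate and the number of steps are assumptions chosen for illustration; the gradients of the MSE are written out by hand, which is what backpropagation would produce automatically for larger networks.

```python
# Toy samples from a hypothetical ground truth y = 2x + 1 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0   # network parameters
lr = 0.05         # learning rate (assumed value)

def mse(w, b):
    """Mean squared error between predictions and expected outputs."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

initial_loss = mse(w, b)
for _ in range(2000):
    # Forward pass: prediction errors for all samples
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    # Exact gradients of the MSE w.r.t. w and b
    grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / len(xs)
    grad_b = 2.0 * sum(errs) / len(xs)
    # Plain gradient descent update
    w -= lr * grad_w
    b -= lr * grad_b
final_loss = mse(w, b)
```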

Dataflow & Differentiable Programming

Example: Particle Tracking

Example: Linear optics

\begin{pmatrix} x \\ p_x \\ y \\ p_y \\ t \\ p_t \end{pmatrix}_{\text{exit}} = M \cdot \begin{pmatrix} x \\ p_x \\ y \\ p_y \\ t \\ p_t \end{pmatrix}_{\text{entrance}}

M: Transfer Matrix

Tracking through one element is analogous to a neural network layer (b = 0, σ = id)
Weights are given and constrained by the transfer matrix
The whole beamline is analogous to a feed-forward network
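
The analogy can be made concrete with a short sketch restricted to the (x, p_x) plane. The drift and thin-lens quadrupole matrices used here are standard textbook forms and are assumptions of this example, not taken from the slides; the element lengths and focal length are arbitrary.

```python
def matvec(M, v):
    """Matrix-vector product for small lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def drift(L):
    # Standard drift-space matrix for (x, p_x); assumption of this sketch.
    return [[1.0, L], [0.0, 1.0]]

def thin_quad(f):
    # Thin-lens quadrupole with focal length f (textbook approximation).
    return [[1.0, 0.0], [-1.0 / f, 1.0]]

def track(beamline, v):
    """Apply each element like a network layer with b = 0 and sigma = id."""
    for M in beamline:
        v = matvec(M, v)
    return v

# Hypothetical beamline: drift, focusing lens, drift of one focal length.
beamline = [drift(1.0), thin_quad(2.0), drift(2.0)]
x_exit, px_exit = track(beamline, [0.001, 0.0])
```

A particle entering parallel to the axis crosses x = 0 one focal length behind the lens, which is what the last drift reproduces.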

Example: Particle Tracking

Example: Quadrupole

M = \begin{pmatrix} \cos(\omega L) & \omega^{-1}\sin(\omega L) & 0 & 0 & 0 & 0 \\ -\omega\sin(\omega L) & \cos(\omega L) & 0 & 0 & 0 & 0 \\ 0 & 0 & \cosh(\omega L) & \omega^{-1}\sinh(\omega L) & 0 & 0 \\ 0 & 0 & \omega\sinh(\omega L) & \cosh(\omega L) & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & \frac{L}{\beta_0^2\gamma_0^2} \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}

\omega = \sqrt{k_1} \; , \; k_1 = g / (B\rho)

First row of M, writing k for k_1 and p_0 for the initial p_x:

x_1 = \cos(\sqrt{k}L)x_0 + \frac{\sin(\sqrt{k}L)}{\sqrt{k}}p_0
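
A small sketch of the two transverse 2×2 blocks of this matrix, assuming k_1 > 0 so that the x plane is focusing. The function names and the numerical values are chosen for illustration only.

```python
import math

def quad_block_focusing(k1, L):
    """(x, p_x) block of the quadrupole matrix for k1 > 0."""
    w = math.sqrt(k1)
    return [[math.cos(w * L), math.sin(w * L) / w],
            [-w * math.sin(w * L), math.cos(w * L)]]

def quad_block_defocusing(k1, L):
    """(y, p_y) block of the quadrupole matrix for k1 > 0."""
    w = math.sqrt(k1)
    return [[math.cosh(w * L), math.sinh(w * L) / w],
            [w * math.sinh(w * L), math.cosh(w * L)]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

k1, L = 0.1, 1.0          # illustration values
x0, p0 = 0.05, 0.03       # illustration values
Mx = quad_block_focusing(k1, L)
My = quad_block_defocusing(k1, L)
x1 = Mx[0][0] * x0 + Mx[0][1] * p0   # first row of M applied to (x0, p0)
```

Both blocks have unit determinant, which is one of the constraints that distinguishes these "weights" from the free weights of a neural network layer.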

Tracking as computation graph

x_1 = \cos(\sqrt{k}L)x_0 + \frac{\sin(\sqrt{k}L)}{\sqrt{k}}p_0

Express as combination / chain of elementary operations

\textcolor{green}{x_1} = \textcolor{red}{\cos(\sqrt{\textcolor{black}{k}} \cdot \textcolor{black}{L}) \cdot} \textcolor{blue}{x_0} \textcolor{red}{+ \sin(\sqrt{\textcolor{black}{k}} \cdot \textcolor{black}{L}) \cdot \left(\sqrt{\textcolor{black}{k}}\right)^{-1} \cdot} \textcolor{blue}{p_0}

Output / prediction (green)

Input (blue)

(Elementary) operation (red)

Parameter of element (black)

Layout as computation graph (keeping track of operations and values at each stage)
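
A minimal sketch of how such a graph can be recorded while the expression is evaluated. The `Node` class and the helper functions are hypothetical names for this example and do not correspond to any particular library; the numerical values are illustration values.

```python
import math

class Node:
    """A value in the graph together with the operation that produced it."""
    def __init__(self, value, op="leaf", parents=()):
        self.value, self.op, self.parents = value, op, parents

def mul(a, b): return Node(a.value * b.value, "*", (a, b))
def add(a, b): return Node(a.value + b.value, "+", (a, b))
def sqrt(a):   return Node(math.sqrt(a.value), "sqrt", (a,))
def inv(a):    return Node(1.0 / a.value, "inv", (a,))
def sin(a):    return Node(math.sin(a.value), "sin", (a,))
def cos(a):    return Node(math.cos(a.value), "cos", (a,))

k, L = Node(0.1), Node(1.0)       # parameters of the element
x0, p0 = Node(0.05), Node(0.03)   # inputs
s = sqrt(k)                       # shared by the phase and the inverse
phase = mul(s, L)
x1 = add(mul(cos(phase), x0), mul(mul(sin(phase), inv(s)), p0))

def count_ops(node, seen=None):
    """Count distinct non-leaf nodes reachable from `node`."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    own = 0 if node.op == "leaf" else 1
    return own + sum(count_ops(p, seen) for p in node.parents)
```

Every node stores its value and its parents, so both the operations and the values at each stage remain available after the forward evaluation.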

Tracking as computation graph

Figure: computation graph with leaf nodes k, L, x_0, p_0, operation nodes \sqrt{}, (\cdot)^{-1}, \ast (four times), \sin, \cos, + and output node x_1.

\cos(\sqrt{k}\cdot L)\cdot x_0 + \sin(\sqrt{k}\cdot L)\cdot\left(\sqrt{k}\right)^{-1}\cdot p_0

The graph is built up operation by operation (highlighted in turn on the slide):
- Parameter k
- Square root \sqrt{k}
- Multiplication with L: \sqrt{k}\cdot L
- \cos(\sqrt{k}\cdot L) and \sin(\sqrt{k}\cdot L)
- Inverse \left(\sqrt{k}\right)^{-1}
- Multiplications with the inputs x_0 and p_0
- Multiplication of \sin(\sqrt{k}\cdot L) with \left(\sqrt{k}\right)^{-1}
- Addition yielding x_1

Tracking as computation graph

\cos(\sqrt{k}\cdot L)\cdot x_0 + \sin(\sqrt{k}\cdot L)\cdot\left(\sqrt{k}\right)^{-1}\cdot p_0

Figure: the computation graph annotated with the values of the forward pass (rounded):
- k = 0.1, L = 1.0, x_0 = 0.05, p_0 = 0.03
- \sqrt{k} = 0.32
- \left(\sqrt{k}\right)^{-1} = 3.2
- \sqrt{k}\cdot L = 0.32
- \cos(\sqrt{k}\cdot L) = 0.95
- \sin(\sqrt{k}\cdot L) = 0.31
- \sin(\sqrt{k}\cdot L)\cdot\left(\sqrt{k}\right)^{-1} = 0.98
- \cos(\sqrt{k}\cdot L)\cdot x_0 = 0.0475
- \sin(\sqrt{k}\cdot L)\cdot\left(\sqrt{k}\right)^{-1}\cdot p_0 = 0.0295
- x_1 = 0.077

Tracking as computation graph

\cos(\sqrt{k}\cdot L)\cdot x_0 + \sin(\sqrt{k}\cdot L)\cdot\left(\sqrt{k}\right)^{-1}\cdot p_0

\frac{d}{dk}x_1 \rightarrow \textrm{\small{chain rule}}

Figure: the same graph with the forward values (k = 0.1, L = 1.0, x_0 = 0.05, p_0 = 0.03, x_1 = 0.077), traversed backwards from x_1 to k. Each node multiplies the incoming gradient by its local derivative (values rounded):
- Seed at x_1: 1.0; the addition passes 1 on to both summands
- Multiplication nodes: the local derivative is the other operand ("r.h.s."), giving 0.05 towards \cos(\sqrt{k}\cdot L) and 0.03 towards \sin(\sqrt{k}\cdot L)\cdot\left(\sqrt{k}\right)^{-1}
- Towards \sin(\sqrt{k}\cdot L): 0.096; towards \left(\sqrt{k}\right)^{-1}: 0.009
- \cos node, local derivative -\sin: -0.016
- \sin node, local derivative \cos: 0.091
- Sum of both paths at \sqrt{k}\cdot L: 0.075
- Inverse node, local derivative -\left(\cdot\right)^{-2}: -0.091
- Square-root node, local derivative \frac{1}{2\sqrt{}}: \frac{d}{dk}x_1 = -0.025
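
The backward traversal can be written out by hand for this one expression. The sketch below is a manual reverse-mode pass under the same grouping of operations as the graph, cross-checked against a central finite difference; the function name and the step size h are choices of this example. At full floating-point precision the gradient differs somewhat from the rounded slide values, because the two contributions arriving at \sqrt{k} nearly cancel.

```python
import math

def x1_and_grad_k(k, L, x0, p0):
    # Forward pass, storing intermediate values
    s = math.sqrt(k)
    inv = 1.0 / s
    phase = s * L
    c, sn = math.cos(phase), math.sin(phase)
    a = sn * inv
    x1 = c * x0 + a * p0
    # Backward pass: propagate d(x1)/d(node) from the output back to k
    g_c = x0                                   # through "+" and c * x0
    g_a = p0                                   # through "+" and a * p0
    g_sn = g_a * inv                           # a = sn * inv
    g_inv = g_a * sn
    g_phase = g_c * (-sn) + g_sn * c           # cos' = -sin, sin' = cos
    g_s = g_phase * L + g_inv * (-inv ** 2)    # both paths into sqrt(k)
    g_k = g_s / (2.0 * s)                      # d sqrt(k)/dk = 1 / (2 sqrt(k))
    return x1, g_k

x1, dx1_dk = x1_and_grad_k(0.1, 1.0, 0.05, 0.03)

# Cross-check with a central finite difference in k
h = 1e-6
fd = (x1_and_grad_k(0.1 + h, 1.0, 0.05, 0.03)[0]
      - x1_and_grad_k(0.1 - h, 1.0, 0.05, 0.03)[0]) / (2 * h)
```

Differentiable programming frameworks generate the backward pass automatically from the recorded graph, so this bookkeeping does not have to be maintained by hand.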

Why not derive manually?

For long lattices, deriving such derivatives by hand quickly becomes very tedious

Beamline consists of many elements

Tracking in circular accelerator requires multiple turns

Dependence on lattice parameters typically non-linear

Symbolic differentiation can help to "shortcut"

Example: Quadrupole tuning

Beamline, 160 m, 21 quadrupoles, gradients are to be adjusted
Fulfill multiple goals:
- Minimize losses along beamline
- Minimize beam spot size at target
- Keep beam spot size below threshold at beam dump location

Figure: beamline layout with the 21 quadrupoles, target and beam dump, and the beam envelope along the beamline; the quadrupole gradients are optimized.
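
A heavily reduced sketch of gradient-based tuning: a single quadrupole, a single goal (minimal mean squared position at the exit) and plain gradient descent with an exact, hand-derived derivative. The particle coordinates, start value, learning rate and step count are assumptions of this example; the actual study involves 21 quadrupoles and several goals.

```python
import math

def x1_and_dx1_dk(k, L, x0, p0):
    """Exit position and its exact derivative w.r.t. k (focusing plane, k > 0)."""
    s = math.sqrt(k)
    c, sn = math.cos(s * L), math.sin(s * L)
    x1 = c * x0 + sn / s * p0
    dx1_ds = -L * sn * x0 + (L * c / s - sn / s ** 2) * p0
    return x1, dx1_ds / (2.0 * s)

# Hypothetical parallel beam: positions in mm, momenta zero (illustration values).
particles = [(-1.0, 0.0), (0.5, 0.0), (2.0, 0.0)]
L = 1.0

def loss_and_grad(k):
    """Mean squared exit position and its derivative w.r.t. k."""
    loss = grad = 0.0
    for x0, p0 in particles:
        x1, d = x1_and_dx1_dk(k, L, x0, p0)
        loss += x1 ** 2 / len(particles)
        grad += 2.0 * x1 * d / len(particles)
    return loss, grad

k, lr = 1.0, 0.5   # start value and learning rate are assumptions
initial_loss, _ = loss_and_grad(k)
for _ in range(500):
    _, g = loss_and_grad(k)
    k -= lr * g
final_loss, _ = loss_and_grad(k)
```

For this parallel beam the optimum is where \cos(\sqrt{k}L) = 0, i.e. k = (\pi / 2L)^2, which the iteration approaches from below.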

Example: Inference of model errors

Same beamline as before; infer quadrupole gradient errors w.r.t. the model
Quality metric: mean squared error of simulated vs. measured Trajectory Response Matrix (TRM)
TRM = orbit response @ BPMs as a function of kicker strengths

\mathrm{TRM}_{\,mn}^{\,q = x,y} = \frac{q_{m,2} - q_{m,1}}{\theta_{n,2} - \theta_{n,1}}

q_m: beam position @ BPM m

\theta_n: strength of kicker n
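
The TRM definition and the MSE metric can be sketched as follows. The data layout (one orbit per kicker and setting) is an assumption about how the measurement is organized, and all numbers are made up for illustration.

```python
def trajectory_response_matrix(orbits_1, orbits_2, thetas_1, thetas_2):
    """TRM[m][n] = (q_{m,2} - q_{m,1}) / (theta_{n,2} - theta_{n,1}).

    orbits_i[n][m]: position at BPM m while kicker n is at setting i.
    thetas_i[n]:    strength of kicker n at setting i.
    """
    n_kickers, n_bpms = len(thetas_1), len(orbits_1[0])
    return [[(orbits_2[n][m] - orbits_1[n][m]) / (thetas_2[n] - thetas_1[n])
             for n in range(n_kickers)]
            for m in range(n_bpms)]

def mse(trm_a, trm_b):
    """Mean squared error over all matrix elements."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(trm_a, trm_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

# Made-up numbers for 2 kickers and 3 BPMs (illustration only).
thetas_1, thetas_2 = [0.0, 0.0], [1e-3, 2e-3]
orbits_1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
orbits_2 = [[1e-3, 2e-3, -1e-3], [0.0, 4e-3, 2e-3]]
trm = trajectory_response_matrix(orbits_1, orbits_2, thetas_1, thetas_2)
```

In the inference setting, `mse(simulated_trm, measured_trm)` would serve as the loss whose gradient w.r.t. the quadrupole gradient errors drives the fit.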

What about other optimizers?

Gradient-Free: E.g. Nelder-Mead, Evolutionary algorithms, Particle Swarm, etc.

Gradient-Approx.: Any derivative-based method, e.g. via finite-difference approx.

Gradient-Exact: Symbolic differentiation or differentiable programming

Balance exploration vs. exploitation of the loss landscape (parameter space)

Exploration: Try out yet unknown locations

Exploitation: Use information about landscape to perform an optimal parameter update

No "built-in" exploration, but can be included via varying starting points (global) and learning rate schedules (local)

Additional advantages

Good convergence properties, also for (very) large parameter spaces
No need to rely on plain vanilla gradient descent; all sorts of advanced optimizers are available, e.g. RMSprop or Adam
Cost of one gradient computation is comparable to two function evaluations (forward and backward pass), whereas e.g. a finite-difference approx. needs on the order of N function evaluations for N parameters
Running on GPU / TPU? Yes!
Deep learning packages provide straightforward ways to switch devices
→ "Batteries included"
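
To illustrate the scaling of the finite-difference approach, the sketch below counts how often the loss is evaluated for one gradient. The quadratic loss and the 21 parameters (one per quadrupole, say) are stand-ins chosen for this example.

```python
def finite_difference_gradient(f, params, h=1e-6):
    """Forward-difference gradient; returns (gradient, number of calls to f)."""
    calls = 0

    def counted(p):
        nonlocal calls
        calls += 1
        return f(p)

    f0 = counted(params)              # one baseline evaluation
    grad = []
    for i in range(len(params)):      # one extra evaluation per parameter
        shifted = list(params)
        shifted[i] += h
        grad.append((counted(shifted) - f0) / h)
    return grad, calls

# Hypothetical loss with 21 parameters (illustration only).
def loss(p):
    return sum((i + 1) * v ** 2 for i, v in enumerate(p))

params = [0.1] * 21
grad, calls = finite_difference_gradient(loss, params)
```

With forward differences the count is N + 1 evaluations (2N for central differences), and the result carries a truncation error that depends on h, whereas the backward pass through the computation graph yields the exact gradient at a cost independent of N.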

https://commons.wikimedia.org/wiki/File:Geforce_fx5200gpu.jpg

Not limited to particle tracking nor linear optics

High versatility w.r.t. data and metrics

Icons by icons8

More from Dominik Vilsmeier