deck

What is Automatic Differentiation?

$x:X$

$y :Y$

$f$

Transform program $f$ to a program $\partial \, f$ that computes derivative of $f$

$x:X$

$dy :Y$

$\partial \, f$

$dx:X$

$x:X$

$dx :X$

$\partial^\dagger \, f$

$dy:Y$

$\nabla f = \partial^\dagger \, f \,\,1$ for $f : X \rightarrow \mathbb{R}$

$\partial^\dagger \, f \, x = (\partial\, f \,x)^\dagger$

Why Scientific Computing in Lean?

Scientific computing is full of mathematics

Lean understands mathematics

+

The goal: To have an interactive computer algebra system operating directly on the source code.

symbolic computation - no jumping between C++ and Mathematica
source code transformation - e.g. automatic differentiation
optimization - use mathematical argument to produce faster code (e.g. Halide)
specification to implementation
you do not have to be compiler engineer to do these

Lean is general purpose programming language with powerful metaprogramming

(similar motivation as "Why We Created Julia")

+

PROFIT

?

Personal Motivation: Physics Simulation in Computer Graphics

Main motivation is to reduce the time it takes to create the first working prototype!

Example: Harmonic Oscillator

\begin{aligned} \dot x &= \frac{\partial H}{\partial p} \\ \dot p &= - \frac{\partial H}{\partial x} \end{aligned}

\begin{aligned} \dot x &= \frac{\partial H}{\partial p} \\ \dot p &= - \frac{\partial H}{\partial x} \end{aligned}

 def H (m k : ℝ) (x p : ℝ) := (1/(2*m)) * ∥p∥² + k/2 * ∥x∥²
 
approx solver (m k : ℝ) (steps : Nat)
  := odeSolve (λ t (x,p) => ( ∇ (p':=p), H m k x  p',
                             -∇ (x':=x), H m k x' p))
by
  -- Unfold Hamiltonian and compute gradients
  unfold H
  symdiff; symdiff
 
  -- Apply RK4 method
  rw [odeSolve_fixed_dt runge_kutta4_step]
  approx_limit steps; simp; intro steps';

 Goals (1)
m k :  ℝ
steps : ℕ
⊢ Approx (odeSolve fun t x => (1 / m * x.snd, -(k * x.fst)))

 Goals (1)
m k :  ℝ
steps : ℕ
⊢ Approx (odeSolve (λ t (x,p) => ( ∇ (p':=p), H m k x  p',
                                  -∇ (x':=x), H m k x' p)))

 Goals (1)
m k :  ℝ
steps steps' : ℕ
⊢ Approx (odeSolve_fixed_dt_impl steps' runge_kutta4_step 
			fun t x => (1 / m * x.snd, -(k * x.fst)))

 (x, p) := solver m k substeps t (x, p) Δt

Two Main Operations: Differential and Adjoint

Differential $\partial : (X \rightarrow Y) \rightarrow (X \rightarrow X \rightarrow Y)$

Adjoint $\dagger : (X \rightarrow Y) \rightarrow (Y \rightarrow X )$

\partial \, f\, x \, dx = \lim_{h\rightarrow 0} \frac{f (x + h \cdot dx) - f(x)}{h}

\partial \, f\, x \, dx = \lim_{h\rightarrow 0} \frac{f (x + h \cdot dx) - f(x)}{h}

$x:X$

$f$

$y:Y$

$x:X$

$\partial f$

$dy:Y$

$dx:X$

$\partial$

$x:X$

$f$

$y:Y$

$f$

$f^\dagger$

$x:X$

$\dagger$

\forall \, x \, y, \left\langle f(x), y \right\rangle = \left\langle x, f^\dagger (y) \right\rangle

\forall \, x \, y, \left\langle f(x), y \right\rangle = \left\langle x, f^\dagger (y) \right\rangle

Simplifier and Typeclass approach

Differential $\partial$

Adjoint $\dagger$

 instance comp_is_smooth (f : Y → Z) (g : X → Y) [IsSmoothT f] [IsSmoothT g]
  : IsSmoothT (λ x => f (g x)) := ...
 
@[simp]
theorem chain_rule (f : Y → Z) (g : X → Y) [IsSmoothT f] [IsSmoothT g]
  : ∂ (λ x => f (g x)) 
    = 
    λ x dx => ∂ f (g x) (∂ g x dx) := ...

 instance comp_has_adjoint (f : Y → Z) (g : X → Y) [IsLinT f] [IsLinT g]
  : IsLinT (λ x => f (g x)) := ...
 
@[simp]
theorem adj_of_comp (f : Y → Z) (g : X → Y) [IsLinT f] [IsLinT g] 
  : (λ x => f (g x))† 
  	=
    λ z => g† (f† z) := ...

Lambda calculus and SKI combinators

 instance I_is_smooth
  : IsSmoothT λ x : X => x := ...
 
@[simp] 
theorem diff_of_I
  : ∂ (λ x : X => x) 
    = 
    λ x dx => dx := ...

 instance K_is_smooth (x : X)
  : IsSmoothT λ (y : Y) => x := ...
 
@[simp] 
theorem diff_of_K
  : ∂ (λ y : Y => x) 
    = 
    λ y dy => (0 : X) := ...

 instance S_is_smooth (f : X → Y → Z) (g : X → Y) [IsSmoothNT 2 f] [IsSmoothT g]
  : IsSmoothT (λ x => f x (g x)) := ...
  
@[simp] 
theorem diff_of_S (f : X → Y → Z) (g : X → Y) [IsSmoothNT 2 f] [IsSmoothT g]
  : ∂ (λ x => f x (g x)) 
    = 
    λ x dx => 
      ∂ f x dx (g x) 
      + 
      ∂ (f x) (g x) (∂ g x dx) := ...

Differential $\partial$

I : X \textcolor{green}{⟿} Y

I : X \textcolor{green}{⟿} Y

K : X ⟿ Y \textcolor{green}{⟿} X

K : X ⟿ Y \textcolor{green}{⟿} X

S : (X \textcolor{red}{⟿} Y \textcolor{red}{⟿} Z) ⟿ (X \textcolor{red}{⟿} Y) ⟿ X \textcolor{green}{⟿} Z

S : (X \textcolor{red}{⟿} Y \textcolor{red}{⟿} Z) ⟿ (X \textcolor{red}{⟿} Y) ⟿ X \textcolor{green}{⟿} Z

X ⟿ Y = C^\infty(X,Y)

X ⟿ Y = C^\infty(X,Y)

I \, x = x

I \, x = x

K \, x \, y = x

K \, x \, y = x

S \, f \, g \, x = f \, x \,(g \, x)

S \, f \, g \, x = f \, x \,(g \, x)

Lambda calculus and SKI combinators

Differential $\partial$

 instance (priority := low) swap_is_smooth (f : α → X → Y) [∀ a, IsSmoothT (f a)] 
  : IsSmoothT (λ x a => f a x) := ...
  
@[simp low]
theorem diff_of_swap (f : α → X → Y) [∀ a, IsSmoothT (f a)]
  : ∂ (λ x a => f a x) 
    = 
    λ x dx a => ∂ (f a) x dx := ...

 instance swap'_is_smooth (f : X → α → Y) [IsSmoothT f] 
  : ∀ a, IsSmoothT (λ x => f x a) := ...
  
@[simp]
theorem diff_of_swap' (f : X → α → Y) [IsSmoothT f] (a : α)
  : ∂ (λ x => f x a) 
    = 
    λ x dx => ∂ f x dx a := ...

\prod : \prod_a (X_a \textcolor{red}{⟿} Y_a) \rightarrow \left( \prod_a X_a \right) \textcolor{green}{⟿} \left( \prod_a Y_a \right)

\prod : \prod_a (X_a \textcolor{red}{⟿} Y_a) \rightarrow \left( \prod_a X_a \right) \textcolor{green}{⟿} \left( \prod_a Y_a \right)

eval_a = \pi_a : \prod_{a'} X_{a'} \textcolor{green}{⟿} X_a

eval_a = \pi_a : \prod_{a'} X_{a'} \textcolor{green}{⟿} X_a

C : (\alpha \rightarrow X \textcolor{red}{⟿} Y) \rightarrow (X \textcolor{green}{⟿} \alpha \rightarrow Y )

C : (\alpha \rightarrow X \textcolor{red}{⟿} Y) \rightarrow (X \textcolor{green}{⟿} \alpha \rightarrow Y )

C' : (X \textcolor{red}{⟿} \alpha \rightarrow Y) \rightarrow (\alpha \rightarrow X \textcolor{green}{⟿} Y)

C' : (X \textcolor{red}{⟿} \alpha \rightarrow Y) \rightarrow (\alpha \rightarrow X \textcolor{green}{⟿} Y)

\prod_a X = \alpha \rightarrow X

\prod_a X = \alpha \rightarrow X

Lambda calculus and SKI combinators

Differential $\partial$

C : (\alpha \rightarrow X ⟿ Y) ⟿ \left( X ⟿\alpha \rightarrow Y \right)

C : (\alpha \rightarrow X ⟿ Y) ⟿ \left( X ⟿\alpha \rightarrow Y \right)

I : X ⟿ Y

I : X ⟿ Y

K : X ⟿ Y ⟿ X

K : X ⟿ Y ⟿ X

S : (X ⟿ Y ⟿ Z) ⟿ (X ⟿ Y) ⟿ X ⟿ Z

S : (X ⟿ Y ⟿ Z) ⟿ (X ⟿ Y) ⟿ X ⟿ Z

C : (\iota \rightarrow U \multimap V) \rightarrow \left( U \multimap \iota \rightarrow V \right)

C : (\iota \rightarrow U \multimap V) \rightarrow \left( U \multimap \iota \rightarrow V \right)

I : U \multimap U

I : U \multimap U

K : U \multimap \iota \rightarrow U

K : U \multimap \iota \rightarrow U

S : (U \times V \multimap W) \rightarrow (U \multimap V) \rightarrow U \multimap W

S : (U \times V \multimap W) \rightarrow (U \multimap V) \rightarrow U \multimap W

Adjoint $\dagger$

X ⟿ Y = C^\infty(X,Y) \\ X, Y, Z \text{ are Convenient vector spaces} \\ \alpha \text{ is arbitrary set}

X ⟿ Y = C^\infty(X,Y) \\ X, Y, Z \text{ are Convenient vector spaces} \\ \alpha \text{ is arbitrary set}

U \multimap V = L(U,V) \\ U, V, Z \text{ are Hilbert spaces} \\ \iota \text{ is finite set}

U \multimap V = L(U,V) \\ U, V, Z \text{ are Hilbert spaces} \\ \iota \text{ is finite set}

C' : (X ⟿ \alpha \rightarrow Y) ⟿ (\alpha \rightarrow X ⟿ Y)

C' : (X ⟿ \alpha \rightarrow Y) ⟿ (\alpha \rightarrow X ⟿ Y)

C' : (U \multimap \iota \rightarrow V) \rightarrow (\iota \rightarrow U \multimap V)

C' : (U \multimap \iota \rightarrow V) \rightarrow (\iota \rightarrow U \multimap V)

Forward Mode AD and Functoriality

$g$

$\partial$

$x:X$

$\partial (f\circ g)$

$dz:Z$

$dx:X$

$?$

$x:X$

$f$

$g$

$f$

$z:Z$

$\partial f$

$x:X$

$dx:X$

$\partial g$

$y:Y$

$dy:Y$

$dz:Z$

Forward Mode AD and Functoriality

 𝒯 (λ x => f (g x)) 
= 
λ x dx =>
  let (y,dy) := 𝒯 g x dx
  let (z,dz) := 𝒯 f y dy
  (z,dz)

$x:X$

$\mathcal{T} (f\circ g)$

$dz:Z$

$dx:X$

$z:Z$

$x:X$

$\mathcal{T} g$

$dy:Y$

$dx:X$

$y:Y$

$\mathcal{T} f$

$dz:Z$

$z:Z$

$x:X$

$f$

$g$

$f$

$z:Z$

Reverse Mode AD and Functoriality

$x:X$

$f$

$g$

$?$

$f$

$z:Z$

$x:X$

$\partial^\dagger (f\circ g)$

$dx:X$

$dz:Z$

$\partial^\dagger$

$x:X$

$\partial^\dagger g$

$dx:X$

$y:Y$

$\partial^\dagger f$

$dy:Y$

$dz:Z$

Reverse Mode AD and Functoriality

$x:X$

$f$

$g$

$?$

$f$

$z:Z$

$x:X$

$\mathcal{R} (f\circ g)$

$z:Z$

$\partial^\dagger (f\circ g) \, x : Z \rightarrow X$

 ℛ (λ x => f (g x)) 
    = 
    λ x => 
      let (y,dg') := ℛ g x
      let (z,df') := ℛ f y
      (z, λ dz => dg' (df' dz))

$\partial^\dagger$

$x:X$

$\mathcal{R} g$

$y:Y$

$\partial^\dagger g \, x : Y \rightarrow X$

$\mathcal{R} f$

$z:Z$

$\partial^\dagger f \, y : Z \rightarrow Y$

Forward Mode AD and Let Bindings

 ∂ (λ x => 
   let y := g x
   let z := f y
   x + y + z) 
rewrite_by autodiff

 fun x dx =>
  let y := g x
  let dy := ∂ g x dx
  let z := f y
  let dz := ∂ f y dy
  dx + dy + dz

=

Working prototype of differentiating through let bindings as a custom simplifier step. Currently, the step does not produce a proof.

Ideally use $\mathcal{T}$

 ∂ (λ x => 
   let y := g x
   let z := f y
   x + y + z) 
rewrite_by autodiff

 λ x dx => 
  let (y,dy) := 𝒯 g x dx
  let (z,dz) := 𝒯 f y dy
  dx + dy + dz

=

Differentiation Approach Overview

Two main operations $\partial$ and $\dagger$
- rules for I, K, S, C, C' and primitive functions
- their correctness needs to be proven using mathlib
Derived operations $\partial^\dagger, \mathcal{T}, \mathcal{R}$
- their rules can be easily derived from rules of $\partial$ and $\dagger$
- main purpose is for efficiency - faster symbolic computation and more efficient resulting code
Special handling of let bindings
- works for $\partial$ but not verified
- needs to be done for other operations
But there are problems ...

$\texttt{IsLinT}$ is Transitive Closure of $\texttt{IsLin}$

We want:
- $\texttt{IsLin } f$ to imply $\texttt{IsSmooth } f$
- $\partial f \, x \, dx = f \, dx$ for $f$ linear
Problem: Attempting to prove linearity on every function is too expensive!

Solution: Introduce a transitive closure $\texttt{IsLinT } f$ of $\texttt{IsLin } f$ via rules for I, K, S, C, C'.
The class $\texttt{IsLin } f$ is used only on elementary functions like $(\cdot + \cdot), (\cdot * \cdot), (- \cdot)$

Multiple Arguments

 example (f : X → Y → Z) [IsSmoothNT 2 f]
  : differential (λ (x,y) => f x y)
    =
    λ (x,y) (dx,dy) => 
      ∂ (λ x' => f x' y) x dx
      +
      ∂ (λ y' => f x y') y dy
    := by symdiff; done

Proving $\texttt{IsSmoothT (λ x => λ y ⟿ f x y)}$ can be difficult
This can't be done with linearity. We have special variants of I,K,S,C,C' rules for $\texttt{IsLinNT 2 f}$ , $\texttt{IsLinNT 3 f}$ , ...

 example (f : X → Y → Z) 
  [∀ x, IsSmoothT (λ y => f x y)] [IsSmoothT (λ x => λ y ⟿ f x y)] 
  : [IsSmoothNT 2 f] := ...

Smoothness w.r.t to $(x,y)$ can be reduced to smoothness w.r.t to $y$ and $x$ (with values in $Y ⟿ Z$ )

Differentiation w.r.t to $(x,y)$ can be reduced to differentiation w.r.t to $x$ and $y$

Unification Problems

Applying composition rule is difficult!

 instance comp_is_smooth (f : Y → Z) (g : X → Y) [IsSmoothT f] [IsSmoothT g]
  : IsSmoothT (λ x => f (g x)) := ...

Adding trailing argument trips unification

 example (f' : Y → α → Z) (g : X → Y) (a : α) [IsSmoothT (fun y => f' y a)] [IsSmoothT g] 
  : IsSmoothT (λ x => f' (g x) a) := by try infer_instance -- fails to solve
                                       apply comp_is_smooth (fun y => f' y a) g

 unif_hint (f : Y → Z) (f' : Y → α → Z) (g : X → Y) (a : α) where
  f =?= λ y => f' y a
  |-
  IsSmoothT (λ x => f (g x)) =?= IsSmoothT (λ x => f' (g x) a)

Solved by unification hint

It is not over ... and it goes on and on ...

 example (f : Y → α → β → Z) (g : X → Y) (a : α) (b : β) 
  [IsSmoothT (fun y => f y a b)] [IsSmoothT g] 
  : IsSmoothT (λ x => f (g x) a b) := by infer_instance -- still fails!

Simp Guard

 [Meta.Tactic.simp.rewrite] @SciLean.comp.arg_x.adj_simp:100, SciLean.adjoint fun x =>
      f (g x) a ==> fun z => SciLean.adjoint (fun x => g x) (SciLean.adjoint (fun x => f x a) z)
[Meta.Tactic.simp.rewrite] @SciLean.comp.arg_x.adj_simp:100, SciLean.adjoint fun x =>
      f x a ==> fun z => SciLean.adjoint (fun x => x) (SciLean.adjoint (fun x => f x a) z)
[Meta.Tactic.simp.rewrite] @SciLean.comp.arg_x.adj_simp:100, SciLean.adjoint fun x =>
      f x a ==> fun z => SciLean.adjoint (fun x => x) (SciLean.adjoint (fun x => f x a) z)
...

 example (...) :
  (λ x => f (g x) a)†
  =
  λ x' => g† ((λ x => f x a)† x')
  := by simp  -- infinite loop

 unif_hint (...) where
  f =?= λ x => f' x a
  |- 
  (λ x => f (g x))† =?= (λ x => f' (g x) a)†

An unification hint can cause an infinite loop

Simp Guard

 @[simp, simp_guard g (λ x => x)]
theorem comp.arg_x.adj_simp
  (f : Y → Z) [HasAdjointT f] 
  (g : X → Y) [HasAdjointT g] 
  : (λ x => f (g x))† = λ z => g† (f† z) := ...

Solution: $\texttt{simp\_guard g (λ x => x)}$

do not apply simp rule if $\texttt{g}$ is equal to $\texttt{λ x => x}$

Problematic Adjoints

K : U \rightarrow (\iota \rightarrow U)

K : U \rightarrow (\iota \rightarrow U)

\text{eval}_i : (\iota \rightarrow U) \rightarrow U

\text{eval}_i : (\iota \rightarrow U) \rightarrow U

\sum^\dagger u = K u = \lambda \, i, u

\sum^\dagger u = K u = \lambda \, i, u

\text{eval}_i^\dagger \, u = \lambda \, j, u \, \delta_{ij}

\text{eval}_i^\dagger \, u = \lambda \, j, u \, \delta_{ij}

Creating constant arrays is wasteful!

Creating almost everywhere zero arrays is wasteful!

\sum : (\iota \rightarrow U) \rightarrow U

\sum : (\iota \rightarrow U) \rightarrow U

\cdot \, \delta_{i\cdot} : U \rightarrow (\iota \rightarrow U)

\cdot \, \delta_{i\cdot} : U \rightarrow (\iota \rightarrow U)

Problematic Adjoints and Unification

Matrix transposition

 example {n m} (A : Fin n → Fin m → ℝ) : 
  (λ (x : Fin m → ℝ) => λ i => ∑ j, A i j * x j)†
  =
  (λ y => ∑ j i', λ i => [[i'=i]] * A j i' * y j) := by symdiff; done

\left(\lambda\, x \, i, \sum_j A_{ij} x_j \right)^\dagger = \lambda\, y \, i, \sum_j A_{ji} y_j = \lambda \, y \, i, \sum_j \sum_{i'} \delta_{ii'} A_{ji'} y_j

\left(\lambda\, x \, i, \sum_j A_{ij} x_j \right)^\dagger = \lambda\, y \, i, \sum_j A_{ji} y_j = \lambda \, y \, i, \sum_j \sum_{i'} \delta_{ii'} A_{ji'} y_j

 @[simp]
theorem adjoint_sum_eval
  (f : ι → κ → X → Y) [∀ i j, HasAdjointT (f i j)]
  : (λ (x : κ → X) => λ i => ∑ j, (f i j) (x j))†
    =
    λ y => λ j => ∑ i, (f i j)† (y i) := ...

Ok, this is somewhat hard. Let's add a new simp rule:

Problematic Adjoints and Unification

\left(\lambda\, x \, i, \sum_j A_{ij} x_j \right)^\dagger = \lambda\, y \, i, \sum_j A_{ji} y_j

\left(\lambda\, x \, i, \sum_j A_{ij} x_j \right)^\dagger = \lambda\, y \, i, \sum_j A_{ji} y_j

\left(\lambda\, x \, i, \sum_j x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j \sum_{i'} \delta_{ii'} y_j A_{ji'}

\left(\lambda\, x \, i, \sum_j x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j \sum_{i'} \delta_{ii'} y_j A_{ji'}

Now we have:

Ups

 unif_hint 
  (f? : ι → κ → X → Y) 
  (f : ι → κ → X → α → Y) (g : ι → κ → α)
where
  f? =?= λ i j x => f i j x (g i j)
  |-
  (λ (x : κ → X) => λ i => ∑ j, (f? i j) (x j))† 
  =?= 
  (λ (x : κ → X) => λ i => ∑ j, f i j (x j) (g i j))†

Let's add an unification hint:

Problematic Adjoints and Unification

\left(\lambda\, x \, i, \sum_j A_{ij} x_j \right)^\dagger = \lambda\, y \, i, \sum_j A_{ji} y_j

\left(\lambda\, x \, i, \sum_j A_{ij} x_j \right)^\dagger = \lambda\, y \, i, \sum_j A_{ji} y_j

\left(\lambda\, x \, i, \sum_j x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j y_j A_{ji}

\left(\lambda\, x \, i, \sum_j x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j y_j A_{ji}

Now we have:

 unif_hint 
  (f? : ι → κ → X → Y) 
  (f : ι → κ → W → α → Y) (g : ι → κ → α) (h : ι → κ → X → W)
where
  f? =?= λ i j x => f i j (h i j x) (g i j)
  |-
  (λ (x : κ → X) => λ i => ∑ j, (f? i j) (x j))† 
  =?= 
  (λ (x : κ → X) => λ i => ∑ j, f i j (h i j (x j)) (g i j))†

Let's add an unification hint:

Ups

\left(\lambda\, x \, i, \sum_j 2 x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j \sum_i' \delta_{ii'} 2 y_j A_{ji'}

\left(\lambda\, x \, i, \sum_j 2 x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j \sum_i' \delta_{ii'} 2 y_j A_{ji'}

Problematic Adjoints and Unification

\left(\lambda\, x \, i, \sum_j A_{ij} x_j \right)^\dagger = \lambda\, y \, i, \sum_j A_{ji} y_j

\left(\lambda\, x \, i, \sum_j A_{ij} x_j \right)^\dagger = \lambda\, y \, i, \sum_j A_{ji} y_j

\left(\lambda\, x \, i, \sum_j x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j y_j A_{ji}

\left(\lambda\, x \, i, \sum_j x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j y_j A_{ji}

Now we have:

\left(\lambda\, x \, i, \sum_j 2 x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j 2 y_j A_{ji'}

\left(\lambda\, x \, i, \sum_j 2 x_j A_{ij} \right)^\dagger = \lambda \, y \, i, \sum_j 2 y_j A_{ji'}

I give up

\left(\lambda\, x \, i, \sum_j (A_{ij} x_j + B_{ij} x_j) \right)^\dagger = \lambda\, y \, i, \sum_j \sum_{i'} \left(\delta_{ii'} A_{ji} y_j + \delta_{ii'} B_{ji} y_j\right)

\left(\lambda\, x \, i, \sum_j (A_{ij} x_j + B_{ij} x_j) \right)^\dagger = \lambda\, y \, i, \sum_j \sum_{i'} \left(\delta_{ii'} A_{ji} y_j + \delta_{ii'} B_{ji} y_j\right)

Calculus of Variations

\begin{aligned} \min_x \int \mathcal{L}(t, x(t), \dot x(t)) \, dt \end{aligned}

\begin{aligned} \min_x \int \mathcal{L}(t, x(t), \dot x(t)) \, dt \end{aligned}

\begin{aligned} \min_\phi\int \left\| \nabla \phi(x) \right\|^2 \, dx \end{aligned}

\begin{aligned} \min_\phi\int \left\| \nabla \phi(x) \right\|^2 \, dx \end{aligned}

\begin{aligned} &\max_p - \int p(x) \log p(x)\, dx \\ &\text{such that:} \\ &\forall x, \, p(x) \geq 0 \qquad \int p(x) \, dx = 1 \\ &\int x p(x) \, dx = \mu \qquad \int (x-\mu)^2 p(x) \, dx = \sigma^2 \end{aligned}

\begin{aligned} &\max_p - \int p(x) \log p(x)\, dx \\ &\text{such that:} \\ &\forall x, \, p(x) \geq 0 \qquad \int p(x) \, dx = 1 \\ &\int x p(x) \, dx = \mu \qquad \int (x-\mu)^2 p(x) \, dx = \sigma^2 \end{aligned}

\begin{aligned} p(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac12\left(\frac{x-\mu}{\sigma}\right)^2} \end{aligned}

\begin{aligned} p(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac12\left(\frac{x-\mu}{\sigma}\right)^2} \end{aligned}

\rightarrow

\rightarrow

\Delta \phi = 0

\Delta \phi = 0

\rightarrow

\rightarrow

\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot x} - \frac{\partial \mathcal{L}}{\partial x} = 0

\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot x} - \frac{\partial \mathcal{L}}{\partial x} = 0

\rightarrow

\rightarrow

Adjunction on Semi-Hilbert Spaces

\begin{aligned} \langle f' , g\rangle = \int f' g = - \int f g' = \langle f , -g'\rangle \end{aligned}

\begin{aligned} \langle f' , g\rangle = \int f' g = - \int f g' = \langle f , -g'\rangle \end{aligned}

\left( \begin{matrix} a_{00} & a_{01} & \dots \\ a_{10} & a_{11} & \dots \\ \vdots & \vdots & \ddots \end{matrix} \right)^\dagger = \left( \begin{matrix} a_{00} & a_{10} & \dots \\ a_{01} & a_{11} & \dots \\ \vdots & \vdots & \ddots \end{matrix} \right)

\left( \begin{matrix} a_{00} & a_{01} & \dots \\ a_{10} & a_{11} & \dots \\ \vdots & \vdots & \ddots \end{matrix} \right)^\dagger = \left( \begin{matrix} a_{00} & a_{10} & \dots \\ a_{01} & a_{11} & \dots \\ \vdots & \vdots & \ddots \end{matrix} \right)

Adjunction of $A : (\mathbb{N}\rightarrow \R) \rightarrow (\mathbb{N}\rightarrow \R)$

Sensible only if the matrix has only finitely many nonzero elements in every column and row

Adjunction of $\frac{d}{dx} :(\mathbb{\R}⟿ \mathbb{R}) \rightarrow (\mathbb{\R}⟿ \mathbb{R})$

\left(\frac{d}{dx}\right)^\dagger = - \frac{d}{dx}

\left(\frac{d}{dx}\right)^\dagger = - \frac{d}{dx}

True only for $f$ and $g$ compactly supported and when integrating over large enough domain.

Adjoint Failure of K, C and eval rules

Adjoint failure of K:

K : U \rightarrow (\iota \rightarrow U)

K : U \rightarrow (\iota \rightarrow U)

K\dagger = \sum : (\iota \rightarrow U) \rightarrow U

K\dagger = \sum : (\iota \rightarrow U) \rightarrow U

K : U \rightarrow (\mathbb{R}^n ⟿ U)

K : U \rightarrow (\mathbb{R}^n ⟿ U)

K^\dagger \textcolor{red}{= \int : (\mathbb{R}^n ⟿ U) \rightarrow U}

K^\dagger \textcolor{red}{= \int : (\mathbb{R}^n ⟿ U) \rightarrow U}

Adjoint failure of eval:

eval_x : (\mathbb{R}^n ⟿ U) \rightarrow U

eval_x : (\mathbb{R}^n ⟿ U) \rightarrow U

\text{eval}_x^\dagger \textcolor{red}{= \cdot \, \delta_x(\cdot) : U \rightarrow (\mathbb{R}^n ⟿ U) }

\text{eval}_x^\dagger \textcolor{red}{= \cdot \, \delta_x(\cdot) : U \rightarrow (\mathbb{R}^n ⟿ U) }

\text{eval}_i : (\iota \rightarrow U) \rightarrow U

\text{eval}_i : (\iota \rightarrow U) \rightarrow U

\text{eval}_i^\dagger = \cdot \, \delta_{i\cdot} : U \rightarrow (\iota \rightarrow U)

\text{eval}_i^\dagger = \cdot \, \delta_{i\cdot} : U \rightarrow (\iota \rightarrow U)

As a consequence we do not have C, C' (can't take an adjoint of the green arrow)

C : (\mathbb{R}^n ⟿ X \textcolor{red}{\multimap} Y) \rightarrow \left(  X \textcolor{green}{\multimap} \mathbb{R}^n ⟿ Y \right)

C : (\mathbb{R}^n ⟿ X \textcolor{red}{\multimap} Y) \rightarrow \left( X \textcolor{green}{\multimap} \mathbb{R}^n ⟿ Y \right)

but we still have the product rule

\Pi : (\mathbb{R}^n ⟿ X \textcolor{red}{\multimap} Y) \rightarrow (\mathbb{R}^n ⟿ X) \textcolor{green}{\multimap} (\mathbb{R}^n ⟿ Y)

\Pi : (\mathbb{R}^n ⟿ X \textcolor{red}{\multimap} Y) \rightarrow (\mathbb{R}^n ⟿ X) \textcolor{green}{\multimap} (\mathbb{R}^n ⟿ Y)

Two Main Goals for Adjunction

  
example {n k} (x : Fin n → ℝ)
    : (λ (w : Fin k → ℝ) => λ (i : Fin n) => ∑ j, w j * x (i + j) )†
      =
       λ (y : Fin n → ℝ) => λ (j : Fin k) => ∑ i, x i * y (i - j) := ...

Convolution:

Mixing $f(x)$ and $f'(x)$

 example (p q : ℝ ⟿ ℝ) 
  : (λ (f : ℝ ⟿ ℝ) => λ x ⟿ p x * f x + q x * ⅆ f x)†
     =
    (λ (f : ℝ ⟿ ℝ) => λ x ⟿ p x * f x - ⅆ (x':=x), q x' * f x') := ...

p(x)f(x) + q(x)f'(x)\qquad \rightarrow \qquad p(x)f(x) - \left( q(x) f(x)\right)'

p(x)f(x) + q(x)f'(x)\qquad \rightarrow \qquad p(x)f(x) - \left( q(x) f(x)\right)'

Manifold Like Types

Types like $\mathbb{R}^+, \texttt{Array X}, \texttt{X} \oplus \texttt{Y}, \{ x : X, \|x\| = 1 \}, N \leftrightsquigarrow M$ are not vector spaces!

We need tangent spaces:

$\mathcal{T}_a (\texttt{Array X}) = \{ \texttt{b : Array X, b.size = a.size}\}$
$\mathcal{T}_{x} (\texttt{X} \oplus \texttt{Y}) = \texttt{X}$ $\mathcal{T}_{y} (\texttt{X} \oplus \texttt{Y}) = \texttt{Y}$
$\mathcal{T}_x \{ x : X, \|x\| = 1 \} = \{ v : X, v \perp x \}$
$\mathcal{T}_f (N ⟿ M) = (x : N) \rightsquigarrow \mathcal{T}_{f(x)} M$

What is the good mathematical semantics? Diffeological spaces?

\partial : (x : N) \rightarrow \mathcal{T}_{x} N \rightarrow \mathcal{T}_{f(x)} M

\partial : (x : N) \rightarrow \mathcal{T}_{x} N \rightarrow \mathcal{T}_{f(x)} M

How to Handle Imperative Code?

 ∂ λ (x₀ : ℝ^{n}) => Id.run do
  let mut x := x₀
  for i in [0:10] do
    x := x + Δt * f x 
  x

 λ (x₀ dx₀: ℝ^{n}) => Id.run do
  let mut  x :=  x₀
  let mut dx := dx₀
  for i in [0:10] do
    (x,dx) := (x,dx) + Δt * 𝒯 f x dx
  dx

?

 λ (x₀ dx₀: ℝ^{n}) => Id.run do
  let mut  x :=  x₀
  let mut dx := dx₀
  for i in [0:10] do
    dx := dx + Δt * ∂ f x dx
    x  := x + Δt * f x 
  dx

Differentiating Monadic Code

\begin{aligned} \text{Hom}(X, Y) = X \rightsquigarrow Y \end{aligned}

\begin{aligned} \text{Hom}(X, Y) = X \rightsquigarrow Y \end{aligned}

\begin{aligned} \text{Hom}(X, Y) = X \rightsquigarrow (Y \times (X \multimap Y)) \end{aligned}

\begin{aligned} \text{Hom}(X, Y) = X \rightsquigarrow (Y \times (X \multimap Y)) \end{aligned}

Type

Type

DType

DType

Kleisli

Kleisli

\begin{aligned} \text{Hom}(X, Y) = X \rightarrow m\, Y \end{aligned}

\begin{aligned} \text{Hom}(X, Y) = X \rightarrow m\, Y \end{aligned}

DKleisli

DKleisli

\begin{aligned} \text{Hom}(X, Y) = X \rightarrow m\, \left(Y \times (X \rightarrow m\, Y) \right) \end{aligned}

\begin{aligned} \text{Hom}(X, Y) = X \rightarrow m\, \left(Y \times (X \rightarrow m\, Y) \right) \end{aligned}

 (f∘g) := λ x =>
           let (y, dg) := g x
           let (z, df) := f y
           (z, df∘dg)

?

 (f∘g) := λ x => do let y ← g x; f y

 ℱ f x = (f x, λ dx => ∂ f x dx)

 pure

 pure

 (f∘g) := λ x => do
  let (y,dg) ← g x
  let (z,df) ← f y
  let dfg := λ dx => do 
    let dy ← dg dx
    df dy
  pure (z, dfg)

Maybe Kan extension?

Local Differentiability and Reduced Regularity

\| x \|

\| x \|

Non-smooth functions that we want to differentiate:

\frac1x

\frac1x

Potential approaches:

Smooth approximation
- $\|x\| \approx \|x\|_\epsilon = \sqrt{ \|x\|^2 + \epsilon^2 }$
- $\frac1x \approx \frac{x}{\|x\|^2_\epsilon}$
Local smoothness, new predicate: $\texttt{IsSmoothAt} f \, \, x$
- type class inference would need to automatically prove for example that $x \neq 0$
Work in category with maps $\text{Lip}$ , $\text{C}^k$ or $\text{C}^{k,\alpha}$ .
- Are these categories Cartesian closed?
- Does some form of chain rule hold for Lipschitz functions?

Symbolic and Automatic Differentiation in Lean

deck

deck

lecopivo

	def H (m k : ℝ) (x p : ℝ) := (1/(2m)) ∥p∥² + k/2 * ∥x∥²

	approx solver (m k : ℝ) (steps : Nat)
	:= odeSolve (λ t (x,p) => ( ∇ (p':=p), H m k x p',
	-∇ (x':=x), H m k x' p))
	by
	-- Unfold Hamiltonian and compute gradients
	unfold H
	symdiff; symdiff

	-- Apply RK4 method
	rw [odeSolve_fixed_dt runge_kutta4_step]
	approx_limit steps; simp; intro steps';

	Goals (1)
	m k : ℝ
	steps : ℕ
	⊢ Approx (odeSolve fun t x => (1 / m * x.snd, -(k * x.fst)))

	Goals (1)
	m k : ℝ
	steps steps' : ℕ
	⊢ Approx (odeSolve_fixed_dt_impl steps' runge_kutta4_step
	fun t x => (1 / m * x.snd, -(k * x.fst)))

	instance comp_is_smooth (f : Y → Z) (g : X → Y) [IsSmoothT f] [IsSmoothT g]
	: IsSmoothT (λ x => f (g x)) := ...

	@[simp]
	theorem chain_rule (f : Y → Z) (g : X → Y) [IsSmoothT f] [IsSmoothT g]
	: ∂ (λ x => f (g x))
	=
	λ x dx => ∂ f (g x) (∂ g x dx) := ...

	instance comp_has_adjoint (f : Y → Z) (g : X → Y) [IsLinT f] [IsLinT g]
	: IsLinT (λ x => f (g x)) := ...

	@[simp]
	theorem adj_of_comp (f : Y → Z) (g : X → Y) [IsLinT f] [IsLinT g]
	: (λ x => f (g x))†
	=
	λ z => g† (f† z) := ...

	instance I_is_smooth
	: IsSmoothT λ x : X => x := ...

	@[simp]
	theorem diff_of_I
	: ∂ (λ x : X => x)
	=
	λ x dx => dx := ...

	instance K_is_smooth (x : X)
	: IsSmoothT λ (y : Y) => x := ...

	@[simp]
	theorem diff_of_K
	: ∂ (λ y : Y => x)
	=
	λ y dy => (0 : X) := ...

	instance S_is_smooth (f : X → Y → Z) (g : X → Y) [IsSmoothNT 2 f] [IsSmoothT g]
	: IsSmoothT (λ x => f x (g x)) := ...

	@[simp]
	theorem diff_of_S (f : X → Y → Z) (g : X → Y) [IsSmoothNT 2 f] [IsSmoothT g]
	: ∂ (λ x => f x (g x))
	=
	λ x dx =>
	∂ f x dx (g x)
	+
	∂ (f x) (g x) (∂ g x dx) := ...

	instance (priority := low) swap_is_smooth (f : α → X → Y) [∀ a, IsSmoothT (f a)]
	: IsSmoothT (λ x a => f a x) := ...

	@[simp low]
	theorem diff_of_swap (f : α → X → Y) [∀ a, IsSmoothT (f a)]
	: ∂ (λ x a => f a x)
	=
	λ x dx a => ∂ (f a) x dx := ...

	instance swap'_is_smooth (f : X → α → Y) [IsSmoothT f]
	: ∀ a, IsSmoothT (λ x => f x a) := ...

	@[simp]
	theorem diff_of_swap' (f : X → α → Y) [IsSmoothT f] (a : α)
	: ∂ (λ x => f x a)
	=
	λ x dx => ∂ f x dx a := ...

	𝒯 (λ x => f (g x))
	=
	λ x dx =>
	let (y,dy) := 𝒯 g x dx
	let (z,dz) := 𝒯 f y dy
	(z,dz)

	ℛ (λ x => f (g x))
	=
	λ x =>
	let (y,dg') := ℛ g x
	let (z,df') := ℛ f y
	(z, λ dz => dg' (df' dz))

	∂ (λ x =>
	let y := g x
	let z := f y
	x + y + z)
	rewrite_by autodiff

	fun x dx =>
	let y := g x
	let dy := ∂ g x dx
	let z := f y
	let dz := ∂ f y dy
	dx + dy + dz

	example (f : X → Y → Z) [IsSmoothNT 2 f]
	: differential (λ (x,y) => f x y)
	=
	λ (x,y) (dx,dy) =>
	∂ (λ x' => f x' y) x dx
	+
	∂ (λ y' => f x y') y dy
	:= by symdiff; done

	example (f : X → Y → Z)
	[∀ x, IsSmoothT (λ y => f x y)] [IsSmoothT (λ x => λ y ⟿ f x y)]
	: [IsSmoothNT 2 f] := ...

	example (f' : Y → α → Z) (g : X → Y) (a : α) [IsSmoothT (fun y => f' y a)] [IsSmoothT g]
	: IsSmoothT (λ x => f' (g x) a) := by try infer_instance -- fails to solve
	apply comp_is_smooth (fun y => f' y a) g

	unif_hint (f : Y → Z) (f' : Y → α → Z) (g : X → Y) (a : α) where
	f =?= λ y => f' y a
	\|-
	IsSmoothT (λ x => f (g x)) =?= IsSmoothT (λ x => f' (g x) a)

	example (f : Y → α → β → Z) (g : X → Y) (a : α) (b : β)
	[IsSmoothT (fun y => f y a b)] [IsSmoothT g]
	: IsSmoothT (λ x => f (g x) a b) := by infer_instance -- still fails!

	[Meta.Tactic.simp.rewrite] @SciLean.comp.arg_x.adj_simp:100, SciLean.adjoint fun x =>
	f (g x) a ==> fun z => SciLean.adjoint (fun x => g x) (SciLean.adjoint (fun x => f x a) z)
	[Meta.Tactic.simp.rewrite] @SciLean.comp.arg_x.adj_simp:100, SciLean.adjoint fun x =>
	f x a ==> fun z => SciLean.adjoint (fun x => x) (SciLean.adjoint (fun x => f x a) z)
	[Meta.Tactic.simp.rewrite] @SciLean.comp.arg_x.adj_simp:100, SciLean.adjoint fun x =>
	f x a ==> fun z => SciLean.adjoint (fun x => x) (SciLean.adjoint (fun x => f x a) z)
	...

Symbolic and Automatic Differentiation in Lean

deck

More from lecopivo

	example (...) :
	(λ x => f (g x) a)†
	=
	λ x' => g† ((λ x => f x a)† x')
	:= by simp -- infinite loop

	unif_hint (...) where
	f =?= λ x => f' x a
	\|-
	(λ x => f (g x))† =?= (λ x => f' (g x) a)†

	@[simp, simp_guard g (λ x => x)]
	theorem comp.arg_x.adj_simp
	(f : Y → Z) [HasAdjointT f]
	(g : X → Y) [HasAdjointT g]
	: (λ x => f (g x))† = λ z => g† (f† z) := ...

	example {n m} (A : Fin n → Fin m → ℝ) :
	(λ (x : Fin m → ℝ) => λ i => ∑ j, A i j * x j)†
	=
	(λ y => ∑ j i', λ i => [[i'=i]] * A j i' * y j) := by symdiff; done

	@[simp]
	theorem adjoint_sum_eval
	(f : ι → κ → X → Y) [∀ i j, HasAdjointT (f i j)]
	: (λ (x : κ → X) => λ i => ∑ j, (f i j) (x j))†
	=
	λ y => λ j => ∑ i, (f i j)† (y i) := ...

	unif_hint
	(f? : ι → κ → X → Y)
	(f : ι → κ → X → α → Y) (g : ι → κ → α)
	where
	f? =?= λ i j x => f i j x (g i j)
	\|-
	(λ (x : κ → X) => λ i => ∑ j, (f? i j) (x j))†
	=?=
	(λ (x : κ → X) => λ i => ∑ j, f i j (x j) (g i j))†

	unif_hint
	(f? : ι → κ → X → Y)
	(f : ι → κ → W → α → Y) (g : ι → κ → α) (h : ι → κ → X → W)
	where
	f? =?= λ i j x => f i j (h i j x) (g i j)
	\|-
	(λ (x : κ → X) => λ i => ∑ j, (f? i j) (x j))†
	=?=
	(λ (x : κ → X) => λ i => ∑ j, f i j (h i j (x j)) (g i j))†


	example {n k} (x : Fin n → ℝ)
	: (λ (w : Fin k → ℝ) => λ (i : Fin n) => ∑ j, w j * x (i + j) )†
	=
	λ (y : Fin n → ℝ) => λ (j : Fin k) => ∑ i, x i * y (i - j) := ...

	example (p q : ℝ ⟿ ℝ)
	: (λ (f : ℝ ⟿ ℝ) => λ x ⟿ p x * f x + q x * ⅆ f x)†
	=
	(λ (f : ℝ ⟿ ℝ) => λ x ⟿ p x * f x - ⅆ (x':=x), q x' * f x') := ...

	∂ λ (x₀ : ℝ^{n}) => Id.run do
	let mut x := x₀
	for i in [0:10] do
	x := x + Δt * f x
	x