Youth in high dimensions 2022
Mario Geiger
Postdoc at MIT with Prof. Smidt
[Figure: illustration of a neural network equivariant to rotations in 3D, mapping an input to an output]
What affects data efficiency in equivariant neural networks?
Let \(G\) be a group with elements \(a, b, c\) and identity \(e\).
Few examples

A representation is a pair \((\rho, V)\): a vector space \(V\) together with a map \(\rho:G \to (V\to V)\) such that each \(\rho(g)\) is linear (\(\rho(g)(x+y)=\rho(g)x+\rho(g)y\) for \(x, y \in V\)) and \(\rho(g_1 g_2)=\rho(g_1)\rho(g_2)\) for \(g, g_1, g_2 \in G\).

The Vectors
\(\begin{bmatrix}x_1\\x_2\\x_3 \end{bmatrix}\longrightarrow R \begin{bmatrix}x_1\\x_2\\x_3 \end{bmatrix} \)

The Scalars
\(x\longrightarrow x\)

Scalar Field
\( f: \mathbb{R}^3 \to \mathbb{R}\)
\(f'(x)=f(R^{-1}x)\)

Signal on the Sphere
\(f: S^2\to \mathbb{R}\)
\(f'(x)=f(R^{-1}x)\)
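The inverse in \(f'(x)=f(R^{-1}x)\) is what makes the scalar-field rule a representation. A quick numerical check (plain NumPy, my addition, not from the slides): transforming by \(R_2\) and then by \(R_1\) equals transforming by \(R_1 R_2\) at once.

```python
import numpy as np

# Check numerically, at one sample point, that f'(x) = f(R^{-1} x)
# composes correctly: applying R2 and then R1 equals applying R1 @ R2.
# The inverse is what makes the order come out right.

rng = np.random.default_rng(0)

def random_rotation():
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return Q * np.linalg.det(Q)  # force det = +1

def rho(R, f):
    # for rotations, R^{-1} = R^T
    return lambda x: f(R.T @ x)

f = lambda x: x[0] * np.exp(-np.sum(x**2))  # an arbitrary scalar field

R1, R2 = random_rotation(), random_rotation()
x = rng.normal(size=3)
assert np.isclose(rho(R1, rho(R2, f))(x), rho(R1 @ R2, f)(x))
```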
The Vectors (irreducible)
\(\begin{bmatrix}x_1\\x_2\\x_3 \end{bmatrix}\longrightarrow R \begin{bmatrix}x_1\\x_2\\x_3 \end{bmatrix} \)

The Scalars (irreducible)
\(x\longrightarrow x\)

Scalar Field (reducible)
\( f: \mathbb{R}^3 \to \mathbb{R}\)
\(f'(x)=f(R^{-1}x)\)

Signal on the Sphere (reducible)
\(f: S^2\to \mathbb{R}\)
\(f'(x)=f(R^{-1}x)\)
Scalar Field (reducible)
\( f: \mathbb{R}^3 \to \mathbb{R}\)
\(f'(x)=f(R^{-1}x)\)
[Figure: the reducible scalar field is written as a sum \(c_1 \times \dots + c_2 \times \dots + \dots + c_6 \times \dots\) of irreducible components]
Index | Name | Examples of quantities |
---|---|---|
L=0 | Scalars | temperature, norm of a vector, orbital s, ... |
L=1 | Vectors | velocity, force, orbital p, ... |
L=2 | | orbital d |
L=3 | | orbital f |
L=4 | | orbital g |
L=5 to L=11 | | ... |
Stress Tensor (3×3 matrix)
\(\sigma\longrightarrow R\sigma R^T\)
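The 3×3 stress tensor is reducible: its 9 entries split as 1 + 3 + 5 into a trace (L=0), an antisymmetric (L=1) and a traceless symmetric (L=2) part. A NumPy sketch (my addition, not from the slides) checking that \(\sigma\longrightarrow R\sigma R^T\) never mixes these three pieces:

```python
import numpy as np

# sigma -> R sigma R^T mixes the 9 entries of sigma, but never mixes
# the three pieces below: trace (L=0, dim 1), antisymmetric (L=1, dim 3)
# and traceless symmetric (L=2, dim 5). 9 = 1 + 3 + 5.

def decompose(sigma):
    trace = np.trace(sigma) / 3 * np.eye(3)
    antisym = (sigma - sigma.T) / 2
    sym_traceless = (sigma + sigma.T) / 2 - trace
    return trace, antisym, sym_traceless

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.linalg.det(Q)  # random rotation, det = +1

sigma = rng.normal(size=(3, 3))

# decomposing then rotating each part == rotating then decomposing
for part, rotated_part in zip(decompose(sigma), decompose(R @ sigma @ R.T)):
    assert np.allclose(R @ part @ R.T, rotated_part)
```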
Everything can be decomposed into irreps:
\(\rho_1 \otimes \rho_2\) is a representation
acting on the vector space \(V_1 \otimes V_2\)
\(X \in \mathbb{R}^{\dim V_1\times\dim V_2}\)
\(X \longrightarrow \rho_1(g) X \rho_2(g)^T \)
(\(X_{ij} \longrightarrow \rho_1(g)_{ik}\rho_2(g)_{jl} X_{kl} \))
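A numerical check (plain NumPy, my addition) that the matrix form and the index form of the tensor-product representation agree, using two copies of the vector representation for \(\rho_1\) and \(\rho_2\):

```python
import numpy as np

# The tensor-product representation in matrix vs index form:
# X -> rho1 X rho2^T is the same as X_ij -> rho1_ik rho2_jl X_kl.

rng = np.random.default_rng(1)

def random_rotation():
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return Q * np.linalg.det(Q)  # random rotation, det = +1

rho1, rho2 = random_rotation(), random_rotation()  # two vector reps
X = rng.normal(size=(3, 3))

matrix_form = rho1 @ X @ rho2.T
index_form = np.einsum("ik,jl,kl->ij", rho1, rho2, X)
assert np.allclose(matrix_form, index_form)
```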
reducible = direct sum of irreducible
\(\rho_1 \otimes \rho_2 = \rho_3 \oplus \rho_4 \oplus \rho_4\)
[Figure: a group \(G\) comes with a list of irreps \(\rho_1, \rho_2, \rho_3, \rho_4, \rho_5\); the tensor product (\(\otimes\)) of any two irreps decomposes back into a direct sum of irreps from the same list]
\(D_L\) is the irrep of order \(L\).
General formula:
\(D_j \otimes D_k = D_{|j-k|} \oplus \dots \oplus D_{j+k}\)
Example:
\(D_2 \otimes D_1 = D_1 \oplus D_2 \oplus D_3\)
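A quick dimension count (plain Python, my addition) is consistent with this formula, since \(\dim D_L = 2L+1\):

```python
# The Clebsch-Gordan rule D_j (x) D_k = D_|j-k| (+) ... (+) D_(j+k)
# is consistent with dimension counting: dim D_L = 2L + 1.

def dim(L):
    return 2 * L + 1

for j in range(6):
    for k in range(6):
        lhs = dim(j) * dim(k)  # dimension of the tensor product
        rhs = sum(dim(L) for L in range(abs(j - k), j + k + 1))
        assert lhs == rhs

# e.g. D_2 (x) D_1: 5 * 3 = 15 = 3 + 5 + 7 (D_1 (+) D_2 (+) D_3)
```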
Using the tools presented previously, you can create any equivariant polynomial.
Equivariant Polynomial
[Figure: an equivariant polynomial with parameters \(\theta\): input irreps (\(\rho_1, \rho_2, \rho_3, \dots\)) are combined with tensor products (\(\otimes\)), decomposed into direct sums (\(\oplus\)) of irreps, and mixed by learned weights \(\theta\); the outputs are again irreps]
Group | Name | Ref |
---|---|---|
Translations | Convolutional Neural Networks | |
90° rotations in 2D | Group Equivariant CNN | arXiv:1602.07576 |
2D Rotations | Harmonic Networks | arXiv:1612.04642 |
2D Scale | Deep Scale-spaces | arXiv:1905.11697 |
3D Rotations | 3D Steerable CNN, Tensor Field Network | arXiv:1807.02547, arXiv:1802.08219 |
Lorentz | Lorentz Group Equivariant NN | arXiv:2006.04780 |
We wrote Python code to help create Equivariant Neural Networks:

$ pip install e3nn

from e3nn import o3
o3.spherical_harmonics(2, x, normalize=True)  # l=2 spherical harmonics of x
Spherical Harmonics are Equivariant Polynomials
(TFN: Nathaniel Thomas et al. 2018)
(Nequip: Simon Batzner et al. 2021)
[Figure: message passing from a source node with features \(h\) along the edge vector \(\vec r\) to a destination node; the message is \(m = h \otimes Y(\vec r)\)]
* this formula is missing the parameterized radial function
(Nequip: Simon Batzner et al. 2021)
[Figure: results for different values of the max L of the messages]
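A minimal sketch of the message (plain NumPy, my addition; the parameterized radial function is omitted, as noted above) for a scalar feature \(h\) and \(l=1\) spherical harmonics, which are proportional to \(\vec r/|\vec r|\):

```python
import numpy as np

# Minimal sketch of the TFN/Nequip-style message for a scalar (L=0)
# feature h and l=1 spherical harmonics: m = h * Y_1(r). The message is
# equivariant: rotating the edge vector rotates the message.

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.linalg.det(Q)  # random rotation, det = +1

h = 1.7                               # scalar feature of the source node
Y1 = lambda r: r / np.linalg.norm(r)  # l=1 spherical harmonics (up to norm.)

r = rng.normal(size=3)
m = h * Y1(r)  # message m = h (x) Y(r), trivial for L=0 features

assert np.allclose(h * Y1(R @ r), R @ m)  # equivariance of the message
```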
\(P =\) size of trainset
\(d =\) dimension of the data
\(\delta =\) distance to closest neighbor
\(\epsilon =\) test error

In high dimension, the distance to the closest training point shrinks slowly with the trainset size (\(\delta \sim P^{-1/d}\)), so methods whose test error is controlled by \(\delta\) need a number of samples exponential in \(d\).

Bach (2017); Hestness et al. (2017); regression of a Lipschitz continuous function: Luxburg and Bousquet (2004)
(MACE: Ilyes Batatia et al. 2022)
[Figure: message passing from \(\nu\) source nodes with features \(h_1, h_2, \dots, h_\nu\) along edge vectors \(\vec r_1, \vec r_2, \dots, \vec r_\nu\) to a destination node]
\(m = F_\theta(\{h_i\otimes Y(\vec r_i)\}_{i=1}^\nu)\)
\(m = F_\theta(\{h_i\otimes Y(\vec r_i)\}_{i=1}^\nu)\)

Case | Message |
---|---|
any L, \(\nu=1\) | \(h \otimes Y(\vec r)\) |
L=0, \(\nu=2\) | \(h_1Y(\vec r_1) \cdot h_2Y(\vec r_2)\) (Legendre polynomials) |
L=0, \(\nu=3\) | \((h_1Y(\vec r_1) \otimes h_2Y(\vec r_2)) \cdot h_3Y(\vec r_3)\) |
any L, \(\nu=3\) | \(h_1\otimes Y(\vec r_1) \otimes h_2\otimes Y(\vec r_2) \otimes h_3\otimes Y(\vec r_3)\) |
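The L=0, \(\nu=2\) case reducing to Legendre polynomials can be checked numerically for \(l=2\), using the traceless-symmetric-tensor realization of \(Y_2\) (plain NumPy, my addition, not from the slides):

```python
import numpy as np

# For l = 2, the invariant contraction of the l=2 "spherical harmonics"
# of two unit vectors, realized as traceless symmetric tensors,
# is proportional to the Legendre polynomial P_2(cos theta):
# <T(u), T(v)> = (u.v)^2 - 1/3 = (2/3) P_2(u.v)

rng = np.random.default_rng(2)
u = rng.normal(size=3); u /= np.linalg.norm(u)
v = rng.normal(size=3); v /= np.linalg.norm(v)

def T(w):
    # l=2 component of w (x) w: the traceless symmetric part
    return np.outer(w, w) - np.eye(3) / 3

cos_theta = np.dot(u, v)
P2 = (3 * cos_theta**2 - 1) / 2  # Legendre polynomial P_2

assert np.isclose(np.sum(T(u) * T(v)), 2 / 3 * P2)
```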
Equivariant Neural Networks are more data efficient if they incorporate Tensor Products of order \(L \geq 1\), but not necessarily as features (MACE).
The slides are available at
https://slides.com/mariogeiger/youth2022