Group Theory for Machine Learning

Youth in High Dimensions 2022

Mario Geiger

Postdoc at MIT with Prof. Smidt

This talk is about Equivariant Neural Networks

[Figure: illustration of a neural network equivariant to rotations in 3D, mapping an input to an output.]

Plan

What affects data efficiency in equivariant neural networks?

Group

\(a, b, c, e \in G\)

  • \((ab)c = a(bc)\)
  • \(ea = ae = a\)
  • \(a^{-1}a = aa^{-1} = e\)
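
For concreteness, a minimal numerical check of these axioms for 3D rotation matrices (a sketch using numpy and scipy; any matrix group would do):

import numpy as np
from scipy.spatial.transform import Rotation

# Sample three random 3D rotations; the identity element is the identity matrix.
a, b, c = (Rotation.random().as_matrix() for _ in range(3))
e = np.eye(3)

assert np.allclose((a @ b) @ c, a @ (b @ c))            # associativity
assert np.allclose(e @ a, a) and np.allclose(a @ e, a)  # identity
assert np.allclose(np.linalg.inv(a) @ a, e)             # inverse (here simply a.T)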


Representations of Rotations

The Vectors
 

\(\begin{bmatrix}x_1\\x_2\\x_3 \end{bmatrix}\longrightarrow R \begin{bmatrix}x_1\\x_2\\x_3 \end{bmatrix} \)

The Scalars

 

\(x\longrightarrow x\)

Scalar Field

 

\( f: \mathbb{R}^3 \to \mathbb{R}\)

\(f'(x)=f(R^{-1}x)\)

Signal on the Sphere

 

\(f: S^2\to \mathbb{R}\)

\(f'(x)=f(R^{-1}x)\)


Group Representations

\((\rho, V)\)

\(\rho:G \to (V\to V)\)      \(g,g_1,g_2 \in G\)    \(x, y \in V\)    \(\alpha \in \mathbb{R}\)

  • \(\rho(g)(x+\alpha y) = \rho(g)(x) + \alpha \rho(g)(y)\)
  • \(\rho(g_2)(\rho(g_1)(x)) = \rho(g_2 g_1)(x) \)
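
A minimal sketch with e3nn (the library introduced later in this talk), checking the homomorphism property for the 5-dimensional L=2 irrep of the rotation group:

import torch
from e3nn import o3

# rho maps a rotation matrix g to its 5x5 Wigner D matrix (the L=2 irrep).
rho = o3.Irrep("2e").D_from_matrix
g1, g2 = o3.rand_matrix(), o3.rand_matrix()

# rho(g2) rho(g1) == rho(g2 g1)
assert torch.allclose(rho(g2) @ rho(g1), rho(g2 @ g1), atol=1e-4)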

All the examples above (scalars, vectors, scalar fields, signals on the sphere) are representations of the rotation group.

Irreducible Representations

The Vectors, \(\begin{bmatrix}x_1\\x_2\\x_3 \end{bmatrix}\longrightarrow R \begin{bmatrix}x_1\\x_2\\x_3 \end{bmatrix}\): irreducible

The Scalars, \(x\longrightarrow x\): irreducible

Scalar Field, \(f: \mathbb{R}^3 \to \mathbb{R}\), \(f'(x)=f(R^{-1}x)\): reducible

Signal on the Sphere, \(f: S^2\to \mathbb{R}\), \(f'(x)=f(R^{-1}x)\): reducible


Irreducible Representations

[Figure: a reducible signal written as a weighted sum of irreducible components, \(= c_1 \times \dots + c_2 \times \dots + \dots + c_6 \times \dots\), each term irreducible.]


Irreps of Rotations

Index         Name      Examples of quantities
L=0           Scalars   temperature, norm of a vector, orbital s, ...
L=1           Vectors   velocity, force, orbital p, ...
L=2                     orbital d
L=3                     orbital f
L=4                     orbital g
L=5, L=6, ...           ...

Everything can be decomposed into irreps. For example, the stress tensor \(\sigma\) (a 3×3 matrix), which transforms as \(\sigma\longrightarrow R\sigma R^T\), decomposes into \(L=0 \oplus L=1 \oplus L=2\).
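
As a sketch in plain numpy: a 3×3 matrix splits into its trace part (L=0, 1 component), antisymmetric part (L=1, 3 components, equivalent to a vector), and symmetric traceless part (L=2, 5 components):

import numpy as np

sigma = np.random.randn(3, 3)

l0 = np.trace(sigma) / 3 * np.eye(3)  # L=0: trace part (1 component)
l1 = (sigma - sigma.T) / 2            # L=1: antisymmetric part (3 components)
l2 = (sigma + sigma.T) / 2 - l0       # L=2: symmetric traceless part (5 components)

assert np.allclose(sigma, l0 + l1 + l2)
assert np.isclose(np.trace(l2), 0.0)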


Tensor Product

\(\rho_1 \otimes \rho_2\) is a representation

acting on the vector space \(V_1 \otimes V_2\)

 

\(X \in \mathbb{R}^{\dim V_1\times\dim V_2}\)

\(X \longrightarrow \rho_1(g) X \rho_2(g)^T \)

(\(X_{ij} \longrightarrow \rho_1(g)_{ik}\rho_2(g)_{jl} X_{kl} \))

In general, the tensor product of two representations is reducible: it decomposes into a direct sum of irreducibles, e.g.

\(\rho_1 \otimes \rho_2 = \rho_3 \oplus \rho_4 \oplus \rho_4\)

[Figure: a group \(G\) comes with a family of irreps \(\rho_1, \rho_2, \rho_3, \rho_4, \rho_5, \dots\); the tensor product \(\otimes\) of any two of them decomposes into irreps of the same family.]


Tensor Product of Rotations

\(D_L\) denotes the irrep of order \(L\); it has dimension \(2L+1\)

General formula:

\(D_j \otimes D_k = D_{|j-k|} \oplus \dots \oplus D_{j+k}\)

Example:

\(D_2 \otimes D_1 = D_1 \oplus D_2 \oplus D_3\)
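
This selection rule can be checked directly with e3nn (introduced below); here both inputs are taken with even parity for illustration:

from e3nn import o3

# D_2 ⊗ D_1 decomposes into D_1 ⊕ D_2 ⊕ D_3
tp = o3.FullTensorProduct("2e", "1e")
print(tp.irreps_out)  # 1x1e+1x2e+1x3e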

Equivariant Neural Network

Using the tools presented previously, you can create any equivariant polynomial.

[Figure: an equivariant polynomial block with learned weights \(\theta\): inputs transforming as irreps \(\rho_1, \rho_2, \rho_3, \dots\) are combined through tensor products \(\otimes\) and direct sums \(\oplus\) into outputs that again transform as irreps.]
Equivariant Neural Network

[Figure: composing several such blocks, each with its own weights \(\theta\), yields an equivariant neural network.]

Equivariant Neural Networks Architectures

Group                  Name                                     Ref
Translation            Convolutional Neural Networks
90° rotations in 2D    Group Equivariant CNN                    arXiv:1602.07576
2D Rotations           Harmonic Networks                        arXiv:1612.04642
2D Scale               Deep Scale-spaces                        arXiv:1905.11697
3D Rotations           3D Steerable CNN, Tensor Field Network   arXiv:1807.02547, arXiv:1802.08219
Lorentz                Lorentz Group Equivariant NN             arXiv:2006.04780

Library to make ENNs for Rotations

We wrote Python code to help create Equivariant Neural Networks:

$ pip install e3nn

import torch, e3nn

x = torch.randn(10, 3)  # a batch of 3D points
e3nn.o3.spherical_harmonics(2, x, True)  # L=2 harmonics, shape (10, 5)

Spherical Harmonics are Equivariant Polynomials
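
A minimal check of what this means in e3nn: rotating the input points is the same as acting on the output with the corresponding Wigner D matrix, \(Y(Rx) = D(R)\,Y(x)\):

import torch
from e3nn import o3

x = torch.randn(10, 3)
R = o3.rand_matrix()
D = o3.Irrep("2e").D_from_matrix(R)  # the L=2 irrep of R, a 5x5 matrix

y1 = o3.spherical_harmonics(2, x @ R.T, True)  # rotate, then evaluate
y2 = o3.spherical_harmonics(2, x, True) @ D.T  # evaluate, then rotate
assert torch.allclose(y1, y2, atol=1e-4)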

Graph Convolution

Nequip

(TFN: Nathaniel Thomas et al. 2018)
(Nequip: Simon Batzner et al. 2021)

[Figure: a message \(m\) sent from a source node with features \(h\) to a destination node along the edge vector \(\vec r\).]

\(m = h \otimes Y(\vec r)\)*

* this formula omits the parameterized radial function
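
A sketch of such a message in e3nn; the irreps sizes below are illustrative assumptions, and in Nequip the tensor-product weights would come from a radial MLP of \(|\vec r|\) rather than being free parameters:

import torch
from e3nn import o3

irreps_h = o3.Irreps("8x0e + 8x1o")          # node features h (assumed sizes)
irreps_Y = o3.Irreps.spherical_harmonics(2)  # Y(r) up to L=2
irreps_m = o3.Irreps("8x0e + 8x1o + 8x2e")   # messages m (assumed sizes)

tp = o3.FullyConnectedTensorProduct(irreps_h, irreps_Y, irreps_m)

h = irreps_h.randn(1, -1)  # source node features
r = torch.randn(1, 3)      # edge vector, source -> dest.
Y = o3.spherical_harmonics(irreps_Y, r, normalize=True)
m = tp(h, Y)               # the message, shape (1, irreps_m.dim)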

Nequip Learning Curve

(Nequip: Simon Batzner et al. 2021)

[Plot: Nequip learning curves for different max \(L\) of the messages.]


The Curse of Dimensionality

and the Learning Curve

\(P =\) size of the trainset
\(d =\) dimension of the data
\(\delta =\) distance to the closest neighbor
\(\epsilon =\) test error

For regression of a Lipschitz continuous function, the test error scales as \(\epsilon \sim P^{-1/d}\): in high dimension, the learning curve is cursed (Bach 2017; Luxburg and Bousquet 2004). Empirically, learning curves follow power laws \(\epsilon \sim P^{-\beta}\) (Hestness et al. 2017).
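
A one-line sketch of where this rate comes from, under the assumption that the \(P\) training points cover the data roughly uniformly: each point occupies a volume \(\sim 1/P\), so the nearest-neighbor distance satisfies \(\delta^d \sim 1/P\), i.e. \(\delta \sim P^{-1/d}\); a Lipschitz regressor then errs by at most \(\mathrm{Lip}(f)\,\delta\) at a test point, giving \(\epsilon \sim P^{-1/d}\).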

MACE

(MACE: Ilyes Batatia et al. 2022)

[Figure: a message \(m\) sent to a destination node, built from \(\nu\) source nodes with features \(h_1, h_2, \dots, h_\nu\) and edge vectors \(\vec r_1, \vec r_2, \dots, \vec r_\nu\).]

\(m = F_\theta(\{h_i\otimes Y(\vec r_i)\}_{i=1}^\nu)\)

MACE

(MACE: Ilyes Batatia et al. 2022)

\(m = F_\theta(\{h_i\otimes Y(\vec r_i)\}_{i=1}^\nu)\)

Kinds of operations in MACE:

  • any L, \(\nu=1\): \(h \otimes Y(\vec r)\)
  • L=0, \(\nu=2\): \(h_1Y(\vec r_1) \cdot h_2Y(\vec r_2)\) (Legendre polynomials)
  • L=0, \(\nu=3\): \((h_1Y(\vec r_1) \otimes h_2Y(\vec r_2)) \cdot h_3Y(\vec r_3)\)
  • any L, \(\nu=3\): \(h_1\otimes Y(\vec r_1) \otimes h_2\otimes Y(\vec r_2) \otimes h_3\otimes Y(\vec r_3)\)
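
As a sketch of the \(\nu=2\), L=0 case in e3nn: the contraction \(Y(\vec r_1) \cdot Y(\vec r_2)\) is a rotation-invariant scalar (by the addition theorem it is, up to normalization, a Legendre polynomial of the angle between \(\vec r_1\) and \(\vec r_2\)):

import torch
from e3nn import o3

def nu2_invariant(r1, r2):
    # Dot product of two L=2 spherical-harmonic vectors: an L=0 scalar.
    y1 = o3.spherical_harmonics(2, r1, True)
    y2 = o3.spherical_harmonics(2, r2, True)
    return (y1 * y2).sum()

r1, r2 = torch.randn(3), torch.randn(3)
R = o3.rand_matrix()

# Rotating both arguments together leaves the result unchanged.
assert torch.allclose(nu2_invariant(r1, r2),
                      nu2_invariant(r1 @ R.T, r2 @ R.T), atol=1e-4)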

Conclusion

Equivariant Neural Networks are more data efficient if they incorporate Tensor Products of order \(L \geq 1\)

but not necessarily as features (MACE)

Thanks for listening

The slides are available at
https://slides.com/mariogeiger/youth2022
