Learning with Riemannian manifolds
Pierre Ablin and Florian Yger
Machine learning
Data: \(x_1,\dots, x_n\)
Algorithm: parametrized map
$$f_{\theta}: x\to y$$
(linear model, neural network,...)
Riemannian manifolds
A manifold \(\mathcal{M}\) is a high-dimensional "surface" that locally resembles a vector space.
Endow each tangent space with a metric (an inner product).
Examples:
- Sphere \(S^{p-1}= \{x\in\mathbb{R}^p| \enspace \|x\|=1\}\)
- Orthogonal matrices \(\mathcal{O}_p = \{X\in \mathbb{R}^{p\times p}| \enspace X^\top X = I_p\}\)
- Symmetric positive definite matrices \(S^{++}_p\)
Well-studied mathematical objects
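To make the three examples concrete, here is a minimal NumPy sketch (an illustration, not from the talk) that projects arbitrary points onto each manifold:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

# Sphere S^{p-1}: normalize to unit norm.
x = rng.standard_normal(p)
x_sphere = x / np.linalg.norm(x)

# Orthogonal matrices O_p: project via the polar decomposition,
# computed here from an SVD.
A = rng.standard_normal((p, p))
U, _, Vt = np.linalg.svd(A)
X_orth = U @ Vt
assert np.allclose(X_orth.T @ X_orth, np.eye(p), atol=1e-10)

# SPD matrices S++_p: symmetrize, then clip eigenvalues away from zero.
B = rng.standard_normal((p, p))
S = (B + B.T) / 2
w, V = np.linalg.eigh(S)
S_pd = V @ np.diag(np.clip(w, 1e-6, None)) @ V.T
assert np.all(np.linalg.eigvalsh(S_pd) > 0)
```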
Manifolds in machine learning
ML Algorithm: parametrized map
$$f_{\theta}: x\to y$$
Parameters \(\theta\) on manifold
- To impose a prior
- For stability
- For robustness...
Training algorithms on manifolds
Data \(x\) on manifold
- Geometric data
- Structured data
- ...
Design algorithms to handle data that lives on manifolds
Training ML models with parameters on a manifold
One motivation: robust neural networks
Trained without care, a neural network is susceptible to adversarial attacks:
Given data \(x\), one can find a small perturbation \(\delta\) such that \(\|f_{\theta}(x + \delta) - f_{\theta}(x)\|\gg \|\delta\|\)
Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. ICLR 2015.
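For concreteness, a minimal PyTorch sketch of the one-step attack (FGSM) proposed in the cited paper; the model, loss, and data below are placeholders:

```python
import torch

def fgsm_perturbation(model, loss_fn, x, y, eps=0.01):
    # One-step FGSM: move x along the sign of the loss gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    return eps * x.grad.sign()

# Hypothetical usage, given some trained `model` and batch (x, y):
# delta = fgsm_perturbation(model, torch.nn.CrossEntropyLoss(), x, y)
# x_adv = x + delta
```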
Certified robustness with orthogonal weights
Idea: the map \(x\mapsto Wx\) is norm-preserving when \(W^\top W = I_p\)
We can stack such transforms, interleaved with 1-Lipschitz non-linearities, to get networks such that
$$\|f_{\theta}(x + \delta) - f_{\theta}(x)\| \leq \|\delta\|$$
Certified robustness!
Pinot, R., Meunier, L., Araujo, A., Kashima, H., Yger, F., Gouy-Pailler, C., and Atif, J. Theoretical evidence for adversarial robustness through randomization. NeurIPS 2019.
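A minimal sketch of this construction (my illustration, not the authors' code), using PyTorch's built-in orthogonal parametrization; ReLU is 1-Lipschitz, so the bound holds end to end:

```python
import torch
from torch import nn
from torch.nn.utils.parametrizations import orthogonal

# Orthogonal (norm-preserving) linear layers + 1-Lipschitz activations
# give a network that is 1-Lipschitz overall.
net = nn.Sequential(
    orthogonal(nn.Linear(16, 16, bias=False)),
    nn.ReLU(),
    orthogonal(nn.Linear(16, 16, bias=False)),
)

x = torch.randn(8, 16)
delta = 1e-3 * torch.randn(8, 16)
lhs = (net(x + delta) - net(x)).norm(dim=1)
# Certified bound, up to floating-point tolerance:
assert torch.all(lhs <= delta.norm(dim=1) + 1e-5)
```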
Training a network with orthogonal weights
Training = optimization
Training with orthogonal weights = optimization on a manifold
A well-established field, but deep learning brings a new context:
- Hardware: use of GPUs
- Massive data: need to use stochastic algorithms
Contribution:
Fast training of neural nets with orthogonal weights
Ablin, P. and Peyré, G. Fast and accurate optimization on the orthogonal manifold without retraction. AISTATS 2022.
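A rough NumPy sketch of a retraction-free "landing" update in the spirit of the cited paper, as I read it; the toy objective, step size, and \(\lambda\) are placeholder choices, and the step-size conditions for convergence are in the paper:

```python
import numpy as np

def skew(M):
    return (M - M.T) / 2

def landing_step(X, grad_f, eta=0.05, lam=1.0):
    """One landing iteration: no QR/SVD retraction per step.
    Relative-gradient component + attraction toward X X^T = I."""
    psi = skew(grad_f @ X.T)                              # search direction
    field = psi @ X + lam * (X @ X.T - np.eye(len(X))) @ X  # penalty pull
    return X - eta * field

# Toy problem: minimize f(X) = -trace(A^T X) over the orthogonal manifold.
rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
X = np.linalg.qr(rng.standard_normal((p, p)))[0]
for _ in range(500):
    X = landing_step(X, grad_f=-A)          # grad of -trace(A^T X) is -A
print(np.linalg.norm(X.T @ X - np.eye(p)))  # iterates stay near the manifold
```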
Training ML models with data on a manifold
Use case in brain-computer interfaces (BCI)
EEG signals are usually:
- multidimensional
- noisy
Riemannian & Euclidean geometries for \(2\times 2\) PSD matrices
Covariance matrices are now used as EEG features
Yger, F., Berar, M., and Lotte, F. Riemannian approaches in brain-computer interfaces: a review. IEEE TNSRE 2017.
Corsi, M.-C., Yger, F., and Chevallier, S. RIGOLETTO – A contribution to the Clinical BCI Challenge – WCCI2020.
- gold standard
- simpler pipelines
- more robust models
Missing data on manifolds
Averaging on a manifold
$$\min_{S \in \mathcal{P}_n } \sum_i \delta_R^2 (\Sigma_i,S)$$
leads to simple (yet efficient) minimum-distance-to-mean (MDM) classifiers
Yger, F., Chevallier, S., Barthélemy, Q., and Sra, S. Geodesically-convex optimization for averaging partially observed covariance matrices. ACML 2020.
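For illustration, a NumPy/SciPy sketch of this mean, assuming the affine-invariant metric \(\delta_R\) and the classical fixed-point iteration (not the authors' code); the MDM classifier then assigns a trial to the class whose mean is nearest in \(\delta_R\):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

def delta_R(A, B):
    """Affine-invariant Riemannian distance between SPD matrices."""
    A_ihalf = inv(np.real(sqrtm(A)))
    return np.linalg.norm(logm(A_ihalf @ B @ A_ihalf), "fro")

def riemannian_mean(covs, n_iter=20):
    """Frechet mean, i.e. the minimizer of sum_i delta_R^2(Sigma_i, S),
    computed with the classical fixed-point iteration."""
    S = np.mean(covs, axis=0)  # Euclidean mean as a starting point
    for _ in range(n_iter):
        S_half = np.real(sqrtm(S))
        S_ihalf = inv(S_half)
        T = np.mean([logm(S_ihalf @ C @ S_ihalf) for C in covs], axis=0)
        S = S_half @ expm(T) @ S_half
    return S

# Toy usage: 5 random 4x4 covariance matrices (stand-ins for EEG features).
rng = np.random.default_rng(0)
covs = [np.cov(rng.standard_normal((4, 100))) for _ in range(5)]
S = riemannian_mean(covs)
```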
Averaging on a manifold with missing columns
$$\min_{S \in \mathcal{P}_n } \sum_i \delta_R^2 (M_i^\top\Sigma_i M_i,M_i^\top S M_i)$$
But when the signals from some sensors are missing, the comparison must happen on the observed channels only: the selection matrices \(M_i\) keep the rows and columns of \(\Sigma_i\) and \(S\) that were actually recorded.
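A sketch of the masked objective only; the geodesically-convex optimization that minimizes it is in the cited paper, and the mask pattern below is hypothetical:

```python
import numpy as np
from scipy.linalg import logm, sqrtm, inv

def delta_R(A, B):
    """Affine-invariant Riemannian distance between SPD matrices."""
    A_ihalf = inv(np.real(sqrtm(A)))
    return np.linalg.norm(logm(A_ihalf @ B @ A_ihalf), "fro")

def masked_objective(S, covs, masks):
    """Objective from the slide:
    sum_i delta_R^2(M_i^T Sigma_i M_i, M_i^T S M_i)."""
    return sum(delta_R(M.T @ C @ M, M.T @ S @ M) ** 2
               for C, M in zip(covs, masks))

# Hypothetical mask: keep channels 0 and 2 out of p = 4 recorded channels.
# M = np.eye(4)[:, [0, 2]]
```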
Projects and Future Works
Manifolds & Deep Learning
- Develop deep architectures that are robust using orthogonal layers
- Fast training of such networks on modern hardware
Optimal transport meets manifolds
Trajectories to model:
- frequency information: \(\{\Sigma^{f_1}, \cdots, \Sigma^{f_m}\}\)
- non-stationarity: \(\{\Sigma^{t_1}, \cdots, \Sigma^{t_m}\}\)
Part of a work package in a submitted ANR project
Thanks!