Pierre Ablin and Florian Yger
Data: \(x_1,\dots, x_n\)
Algorithm: a parametrized map
$$f_{\theta}: x \mapsto y$$
(linear model, neural network, ...)
\(\mathcal{M}\): a high-dimensional "surface" that locally resembles a vector space.
Endowing each tangent space with a metric turns \(\mathcal{M}\) into a Riemannian manifold.
Examples:
- Sphere \(S^{p-1}= \{x\in\mathbb{R}^p| \enspace \|x\|=1\}\)
- Orthogonal matrices \(\mathcal{O}_p = \{X\in \mathbb{R}^{p\times p}| \enspace X^\top X = I_p\}\)
- Symmetric positive definite matrices \(S^{++}_p\)
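As a concrete illustration (ours, not from the slides), a minimal numpy sketch of projections onto these three example manifolds; the helper names are hypothetical:

```python
import numpy as np

def project_sphere(x):
    # Closest point of S^{p-1}: rescale to unit norm
    return x / np.linalg.norm(x)

def project_orthogonal(X):
    # Polar factor of X, the closest orthogonal matrix in Frobenius norm
    U, _, Vt = np.linalg.svd(X)
    return U @ Vt

def project_spd(X, eps=1e-6):
    # Symmetrize, then floor the eigenvalues to stay positive definite
    S = (X + X.T) / 2
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.maximum(w, eps)) @ V.T
```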
Two settings for an ML algorithm \(f_{\theta}\):
- Parameters \(\theta\) on a manifold: training algorithms on manifolds.
- Data \(x\) on a manifold: designing algorithms to handle data that lives on manifolds.
Trained without care, a neural network is susceptible to adversarial attacks
Goodfellow, I. J., Shlens, J. and Szegedy, C. Explaining and harnessing adversarial examples. ICLR 2015.
Idea: the map \(x\mapsto Wx\) is norm-preserving when \(W^\top W = I_p\).
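Indeed, for any \(x\),
$$\|Wx\|^2 = x^\top W^\top W x = x^\top x = \|x\|^2.$$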
We can stack such transforms to get networks such that
$$\|f_{\theta}(x + \delta) - f_{\theta}(x)\| \leq \|\delta\|$$
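A minimal numpy sketch (ours, not the papers' implementation): orthogonal layers drawn via QR, composed with ReLU (itself 1-Lipschitz), satisfy the bound above:

```python
import numpy as np

rng = np.random.default_rng(0)
p, depth = 16, 5
# Random orthogonal weights: Q factor of a QR decomposition
layers = [np.linalg.qr(rng.standard_normal((p, p)))[0] for _ in range(depth)]

def f(x):
    for W in layers:
        x = np.maximum(W @ x, 0.0)  # each layer is 1-Lipschitz
    return x

x = rng.standard_normal(p)
delta = 1e-3 * rng.standard_normal(p)
# Check the bound ||f(x + delta) - f(x)|| <= ||delta||
assert np.linalg.norm(f(x + delta) - f(x)) <= np.linalg.norm(delta) + 1e-12
```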
Pinot, R., Meunier, L., Araujo, A., Kashima, H., Yger, F., Gouy-Pailler, C. and Atif, J. Theoretical evidence for adversarial robustness through randomization. NeurIPS 2019.
Training = optimization
Training with orthogonal weights = optimization on a manifold
A well-established field, but deep learning brings a new context.
Contribution:
Fast training of neural nets with orthogonal weights
Ablin, P. and Peyré, G. Fast and accurate optimization on the orthogonal manifold without retraction. AISTATS 2022
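A rough numpy sketch of the retraction-free "landing" update from that paper, as we read it; the step size `eta` and penalty weight `lam` are illustrative choices:

```python
import numpy as np

def skew(M):
    return (M - M.T) / 2

def landing_step(X, grad_f, eta=0.1, lam=1.0):
    # Relative gradient: moves X along the manifold directions
    psi = skew(grad_f @ X.T)
    # Penalty field: pulls X back toward X X^T = I_p
    N = X @ X.T - np.eye(X.shape[0])
    return X - eta * (psi + lam * N) @ X
```

The point of the penalty term is that iterates stay close to \(\mathcal{O}_p\) without ever computing a retraction (QR, polar, matrix exponential), which is the costly step at deep-learning scale.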
EEG signals are usually:
(Figure: Riemannian & Euclidean geometries for 2×2 PSD matrices)
Covariance matrices are now used as EEG features
Yger, F., Berar, M. and Lotte, F. Riemannian approaches in brain-computer interfaces: a review. IEEE TNSRE 2017.
Corsi, M.-C., Yger, F. and Chevallier, S. RIGOLETTO -- A contribution to the Clinical BCI Challenge -- WCCI 2020.
Averaging on manifold
$$\min_{S \in \mathcal{P}_n } \sum_i \delta_R^2 (\Sigma_i,S)$$
leads to simple (yet efficient) minimum-distance-to-mean (MDM) classifiers
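Assuming \(\delta_R\) is the affine-invariant Riemannian distance (the usual choice in this literature), a minimal scipy sketch of the mean computation and of the resulting MDM rule; all names are ours:

```python
import numpy as np
from scipy.linalg import eigvalsh, sqrtm, logm, expm, inv

def delta_R(A, B):
    # Affine-invariant distance: log-eigenvalues of the pencil (B, A)
    return np.sqrt(np.sum(np.log(eigvalsh(B, A)) ** 2))

def riemannian_mean(covs, n_iter=20):
    S = np.mean(covs, axis=0)  # start from the Euclidean mean
    for _ in range(n_iter):
        S_half = sqrtm(S).real
        S_inv_half = inv(S_half)
        # Average the log-maps of the data at S, then map back with exp
        T = np.mean([logm(S_inv_half @ C @ S_inv_half).real for C in covs],
                    axis=0)
        S = S_half @ expm(T) @ S_half
    return S

def mdm_predict(Sigma, class_means):
    # Assign Sigma to the class whose Riemannian mean is closest
    return min(class_means, key=lambda c: delta_R(Sigma, class_means[c]))
```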
Yger, F., Chevallier, S., Barthélemy, Q. and Sra, S. Geodesically-convex optimization for averaging partially observed covariance matrices. ACML 2020.
Averaging on manifold with missing columns
But when the signals from some sensors are missing, things are different: with \(M_i\) the selection matrix of the channels observed for trial \(i\), solve
$$\min_{S \in \mathcal{P}_n } \sum_i \delta_R^2 (M_i^\top\Sigma_i M_i,\, M_i^\top S M_i)$$
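Under the same assumption on \(\delta_R\), a sketch of how this masked objective could be evaluated, with \(M_i\) built from the indices of the observed channels (our construction, not the paper's code):

```python
import numpy as np
from scipy.linalg import eigvalsh

def delta_R(A, B):
    return np.sqrt(np.sum(np.log(eigvalsh(B, A)) ** 2))

def selection_matrix(p, observed):
    # Columns of I_p indexed by the observed sensors
    return np.eye(p)[:, observed]

def masked_loss(S, covs, observed_sets):
    # sum_i delta_R^2(M_i^T Sigma_i M_i, M_i^T S M_i)
    p = S.shape[0]
    loss = 0.0
    for Sigma, obs in zip(covs, observed_sets):
        M = selection_matrix(p, obs)
        loss += delta_R(M.T @ Sigma @ M, M.T @ S @ M) ** 2
    return loss
```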
Trajectories of covariance matrices \(\Sigma^1, \dots, \Sigma^m\) to model.
Part of a work package in a submitted ANR project.