Inference and learning in an interactive dSprite environment
Part I
Motivation
Develop probabilistic models of behavior capable of handling complex interactive environments.
Inspiration in modern probabilistic machine learning.
High dimensional interpretable latent variables capturing perception, planning and action.
Qian, Xuelin, et al. "fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding." arXiv preprint arXiv:2311.00342 (2023).
Going beyond fMRI to images.
Outline
- dSprites dataset
- Variational autoencoders
- Model inversion
- Remaining work
Disentanglement testing Sprites dataset
dSprites is a dataset of 2D shapes procedurally generated from 6 latent factors (see the loading sketch below):
- Color: white
- Shape: square, ellipse, heart
- Scale: 6 values linearly spaced in [0.5, 1]
- Orientation: 40 values in [0, 2π]
- Position X: 32 values in [0, 1]
- Position Y: 32 values in [0, 1]
Higgins et al. "beta-VAE: Learning basic visual concepts with a constrained variational framework." In Proceedings of the International Conference on Learning Representations (ICLR). 2017.
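As a reference, a minimal loading sketch, assuming the standard .npz release from https://github.com/deepmind/dsprites-dataset (file name and array keys as published there):

```python
# Minimal loading sketch; assumes the standard dSprites .npz release.
import numpy as np

data = np.load(
    "dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz",
    allow_pickle=True, encoding="latin1",
)
imgs = data["imgs"]                       # (737280, 64, 64) binary images
latent_values = data["latents_values"]    # (737280, 6) factor values
latent_classes = data["latents_classes"]  # (737280, 6) integer factor indices
# Factor order: color, shape, scale, orientation, posX, posY
```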
Interactive dSprites environment
Transform the dSprites dataset into an interactive environment with movements along 4 latent factors (see the interface sketch below):
- Scale
- Orientation
- Position X
- Position Y
https://github.com/dimarkov/active-dsprites
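The actual interface lives in the repository above; purely as an illustration, here is a hypothetical step-based sketch in which an action nudges one controllable factor and the observation is the image at the updated latent configuration (class and method names are assumptions, not the real active-dsprites API):

```python
import numpy as np

class DSpritesEnvSketch:
    """Hypothetical interface sketch; not the actual active-dsprites API."""
    FACTORS = ["scale", "orientation", "posX", "posY"]
    SIZES = [6, 40, 32, 32]  # values per controllable factor in dSprites

    def __init__(self, imgs, latent_classes, shape_idx=0):
        self.imgs = imgs               # (N, 64, 64) dataset images
        self.classes = latent_classes  # (N, 6) integer factor indices
        self.shape = shape_idx         # 0=square, 1=ellipse, 2=heart
        self.state = np.array([0, 0, 16, 16])  # scale, orientation, posX, posY

    def step(self, factor, delta):
        """Move along one latent factor; orientation wraps, others clip."""
        i = self.FACTORS.index(factor)
        if factor == "orientation":
            self.state[i] = (self.state[i] + delta) % self.SIZES[i]
        else:
            self.state[i] = np.clip(self.state[i] + delta, 0, self.SIZES[i] - 1)
        return self.render()

    def render(self):
        # Look up the dataset image whose factor indices match the state.
        query = np.array([0, self.shape, *self.state])
        idx = np.nonzero((self.classes == query).all(axis=1))[0][0]
        return self.imgs[idx]
```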
Possible extensions
- Multi-color, multi-object environments
- Continuous latent spaces?
- Adding speed and acceleration to objects?
Variational autoencoders
A machine learning way of doing Bayesian predictive coding.
Marino, Joseph. "Predictive coding, variational autoencoders, and biological connections." Neural Computation 34.1 (2022): 1-44.
Encoder: \( x^n \rightarrow z^n \); Decoder: \( z^n \rightarrow \hat{x}^n \)
Variational autoencoders
Variational free energy or negative ELBO
Amortized inference
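For a single observation \( x^n \), writing \( \theta \) for the decoder parameters (a notation assumption here), the variational free energy, i.e. the negative ELBO, takes the standard form
\[
\hat{F}_n = \mathrm{D}_{\mathrm{KL}}\left[ q_\phi(z^n \mid x^n) \,\Vert\, p(z^n) \right] - \mathbb{E}_{q_\phi(z^n \mid x^n)}\left[ \ln p_\theta(x^n \mid z^n) \right]
\]
Amortized inference means a single encoder pass maps \( x^n \) directly to the variational parameters \( \phi_n \), instead of optimizing them separately for each data point.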
Variational autoencoders
Two problems with amortized inference:
- Amortization gap
- Biological implausibility: measured neuronal responses show gradient-descent-like (iterative) dynamics, which a single amortized forward pass does not capture.
Friston, Karl. "A theory of cortical responses." Philosophical transactions of the Royal Society B: Biological sciences 360.1456 (2005): 815-836.
Variational autoencoders
Marino, Joe, Yisong Yue, and Stephan Mandt. "Iterative amortized inference." International Conference on Machine Learning. PMLR, 2018.
Stochastic gradient descent:
\( \phi_n^{(k+1)} = \phi_n^{(k)} - \beta_k \nabla_{\phi_n^{(k)}} \hat{F}_n \)
Learnable optimization algorithm (iterative amortized inference):
\( \phi_n^{(k+1)} = f\left(\phi_n^{(k)}, \nabla_{\phi_n^{(k)}} \hat{F}_n, \pmb{W} \right) \)
Variational parameters: \( \phi_n = (\mu_n, \sigma_n) \)
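A minimal runnable sketch of this loop in PyTorch, assuming a differentiable free_energy(x, mu, logvar) returning \( \hat{F}_n \) per sample and a learned update network f (both names are placeholders, not from the paper):

```python
import torch

def iterative_inference(x, mu, logvar, free_energy, f, num_steps=5):
    """Refine variational parameters by feeding free-energy gradients to f."""
    for _ in range(num_steps):
        mu = mu.detach().requires_grad_(True)
        logvar = logvar.detach().requires_grad_(True)
        F_hat = free_energy(x, mu, logvar)          # per-sample free energy
        g_mu, g_logvar = torch.autograd.grad(F_hat.sum(), (mu, logvar))
        mu, logvar = f(mu, logvar, g_mu, g_logvar)  # learned optimizer step
    return mu, logvar
```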
Generative model
Requirements:
- Interpretability of latent states:
  - Beliefs about objects
  - Beliefs about manipulations
  - Attention
- Linear transformations and dynamics
Generative model
Spatial Transformer Networks
Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in neural information processing systems 28 (2015).
Generative model
Spatial Transformer Networks
Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in neural information processing systems 28 (2015).
+ bilinear interpolation
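A minimal sketch of this sampling step with PyTorch's affine_grid / grid_sample and bilinear interpolation (the scale/rotation/translation parameterization below is an illustrative assumption):

```python
import torch
import torch.nn.functional as F

def affine_sample(img, scale, angle, tx, ty):
    """Bilinearly sample a (N, C, H, W) batch under an affine transform."""
    cos, sin = torch.cos(angle), torch.sin(angle)
    # (N, 2, 3) affine matrices mapping output to input coordinates
    theta = torch.stack([
        torch.stack([scale * cos, -scale * sin, tx], dim=-1),
        torch.stack([scale * sin,  scale * cos, ty], dim=-1),
    ], dim=-2)
    grid = F.affine_grid(theta, list(img.shape), align_corners=False)
    return F.grid_sample(img, grid, mode="bilinear", align_corners=False)
```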
Generative model
Spatial Transformer Networks
Generative model
Amortized inference
Iterative inference
Results: amortized inference
Results: iterative inference
Khan, Mohammad Emtiyaz, and Håvard Rue. "The Bayesian learning rule." arXiv preprint arXiv:2107.04562 (2021).
Khan, Mohammad Emtiyaz, et al. "Fast and scalable Bayesian deep learning by weight-perturbation in Adam." International Conference on Machine Learning. PMLR, 2018.
Natural momentum for natural gradient SVI
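In the notation of the iterative updates above, with \( \tilde{\nabla} \) denoting the natural gradient, a momentum-augmented natural-gradient step can be sketched as (the step sizes \( \rho, \gamma \) and this exact form are assumptions here, not taken from the cited papers)
\[
\phi_n^{(k+1)} = \phi_n^{(k)} - \rho\, \tilde{\nabla}_{\phi_n^{(k)}} \hat{F}_n + \gamma \left( \phi_n^{(k)} - \phi_n^{(k-1)} \right)
\]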
Iterative inference
Results
Remaining work
- Representing latent state dynamics
- Learning of A, B, and L
- Action selection via expected free energy minimization (one standard form is sketched below)
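For reference, a common form of the expected free energy of a policy \( \pi \) over future time steps \( \tau \), written with the latents \( z \) used above (notation assumed here), is
\[
G(\pi) = \sum_{\tau} \mathbb{E}_{q(o_\tau, z_\tau \mid \pi)}\left[ \ln q(z_\tau \mid \pi) - \ln p(o_\tau, z_\tau) \right]
\]
with actions chosen from policies that minimize \( G \).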
Remaining work
- Integration with Bayesian sparse coding
- Hierarchical extensions for complex object representation and learning (Position/Orientation/Scale)