Dimitrije Marković
DySCO meeting 18.01.2023
Optimization
Inference
Murphy, Kevin P. Probabilistic machine learning: an introduction. MIT press, 2022.
Wilson, Andrew Gordon. "The case for Bayesian deep learning." arXiv preprint arXiv:2001.10995 (2020).
Bui, Thang D., et al. "Partitioned variational inference: A unified framework encompassing federated and continual learning." arXiv preprint arXiv:1811.11206 (2018).
Murphy, Kevin P. Probabilistic machine learning: Advanced topics. MIT Press, 2023.
Optimization
Nalisnick, Eric, José Miguel Hernández-Lobato, and Padhraic Smyth. "Dropout as a structured shrinkage prior." International Conference on Machine Learning. PMLR, 2019.
Ghosh, Soumya, Jiayu Yao, and Finale Doshi-Velez. "Structured variational learning of Bayesian neural networks with horseshoe priors." International Conference on Machine Learning. PMLR, 2018.
Dropout as a spike-and-slab prior
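A quick way to see the correspondence (a standard identity, stated here in my notation): multiplicative Bernoulli dropout on a Gaussian weight induces a spike-and-slab marginal prior,
\[
\tilde{w}_j = b_j w_j, \quad b_j \sim \mathrm{Bern}(1 - \pi), \quad w_j \sim \mathcal{N}(0, \sigma^2)
\;\Longrightarrow\;
p(\tilde{w}_j) = \pi\, \delta_0(\tilde{w}_j) + (1 - \pi)\, \mathcal{N}(\tilde{w}_j \mid 0, \sigma^2).
\]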
Better shrinkage priors
Piironen, Juho, and Aki Vehtari. "Sparsity information and regularization in the horseshoe and other shrinkage priors." Electronic Journal of Statistics 11.2 (2017): 5018-5051.
Approximate posterior
Hierarchical model
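For reference, the hierarchical model behind these shrinkage priors (following Piironen and Vehtari, 2017) is the horseshoe, a global-local scale mixture over each weight,
\[
w_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \tau^2 \lambda_j^2), \qquad
\lambda_j \sim \mathcal{C}^{+}(0, 1), \qquad
\tau \sim \mathcal{C}^{+}(0, \tau_0),
\]
with the regularized variant softening the slab via \( \tilde{\lambda}_j^2 = c^2 \lambda_j^2 / (c^2 + \tau^2 \lambda_j^2) \).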
Variational free energy
Stochastic gradient
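To make these two ingredients concrete, here is a minimal, self-contained sketch (a toy linear-regression joint and hypothetical function names, not the numpc implementation) of a stochastic-gradient step on the variational free energy, using a mean-field Gaussian posterior and the reparameterization trick:

```python
import jax
import jax.numpy as jnp

# Toy log joint ln p(y, w | x): standard-normal prior on w, Gaussian likelihood.
def log_joint(w, x, y):
    log_prior = jax.scipy.stats.norm.logpdf(w, 0.0, 1.0).sum()
    log_lik = jax.scipy.stats.norm.logpdf(y, x @ w, 1.0).sum()
    return log_prior + log_lik

# One-sample Monte Carlo estimate of the variational free energy
# F(phi) = E_q[ln q(w) - ln p(y, w | x)] for a mean-field Gaussian q(w),
# reparameterized so that grad(F) is an unbiased stochastic gradient.
def free_energy(phi, key, x, y):
    mu, log_sigma = phi
    sigma = jnp.exp(log_sigma)
    w = mu + sigma * jax.random.normal(key, mu.shape)  # w ~ q(w)
    log_q = jax.scipy.stats.norm.logpdf(w, mu, sigma).sum()
    return log_q - log_joint(w, x, y)

@jax.jit
def sgd_step(phi, key, x, y, lr=1e-2):
    grads = jax.grad(free_energy)(phi, key, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, phi, grads)

# usage
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (100, 5))
y = x @ jnp.array([2.0, 0.0, -1.0, 0.0, 0.5])
phi = (jnp.zeros(5), jnp.zeros(5))  # (mu, log_sigma)
for _ in range(1000):
    key, sub = jax.random.split(key)
    phi = sgd_step(phi, sub, x, y)
```

Each call to `sgd_step` uses a single posterior sample, so the gradient is an unbiased but noisy estimate of \( \nabla_\phi F \); averaging over more samples trades compute for variance.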
Two generative processes for the data
flat model
extended model
Friston, Karl, Thomas Parr, and Peter Zeidman. "Bayesian model reduction." arXiv preprint arXiv:1805.07092 (2018).
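In brief, following Friston, Parr, and Zeidman (2018): if a reduced model differs from the full (extended) model only in its prior, \( \tilde{p}(\pmb{z}) \) versus \( p(\pmb{z}) \), then the reduced posterior and the log-evidence difference follow directly from the full model's approximate posterior \( q(\pmb{z}) \):
\[
\Delta F = \ln \int q(\pmb{z})\, \frac{\tilde{p}(\pmb{z})}{p(\pmb{z})}\, d\pmb{z},
\qquad
\tilde{q}(\pmb{z}) \propto q(\pmb{z})\, \frac{\tilde{p}(\pmb{z})}{p(\pmb{z})},
\]
so candidate sparse priors can be scored without refitting the model.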
[Schematic: iterating between the flat and extended models; each epoch alternates step 1 and step 2, and at the start of a new epoch the prior is updated as \( p(\pmb{z}_i) \propto \exp\left[ \int d\pmb{z}_{i+1}\, q(\pmb{z}_{i+1}) \ln p(\pmb{z}_i \mid \pmb{z}_{i+1}) \right] \)]
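As an illustration of step 2, here is a minimal sketch (my notation and function names, not the numpc API) of BMR-based pruning for a fully factorized Gaussian posterior: for each weight we evaluate the log-evidence change \( \Delta F \) of shrinking its prior variance toward zero and accept the reduction wherever it does not decrease the evidence.

```python
import jax.numpy as jnp

# Log-evidence change Delta F = ln E_q[p_red(w) / p(w)] in closed form, for a
# full Gaussian prior N(0, v0), a reduced prior N(0, v0_red), and a factorized
# Gaussian posterior N(mu, v) over each weight.
def bmr_delta_f(mu, v, v0, v0_red):
    lam, lam0, lam0_red = 1.0 / v, 1.0 / v0, 1.0 / v0_red
    lam_red = lam + lam0_red - lam0  # precision of the reduced posterior
    return 0.5 * (mu**2 * lam * (lam0 - lam0_red) / lam_red
                  + jnp.log(lam * lam0_red / (lam_red * lam0)))

# Step 2: where Delta F >= 0 the reduced (near-zero-variance) prior is at
# least as good, so the weight is pruned and its posterior replaced by the
# reduced posterior; elsewhere the full posterior is kept.
def bmr_prune(mu, v, v0, v0_red=1e-6):
    df = bmr_delta_f(mu, v, v0, v0_red)
    keep = df < 0.0
    lam_red = 1.0 / v + 1.0 / v0_red - 1.0 / v0
    mu_red = jnp.where(keep, mu, (mu / v) / lam_red)  # reduced posterior mean
    v_red = jnp.where(keep, v, 1.0 / lam_red)
    return mu_red, v_red, keep
```

Here step 1 would be the stochastic-gradient update of \( q \) sketched earlier, with `bmr_prune` applied layer-wise before each new epoch.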
[Figures: simulation results for linear regression (D=(1,100), N=100; Normal likelihood), logistic regression (D=(1,100), N=200; Bernoulli likelihood), and multinomial regression (D=(10,10), N=400; Categorical likelihood); separate panels for the Normal and Bernoulli likelihoods and for the Categorical likelihood]
TODO: UCI Machine Learning Repository benchmarks
| Dataset | N (samples) | D (features) |
|---|---|---|
| Yacht | 308 | 6 |
| Boston | 506 | 13 |
| Energy | 768 | 8 |
| Concrete | 1,030 | 8 |
| Wine | 1,599 | 11 |
| Kin8nm | 8,192 | 8 |
| Power Plant | 9,568 | 4 |
| Naval | 11,934 | 16 |
| Protein | 45,730 | 9 |
| Year | 515,345 | 90 |
TODO: Fashion MNIST benchmark
BMR appears to work well and has considerable potential for a range of deep learning applications.
https://github.com/dimarkov/numpc
It might be possible to prune pre-trained models using the Laplace approximation.
But ... we can do better: BMR lets us formulate Bayesian sparse predictive coding models.
This naturally complements distributed and federated inference problems.