Robert Salomone
Centre for Data Science, Queensland University of Technology (QUT)
"I use mathematical ideas in creative ways to help improve methods in statistics and machine learning. A large part of this is making computers work more reliably and efficiently for very difficult problems."
That was the very big picture; this talk takes a (slightly) smaller one: an overview of my research contributions.
Top 5: for each paper, a small amount of background, followed by a discussion of what was accomplished.
Recurring Theme: Insights and ideas from diverse areas (maths, statistics, machine learning) help solve problems and develop new methodology.
Discussion of some other areas where similar ideas pop up.
My Goal (research dissemination aside): Inspire you to think a little about how your insights from different fields may be useful.
\[\nu^N(\varphi) := \sum_{k=1}^N W_k \varphi(X_k).\]
Everything today is based on this idea: how to construct \(\nu^N\) to overcome specific challenges.
(Monte Carlo) Draw \(X_1, \ldots, X_N \sim_{\rm iid} \pi\), and take \[\nu^N(\varphi) = \frac{1}{N}\sum_{k=1}^N \varphi(X_k). \]
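As an illustrative aside (a toy example, not from any of the papers), the plain Monte Carlo estimator in a few lines of Python, with \(\pi = \mathcal{N}(0,1)\) and \(\varphi(x) = x^2\) as assumed choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo(phi, sampler, N):
    """Plain Monte Carlo: equal weights W_k = 1/N on N iid draws."""
    X = sampler(N)
    return np.mean(phi(X))

# Toy check: for X ~ N(0, 1) and phi(x) = x^2, E[phi(X)] = 1.
est = monte_carlo(lambda x: x**2, lambda n: rng.standard_normal(n), 100_000)
```

With \(N = 10^5\) the estimate lands within a few hundredths of the true value 1.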
(Markov Chain Monte Carlo) Simulate an ergodic Markov chain \(\{\mathbf{X}_t\}_{t=0}^N\) with stationary distribution \(\pi\), and take iterates after a burn-in period of length \(b\) that ensures the chain is sufficiently close to stationarity:
\[\nu^N(\varphi) = \frac{1}{N-b}\sum_{k=b+1}^N \varphi(X_k).\]
Many ways to do this, e.g., Metropolis-Hastings.
Generally, a computationally expensive approach.
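A minimal random-walk Metropolis-Hastings sketch (toy Gaussian target, with starting point, step size, and burn-in chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def rw_metropolis(log_pi, x0, N, step=1.0):
    """Random-walk Metropolis-Hastings: Gaussian proposal, accept y with
    probability min(1, pi(y) / pi(x))."""
    chain = np.empty(N + 1)
    chain[0] = x = x0
    lp = log_pi(x)
    for t in range(1, N + 1):
        y = x + step * rng.standard_normal()
        lp_y = log_pi(y)
        if np.log(rng.random()) < lp_y - lp:   # accept
            x, lp = y, lp_y
        chain[t] = x                            # else: repeat current state
    return chain

log_pi = lambda v: -0.5 * v**2                  # pi = N(0, 1), up to a constant
chain = rw_metropolis(log_pi, x0=5.0, N=50_000, step=1.5)
b = 1_000                                       # burn-in length
est_mean = chain[b:].mean()                     # nu^N(phi) with phi(x) = x
```

Even from the poor starting point \(x_0 = 5\), the post-burn-in average recovers the target mean.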
Salomone, R., Quiroz, M., Kohn, R., Villani, M., and Tran, M.N. (2020),
Spectral Subsampling MCMC for Stationary Time Series.
(Contribution) Extending Bayesian methods for large datasets to stationary time series.
Published in the International Conference on Machine Learning (ICML), 2020
Villani, M., Quiroz, M., Kohn, R., and Salomone, R. (2021), Spectral Subsampling MCMC for Stationary Multivariate Time Series.
Recently Submitted (Invited Paper)
(TFI does very well in terms of BIC)
\[{\bf X}_0 \sim \nu, \quad {\rm d}{\bf X}_t = \frac{1}{2}\nabla \log \pi({\bf X}_t){\rm d}t + {\rm d}{\bf W}_t, \]
The SDE has stationary distribution \(\pi\).
Its discretization (e.g., the Euler–Maruyama scheme) has stationary distribution \(\nu \ne \pi\) (or none at all!).
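The discretization bias can be made concrete with a small sketch (toy Gaussian target; all tuning choices illustrative). For \(\pi = \mathcal{N}(0,1)\), the unadjusted Euler-Maruyama chain is AR(1), and a short calculation gives stationary variance \(1/(1-h/4)\) rather than the target's 1:

```python
import numpy as np

rng = np.random.default_rng(2)

# Unadjusted (Euler-Maruyama) Langevin for pi = N(0, 1):
#   X_{t+1} = X_t + (h/2) * grad log pi(X_t) + sqrt(h) * xi_t,  xi_t ~ N(0, 1).
# Here grad log pi(x) = -x, so the chain is AR(1) with coefficient 1 - h/2
# and stationary variance h / (1 - (1 - h/2)^2) = 1 / (1 - h/4): biased.
h = 0.5
N = 200_000
x = 0.0
samples = np.empty(N)
for t in range(N):
    x = x + 0.5 * h * (-x) + np.sqrt(h) * rng.standard_normal()
    samples[t] = x

empirical_var = samples[1_000:].var()        # discard burn-in
theoretical_var = 1.0 / (1.0 - h / 4.0)      # = 8/7, the *biased* limit
```

The empirical variance matches the biased limit \(8/7 \approx 1.14\), not the target variance 1.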
Hodgkinson, L., Salomone, R., and Roosta, F. (2019), Implicit Langevin Algorithms for Sampling From Log-concave Densities.
Accepted at the Journal of Machine Learning Research
Contributions
Implicit Discretization of \(\pi\)-ergodic Langevin SDE:
Unadjusted Algorithm (Asymptotically Biased)
Next Up: Estimating \(\mathcal{Z}\) and sampling from pathologically shaped distributions.
\(\pi(\mathbf{x}) = \eta(\mathbf{x})\mathcal{L}(\mathbf{x})/\mathcal{Z} \)
\(\mathcal{Z} = \mathbb{E}_{\eta}\mathcal{L}(\mathbf{X}) = \int_0^\infty \mathbb{P}_{\eta}\!\left(\mathcal{L}(\mathbf{X}) \ge l\right){\rm d}l\)
Main Idea: Perform quadrature to evaluate the integral above.
Nested Sampling Procedure
1. Draw \(X_1, \ldots, X_n \sim_{{\rm iid}} \eta\)
2. Until termination: discard the live point with the lowest likelihood, record its contribution to \(\mathcal{Z}\), and replace it with a draw from \(\eta\) constrained to exceed that likelihood.
Popular as it can handle many pathological likelihoods and problems (e.g., Potts Model)
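A self-contained toy run of the procedure (with a monotone likelihood so the constrained-prior draw is exact; the prior, likelihood, and all tuning here are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy problem: prior eta = Uniform(0, 1), likelihood L(x) = exp(-x), so
# Z = int_0^1 exp(-x) dx = 1 - exp(-1) ~= 0.632.
n = 500                              # number of live points
live = rng.uniform(0.0, 1.0, n)
Z, X_prev = 0.0, 1.0                 # X_t tracks remaining prior volume
for t in range(1, 5_000):
    worst = live.argmax()            # L is decreasing, so max x = lowest L
    L_min = np.exp(-live[worst])
    X_t = np.exp(-t / n)             # standard volume-shrinkage estimate
    Z += L_min * (X_prev - X_t)      # quadrature contribution
    X_prev = X_t
    # draw from the constrained prior {x : L(x) > L_min}; for this monotone
    # toy likelihood that is simply Uniform(0, x_worst)
    live[worst] = rng.uniform(0.0, live[worst])
Z += np.exp(-live).mean() * X_prev   # contribution of the final live set
```

The constrained-prior draw is trivial here; in realistic problems that step is exactly where the hard sampling work (and the SMC connection) lives.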
where \[w(\mathbf{x}) \propto \frac{\pi(\mathbf{x})}{\eta({\mathbf{x}})} \]
(similar formula to Importance Sampling)
Suggests using: \[\nu^N(\varphi) := \sum_{k=1}^N W_k \varphi(X_k), \quad \mathbf{X}_k \sim \eta\] with
\[W_k = \frac{w(\mathbf{X}_k)}{\sum_{j=1}^N w(\mathbf{X}_j)}\]
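In code, the self-normalized importance sampling estimator looks like this (toy Gaussian target \(\pi = \mathcal{N}(1,1)\) and proposal \(\eta = \mathcal{N}(0,4)\), chosen for illustration; \(\pi\) is used only up to its normalizing constant):

```python
import numpy as np

rng = np.random.default_rng(5)

N = 200_000
X = 2.0 * rng.standard_normal(N)             # proposal eta = N(0, 2^2)
# log w(x) = log pi(x) - log eta(x) up to constants, for pi = N(1, 1):
log_w = -0.5 * (X - 1.0) ** 2 + X ** 2 / 8.0
log_w -= log_w.max()                          # stabilize before exponentiating
W = np.exp(log_w)
W /= W.sum()                                  # self-normalized weights W_k

est_mean = np.sum(W * X)                      # estimates E_pi[X] = 1
ess = 1.0 / np.sum(W ** 2)                    # effective sample size diagnostic
```

Working in log-weights and subtracting the maximum before exponentiating is the standard trick to avoid overflow; the normalizing constants of \(\pi\) and \(\eta\) cancel in the self-normalization.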
\(\{\eta_t\} \) : Constrained Priors
\(\{\pi_t\}\): Constrained Posteriors
Salomone, R., South, L.F., Johansen, A.M., Drovandi, C.C., and Kroese, D.P., Unbiased and Consistent Nested Sampling via Sequential Monte Carlo.
Long-awaited revision almost complete, featuring consistency for the adaptive variant!
Resubmission Pending
(Dependent) Sum of Lognormals distribution
\(Y = \sum_k \exp(X_k), \quad \mathbf{X} \sim {\mathcal{N}}(\boldsymbol{\mu}, \Sigma). \)
Challenge: the distribution is intractable (even in the iid case).
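A quick crude-Monte-Carlo sketch (illustrative covariance matrix and thresholds, not from the paper) shows why specialized methods are needed: CDF values in the bulk are easy, but tail probabilities get essentially no hits:

```python
import numpy as np

rng = np.random.default_rng(6)

# Y = sum_k exp(X_k), X ~ N(mu, Sigma), with an illustrative covariance.
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])
C = np.linalg.cholesky(Sigma)

N = 500_000
X = mu + rng.standard_normal((N, 3)) @ C.T   # rows ~ N(mu, Sigma)
Y = np.exp(X).sum(axis=1)

# Crude Monte Carlo is fine for the CDF at moderate thresholds...
p_mid = np.mean(Y <= 10.0)
# ...but deep in the right tail essentially no samples land there, so the
# naive estimate is useless (often exactly zero): hence specialized methods.
p_tail = np.mean(Y > 200.0)
```

The relative error of crude Monte Carlo blows up as the event probability shrinks, which is exactly the regime the paper's estimators target.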
Botev, Z.I., Salomone, R., and Mackinlay, D. (2019), Fast and accurate computation of the distribution of sums of dependent log-normals.
Published in Annals of Operations Research
Contributions
\[\mathbb{E}\,\mathcal{A}_\pi \phi(\mathbf{X}) = 0 \;\;\forall \phi \Longleftrightarrow \text{Law}(\mathbf{X})\equiv \pi \]
where \(k_\pi\) is a reproducing Stein kernel for \(\pi \), involving terms with \(\nabla \log \pi(\cdot)\).
Add kernel methods to the mix...
Hodgkinson, L., Salomone, R., and Roosta, F. (2021), The reproducing Stein kernel approach for post-hoc corrected sampling.
Main Contribution: Convergence of Stein Importance Sampling for \(\nu\)-ergodic Markov chains.
Don't necessarily need \(\nu = \pi\)!
Even allows for subsampling!
Provides theoretical justification for post-hoc correction of unadjusted samplers.
"Best of Both Worlds": Good finite-time behaviour, with good asymptotic properties.