Dimension Free Matrix Tail Bounds and Applications
Min-Hsiu Hsieh, UTS
with C. Zhang and D. Tao [arXiv: 1910.03718]
Tail Bounds
Let \(X_1,\cdots,X_n\) be independent, zero-mean r.v. so that \(|X_i|\leq L\) for all \(i\).
Then \(Y=\sum_{i=1}^n X_i\) satisfies (Bernstein's inequality)
\[\mathbb{P}\{|Y|\geq t\}\leq 2\exp\left(\frac{-t^2/2}{\sigma^2+Lt/3}\right),\qquad \sigma^2=\sum_{i=1}^n\mathbb{E}X_i^2.\]
Matrix Tail Bounds
Let \(\mathbf{X}_1,\cdots,\mathbf{X}_n\) be independent, zero-mean, \(d_1\times d_2\) random matrices so that \(\|\mathbf{X}_i\|\leq L\) for all \(i\).
Then \(\mathbf{Y}=\sum_{i=1}^n \mathbf{X}_i\) satisfies (matrix Bernstein inequality)
\[\mathbb{P}\{\|\mathbf{Y}\|\geq t\}\leq (d_1+d_2)\exp\left(\frac{-t^2/2}{\sigma^2+Lt/3}\right),\qquad(*)\]
where \(\sigma^2=\max\big\{\big\|\sum_i\mathbb{E}\,\mathbf{X}_i\mathbf{X}_i^*\big\|,\ \big\|\sum_i\mathbb{E}\,\mathbf{X}_i^*\mathbf{X}_i\big\|\big\}\).
Tropp. User-friendly tail bounds for sums of random matrices, Found. Comput. Math., Aug 2011.
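As a sanity check (not from the talk), a minimal Monte Carlo experiment comparing the empirical tail of \(\|\mathbf{Y}\|\) with the right-hand side of (*); the Rademacher-sign construction and all parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, n, trials = 15, 25, 300, 2000

# Independent, zero-mean, bounded summands: X_i = eps_i * B_i with
# Rademacher signs eps_i and fixed B_i normalized so ||X_i|| = 1 (L = 1).
Bs = rng.standard_normal((n, d1, d2))
Bs /= np.linalg.norm(Bs, ord=2, axis=(1, 2))[:, None, None]
L = 1.0
sigma2 = max(np.linalg.norm(np.einsum('kij,klj->il', Bs, Bs), 2),   # ||sum B_i B_i^T||
             np.linalg.norm(np.einsum('kji,kjl->il', Bs, Bs), 2))   # ||sum B_i^T B_i||

t = 3.0 * np.sqrt(sigma2)
hits = 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    Y = np.einsum('k,kij->ij', eps, Bs)        # Y = sum_i eps_i B_i
    hits += np.linalg.norm(Y, 2) >= t

bound = (d1 + d2) * np.exp(-(t**2 / 2) / (sigma2 + L * t / 3))
print(f"empirical tail: {hits / trials:.4f}  vs  bound: {bound:.4f}")
```

The gap between the two numbers illustrates the looseness introduced by the \(d_1+d_2\) factor, which is the target of this talk.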
Discussion
1. The bound contains the matrix dimensions \(d_1+d_2\).
2. The first proof was given by Ahlswede & Winter in 2002.
3. D. Gross proved another version for matrix completion in 2011.
4. Lieb's concavity theorem is used in the proof of (*).
Background
Hermitian Dilation: \(\mathcal{H}(\mathbf{A})=\begin{pmatrix}\mathbf{0} & \mathbf{A}\\ \mathbf{A}^* & \mathbf{0}\end{pmatrix}\), which is Hermitian and satisfies \(\lambda_{\max}(\mathcal{H}(\mathbf{A}))=\|\mathbf{A}\|\); it reduces rectangular matrices to Hermitian ones.
Standard Matrix Function: for Hermitian \(\mathbf{A}=\sum_i\lambda_i\mathbf{u}_i\mathbf{u}_i^*\), define \(f(\mathbf{A})=\sum_i f(\lambda_i)\,\mathbf{u}_i\mathbf{u}_i^*\), i.e. apply \(f\) to the eigenvalues.
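A quick numerical check of the dilation identity \(\lambda_{\max}(\mathcal{H}(\mathbf{A}))=\|\mathbf{A}\|\) (illustrative sketch, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 4, 6
A = rng.standard_normal((d1, d2))

# Hermitian dilation: H(A) = [[0, A], [A^T, 0]] (A^* for complex A).
H = np.block([[np.zeros((d1, d1)), A],
              [A.T, np.zeros((d2, d2))]])

# Its largest eigenvalue equals the spectral norm of A.
print(np.isclose(np.linalg.eigvalsh(H)[-1], np.linalg.norm(A, 2)))  # True
```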
Warm Up
(Laplace Transform Method) \(\mathbb{P}\{\lambda_{\max}(\mathbf{Y})\geq t\}\leq \inf_{\theta>0}e^{-\theta t}\,\mathbb{E}\,\mathrm{Tr}\,e^{\theta\mathbf{Y}}\)
Golden–Thompson: \(\mathrm{Tr}\,e^{\mathbf{A}+\mathbf{B}}\leq \mathrm{Tr}\big(e^{\mathbf{A}}e^{\mathbf{B}}\big)\) for Hermitian \(\mathbf{A},\mathbf{B}\).
Ahlswede and Winter. IEEE Trans. Inform. Theory, 48(3):569-579, 2002.
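A numerical illustration of Golden–Thompson on random Hermitian matrices (illustrative; scipy's `expm` assumed):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
d = 5
A = rng.standard_normal((d, d)); A = (A + A.T) / 2   # random Hermitian
B = rng.standard_normal((d, d)); B = (B + B.T) / 2

lhs = np.trace(expm(A + B))                # Tr e^{A+B}
rhs = np.trace(expm(A) @ expm(B))          # Tr (e^A e^B)
print(lhs <= rhs + 1e-9, float(lhs), float(rhs))   # Golden-Thompson: lhs <= rhs
```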
Lieb's Concavity Thm
For any fixed Hermitian \(\mathbf{H}\), the map \(\mathbf{A} \mapsto \text{Tr}\,e^{\mathbf{H}+\log\mathbf{A}}\) is concave on positive-definite matrices.
Tropp. User-friendly tail bounds for sums of random matrices, Found. Comput. Math., Aug 2011.
\(\mathbb{E}_X\text{Tr}e^{\textcolor{blue}{\mathbf{H}}+\textcolor{red}{\mathbf{X}}} \leq \text{Tr} \,e^{\textcolor{blue}{\mathbf{H}}+\textcolor{red}{\log\mathbb{E}_X e^\mathbf{X}}}\)
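This follows from Lieb's theorem by one application of Jensen's inequality (a standard step, spelled out here): since \(\mathbf{A}\mapsto\mathrm{Tr}\,e^{\mathbf{H}+\log\mathbf{A}}\) is concave, taking \(\mathbf{A}=e^{\mathbf{X}}\) gives
\[\mathbb{E}_X\,\mathrm{Tr}\,e^{\mathbf{H}+\mathbf{X}}=\mathbb{E}_X\,\mathrm{Tr}\,e^{\mathbf{H}+\log e^{\mathbf{X}}}\leq \mathrm{Tr}\,e^{\mathbf{H}+\log\mathbb{E}_X e^{\mathbf{X}}}.\]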
To obtain (*), the last ingredient is the trace-to-norm estimate
\[\mathrm{Tr}\,\mathbf{M}\leq d\,\lambda_{\max}(\mathbf{M})\quad\text{for }\mathbf{M}\succeq\mathbf{0}\text{ of size }d\times d.\qquad(**)\]
Issues of (**)
1. Not suitable for high dimensional or infinite dimensional matrices.
2. Only applicable to spectral norm.
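A small numpy illustration of why (**) is costly in high dimension (not from the talk): the slack in \(\mathrm{Tr}\,\mathbf{M}\leq d\,\lambda_{\max}(\mathbf{M})\) is governed by the "intrinsic dimension" \(\mathrm{Tr}\,\mathbf{M}/\lambda_{\max}(\mathbf{M})\), which can be far smaller than the ambient \(d\):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 500
G = rng.standard_normal((d, d))

# Tr M <= d * lambda_max(M) for M >= 0; the slack is the intrinsic
# dimension Tr M / lambda_max(M), which can be much smaller than d.
M_flat = G @ G.T                                            # near-flat spectrum
M_decay = G @ np.diag(1.0 / (1.0 + np.arange(d))**2) @ G.T  # fast spectral decay

for name, M in [("flat", M_flat), ("decaying", M_decay)]:
    intdim = np.trace(M) / np.linalg.eigvalsh(M)[-1]
    print(f"{name:8s} intrinsic dim = {intdim:7.1f}  vs ambient d = {d}")
```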
Our Improvement
1. \(\mu:\mathbb{M}\to \mathbb{R}\) satisfies (e.g., any matrix norm; see the check below)
(i) \(\mu(\mathbf{A})\geq 0\) (non-negativity)
(ii) \(\mu(\theta\mathbf{A})=\theta\mu(\mathbf{A})\) for \(\theta\geq 0\) (positive homogeneity)
(iii) \(\mu(\mathbf{A}+\mathbf{B})\leq\mu(\mathbf{A})+\mu(\mathbf{B})\) (subadditivity)
2. \(\phi=O(U^K)\), where \(\mathbb{E} \mu({\bf X}_k) \leq U\) for all \(k\).
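A quick numerical check of (i)-(iii) with the nuclear norm as \(\mu\) (an illustrative choice, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = lambda A: np.linalg.norm(A, 'nuc')         # nuclear norm as a candidate mu

A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
theta = 2.5
print(mu(A) >= 0)                               # (i)   non-negativity
print(np.isclose(mu(theta * A), theta * mu(A))) # (ii)  positive homogeneity
print(mu(A + B) <= mu(A) + mu(B) + 1e-12)       # (iii) subadditivity
```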
Discussion
1. Instead of \(d_1+d_2\), we have \(e^{O(U^K)}\), where \(\mathbb{E} \mu({\bf X}_k) \leq U\).
2. The matrix function \(\mu(\cdot)\) can be chosen to be any matrix norm, among other sublinear functionals.
Numerics
[Figure: numerical comparison with the classical bound (**) for \(K=10\) and \(K=20\); random Hermitian matrices with \(d=200\) and each entry \(\mathcal{N}(0,1)\).]
Expectation Bound
\(\phi=O(U^K)\) and \(\max_k\mathbb{E} \mu({\bf X}_k) \leq U\).
Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning 8, 1-2 (2015), 1–230.
The Expectation Bound can be used to analyze
- Matrix Approximation
- Matrix Sparsification
- Matrix Multiplication
Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning 8, 1-2 (2015), 1–230.
Matrix Random Series
Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning 8, 1-2 (2015), 1–230.
\(\mathbf{Z}=\sum_i \xi_i\mathbf{A}_i\), where \(\{\xi_i\}\) are independent random variables and \(\{\mathbf{A}_i\}\) are fixed matrices.
Example: Gaussian Wigner Matrices, Matrix Rademacher Series
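A minimal numpy construction of a matrix Rademacher series (illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 50, 30

# Z = sum_i xi_i A_i with independent Rademacher xi_i and fixed Hermitian A_i.
As = rng.standard_normal((n, d, d))
As = (As + As.transpose(0, 2, 1)) / 2
xi = rng.choice([-1.0, 1.0], size=n)
Z = np.einsum('i,ijk->jk', xi, As)

# Natural scale parameter: sigma^2 = || sum_i A_i^2 ||.
sigma = np.sqrt(np.linalg.norm(np.einsum('ijk,ikl->jl', As, As), 2))
print(np.linalg.norm(Z, 2), sigma * np.sqrt(2 * np.log(d)))  # ||Z|| vs typical scale
```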
Applications of Matrix Random Series include
- Optimization
- Sample Complexity
Azuma–Hoeffding Inequality
A sequence \(\{X_1,X_2,\cdots\}\) is a martingale if \(\mathbb{E}[X_{i+1}\mid X_1,\cdots,X_i]=X_i\) for all \(i\).
A matrix martingale \(\{\mathbf{X}_1,\mathbf{X}_2,\cdots\}\) satisfies \(\mathbb{E}[\mathbf{X}_{i+1}\mid \mathbf{X}_1,\cdots,\mathbf{X}_i]=\mathbf{X}_i\) and \(\mathbb{E}\|\mathbf{X}_i\|<\infty\) for all \(i\).
Define a difference sequence \(\{\mathbf{Z}_i\}\), where \(\mathbf{Z}_i=\mathbf{X}_i-\mathbf{X}_{i-1}\).
Matrix Azuma–Hoeffding: if \(\mathbf{Z}_k^2\preceq\mathbf{A}_k^2\) almost surely for fixed Hermitian \(\mathbf{A}_k\), then
\[\mathbb{P}\{\lambda_{\max}(\mathbf{X}_K-\mathbf{X}_0)\geq t\}\leq d\,e^{-t^2/8\sigma^2},\qquad \sigma^2=\Big\|\sum_k\mathbf{A}_k^2\Big\|.\]
Tropp. User-friendly tail bounds for sums of random matrices, Found. Comput. Math., Aug 2011.
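A toy matrix martingale meeting the Azuma hypotheses (illustrative sketch): with \(\mathbf{Z}_k=\epsilon_k\mathbf{A}_k\) for Rademacher \(\epsilon_k\) and fixed Hermitian \(\mathbf{A}_k\), one has \(\mathbf{Z}_k^2=\mathbf{A}_k^2\) exactly.

```python
import numpy as np

rng = np.random.default_rng(6)
d, K, trials = 10, 100, 1000

# Fixed Hermitian A_k; differences Z_k = eps_k * A_k satisfy Z_k^2 = A_k^2.
As = rng.standard_normal((K, d, d))
As = (As + As.transpose(0, 2, 1)) / 2
sigma2 = np.linalg.norm(np.einsum('kij,kjl->il', As, As), 2)

t = 5.0 * np.sqrt(sigma2)
hits = 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=K)       # martingale: X_j = sum_{k<=j} Z_k
    Y = np.einsum('k,kij->ij', eps, As)         # X_K - X_0
    hits += np.linalg.eigvalsh(Y)[-1] >= t

# Matrix Azuma bound: holds, but is loose because of the dimension factor d.
print(hits / trials, d * np.exp(-t**2 / (8 * sigma2)))
```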
Applications
1. Matrix Approximation
2. Optimization
3. Matrix Expander Graph
4. Quantum Hypergraph
5. Compressed Sensing
6. Random Process
1. Matrix Approximation
Construct an unbiased random matrix \(\mathbf{R}\) so that \(\mathbb{E} \mathbf{R} = \mathbf{B}\), and average \(K\) independent copies: \(\widehat{\mathbf{R}}_K=\frac{1}{K}\sum_{k=1}^{K}\mathbf{R}_k\).
Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning 8, 1-2 (2015), 1–230.
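A sketch of this recipe with a simple unbiased estimator (an assumed scheme for illustration, not the talk's): sample one entry of \(\mathbf{B}\) uniformly and rescale, then average \(K\) copies.

```python
import numpy as np

rng = np.random.default_rng(7)
d, K = 40, 200
B = rng.standard_normal((d, d))

def sample_R():
    """One unbiased copy: a single uniformly chosen entry, rescaled so E R = B."""
    i, j = rng.integers(d), rng.integers(d)
    R = np.zeros((d, d))
    R[i, j] = d * d * B[i, j]
    return R

# Average K independent copies; the error decays like 1/sqrt(K).
R_hat = sum(sample_R() for _ in range(K)) / K
print(np.linalg.norm(R_hat - B, 2), np.linalg.norm(B, 2))
```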
1. Matrix Approximation
Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning 8, 1-2 (2015), 1–230.
Use (*) to show a dimension-dependent bound on \(\mathbb{P}\{\|\widehat{{\bf R}}_K-\mathbf{B} \|>t\}\).
In addition, \(\mathbb{E} \| \widehat{{\bf R}}_K - {\bf B}\|\leq 2\epsilon\) if \(K\gtrsim\epsilon^{-2}\log(d_1+d_2)\), with constants depending on the variance and the uniform bound of \(\mathbf{R}\).
1. Matrix Approximation
However, we show \(\mathbb{E} \mu( \widehat{{\bf R}}_K - {\bf B}) \leq \epsilon \) if
\[\max\limits_{k} \mu( {\bf R}_k - {\bf B}) \leq \sqrt{1+2\epsilon \mu( {\bf B} ) }-1.\]
Our result emphasizes the importance of the approximation quality between \(\mathbf{B}\) and \(\mathbf{R}_k\) when the number of copies \(K\) is fixed.
2. Optimization
Chance Constrained Optimization:
\[\min_{\mathbf{x}}\ \mathbf{c}^{\mathsf T}\mathbf{x}\quad\text{subject to}\quad\mathbb{P}\Big\{\mathcal{A}_0(\mathbf{x})+\sum_{k}\xi_k\,\mathcal{A}_k(\mathbf{x})\succeq\mathbf{0}\Big\}\geq 1-\epsilon,\]
where \({\mathcal A}_0({\bf x}) \succeq {\bf 0}\), \({\mathcal A}_k:\mathbb{R}^N \rightarrow \mathbb{S}^{M}\), and the \(\xi_k\) are i.i.d. r.v.
2. Optimization
So. Moment inequalities for sums of random matrices and their applications in optimization. Mathematical Programming 130, 1 (2011), 125–151.
2. Optimization
A. So showed that a suitable convex relaxation (with approximation quality \(\gamma\)) is a good approximation when the \(\{\xi_i\}\) are Gaussian with unit variance or have distributions supported on \([-1,1]\).
So. Moment inequalities for sums of random matrices and their applications in optimization. Mathematical Programming 130, 1 (2011), 125–151.
2. Optimization
Applying our tail bound removes the distributional assumption on \(\{\xi_i\}\) and yields a better \(\gamma\).
3. Matrix Expander Graph
An expander graph is a sparse graph with strong connectivity.
3. Matrix Expander Graph
A random walk on an expander graph is almost as good as independent sampling.
Let \((Y_1,\cdots,Y_K)\) be the vertices visited by a random walk on \(G\) with spectral gap \(\lambda\). Then, for \(f: V\to \mathbb{H}^{d\times d}\) with \(\|f(v)\|\leq 1\) and \(\sum_{v\in V} f(v)=\mathbf{0}\),
\[\mathbb{P}\Big\{\lambda_{\max}\Big(\tfrac{1}{K}\sum_{i=1}^{K}f(Y_i)\Big)\geq \epsilon\Big\}\leq d\,e^{-\Omega(\epsilon^{2}\lambda K)}.\]
Wigderson and Xiao. A randomness-efficient sampler for matrix-valued functions and applications. FOCS'05, pp. 397-406.
Garg, Lee, Song, and Srivastava. A matrix expander chernoff bound. STOC'18, pp. 1102–1114.
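A small simulation of this phenomenon (illustrative; a complete graph stands in for a good expander, a cycle for a poor one):

```python
import numpy as np

rng = np.random.default_rng(8)
n, d, K, trials = 64, 8, 500, 200

# f: V -> Hermitian d x d with sum_v f(v) = 0 and ||f(v)|| <= 1.
F = rng.standard_normal((n, d, d))
F = (F + F.transpose(0, 2, 1)) / 2
F -= F.mean(axis=0)
F /= np.linalg.norm(F, ord=2, axis=(1, 2)).max()

def walk_error(step):
    """Spectral norm of the f-average along one K-step walk."""
    v, S = rng.integers(n), np.zeros((d, d))
    for _ in range(K):
        S += F[v]
        v = step(v)
    return np.linalg.norm(S / K, 2)

complete = lambda v: (v + rng.integers(1, n)) % n  # K_n: excellent expander
cycle = lambda v: (v + rng.choice([-1, 1])) % n    # cycle: poor expander

for name, step in [("complete graph", complete), ("cycle", cycle)]:
    errs = [walk_error(step) for _ in range(trials)]
    print(f"{name:14s} mean deviation = {np.mean(errs):.3f}")
```

The walk on the better expander concentrates markedly faster, matching the "as good as independent sampling" claim.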
3. Matrix Expander Graph
Garg, Lee, Song, and Srivastava. A matrix expander chernoff bound. STOC'18, pp. 1102–1114.
for some matrix martingale difference sequence \(\{{\bf Z}_1,\cdots,{\bf Z}_K\}\), where \(\|\cdot\|\) is the spectral norm.
3. Matrix Expander Graph
for some matrix martingale difference sequence \(\{{\bf Z}_1,\cdots,{\bf Z}_K\}\), where \(\|A\|_1=\sum_{i,j}|A_{i,j}|\).
3. Matrix Expander Graph
Garg, Lee, Song, and Srivastava. A matrix expander chernoff bound. STOC'18, pp. 1102–1114.
where \(\phi_{\widetilde{\Omega}}:=\sum_{i=1}^{\widetilde{I}}( [\widetilde{U}_i+1]^{|\widetilde{\Omega}_i|}-1)\) with \(\widetilde{U}_i :=\max_{k\in\widetilde{\Omega}_i} \{u_k \}\).
Proof?
STEP 1: An Identity
STEP 2: Property of \({{\bf D}}_\mu[\theta; {\bf B}]\)
Thank you!