Day 29:

Eckart–Young–Mirsky theorem

Given an \(m\times n\) matrix \(A\), if there exist an orthonormal basis \(\{v_{1},\ldots,v_{n}\}\) for \(\R^{n}\), an orthonormal basis \(\{u_{1},\ldots,u_{m}\}\) for \(\R^{m}\), and nonnegative numbers \(\sigma_{1},\sigma_{2},\ldots,\sigma_{p}\), where \(p=\min\{m,n\}\), such that

\[A = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top},\]

then the above sum is called a singular value decomposition of \(A\). The numbers \(\sigma_{1},\sigma_{2},\ldots,\sigma_{p}\) are called the singular values of \(A\). 

If there is an \(n\times n\) orthogonal matrix \(V\), an \(m\times m\) orthogonal matrix \(U\), and an \(m\times n\) diagonal matrix \(\Sigma\) with nonnegative numbers on the diagonal such that

\[A = U\Sigma V^{\top},\]

then \(U\Sigma V^{\top}\) is called a singular value decomposition of \(A\). The numbers on the diagonal of \(\Sigma\) are called the singular values of \(A\).
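The two definitions describe the same factorization: the columns of \(U\) and \(V\) are the vectors \(u_{i}\) and \(v_{i}\), and the diagonal of \(\Sigma\) holds the \(\sigma_{i}\). A minimal NumPy sketch of this correspondence, assuming `numpy.linalg.svd` (which returns \(V^{\top}\) rather than \(V\)) and an arbitrary test matrix chosen here purely for illustration:

```python
import numpy as np

# Arbitrary 3x2 test matrix (an illustrative choice, not from the notes).
A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 1.0]])
m, n = A.shape
p = min(m, n)

# full_matrices=True gives U (m x m), the singular values s, and V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Matrix form: A = U Sigma V^T with an m x n "diagonal" Sigma.
Sigma = np.zeros((m, n))
Sigma[:p, :p] = np.diag(s)
matrix_form = U @ Sigma @ Vt

# Outer product form: A = sum_i sigma_i u_i v_i^T.
outer_form = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(p))

print(np.allclose(A, matrix_form), np.allclose(A, outer_form))
```

Both checks should agree with \(A\) up to floating-point rounding.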

 

Let \(\sigma_{i}(A)\) denote the \(i\)th largest singular value of \(A\). (Note that if the singular values of \(A\) are \(5,4,4,3,2,0\), then \(\sigma_{2}(A) = \sigma_{3}(A) = 4\).)

Theorem (Eckart–Young–Mirsky theorem). Given an \(m\times n\) matrix \(A\) with singular value decomposition \[A = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top},\] where \(p=\min\{m,n\}\) and \(\sigma_{1}\geq\sigma_{2}\geq\cdots\geq\sigma_{p}\), fix \(k\leq p\) and define \[A_{k}: = \sum_{i=1}^{k}\sigma_{i}u_{i}v_{i}^{\top}.\] For every \(m\times n\) matrix \(B\) with rank at most \(k\) we have \[\sigma_{i}(A-B)\geq \sigma_{i}(A-A_{k})\quad\text{for all}\quad i\leq p.\]

Proof. Note that \[A - A_{k} = \sum_{i=k+1}^{p}\sigma_{i}u_{i}v_{i}^{\top}.\]

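From this, we can see that the singular values of \(A-A_{k}\), listed in decreasing order, are \[\sigma(A-A_{k}) = (\sigma_{k+1},\sigma_{k+2},\ldots,\sigma_{p},0,\ldots,0).\] In particular, \(\sigma_{1}(A-A_{k}) = \sigma_{k+1}\).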

Of course, if \(k=p\) then \(A_{k} = A\) and there is nothing to prove, so assume \(k<p\).

Proof continued. Let \(B\) be any \(m\times n\) matrix with \(\text{rank}(B)\leq k\). By the rank-nullity theorem, \(N(B)\) is a subspace of \(\R^{n}\) with dimension at least \(n-k\). On the other hand, the first \(k+1\) right singular vectors of \(A\) span a \((k+1)\)-dimensional subspace of \(\R^{n}\). Since \((n-k)+(k+1)>n\), these two subspaces have a nonzero vector in common. This means that there is a unit norm vector

\[w\in N(B)\cap\text{span}\{v_{i}\}_{i=1}^{k+1}.\]

Since \(w\in\text{span}\{v_{i}\}_{i=1}^{k+1}\) there are scalars \(\alpha_{1},\ldots,\alpha_{k+1}\) such that

\[w=\sum_{i=1}^{k+1}\alpha_{i}v_{i}.\]

Since \(w\) is a unit vector and the \(v_{i}\) are orthonormal, \(\sum_{i=1}^{k+1}\alpha_{i}^{2} = 1\). Using \(Bw=0\), we now compute:

\[\sigma_{1}(A-B)^{2}\geq\|(A-B)w\|^{2} = \|Aw\|^{2} = \left\|\sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top}w\right\|^{2} = \left\|\sum_{i=1}^{k+1}\alpha_{i}\sigma_{i}u_{i}\right\|^{2}\]

\[ = \sum_{i=1}^{k+1}\alpha_{i}^{2}\sigma_{i}^{2}\geq \sum_{i=1}^{k+1}\alpha_{i}^{2}\sigma_{k+1}^{2} = \sigma_{k+1}^2\sum_{i=1}^{k+1}\alpha_{i}^{2} = \sigma_{k+1}^2 = \sigma_{1}(A-A_{k})^2.\]

We proved the theorem for \(i=1\); the other cases are similar.
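The inequality for every \(i\) can be spot-checked numerically. The sketch below is not a proof. It assumes NumPy, a fixed random seed, and illustrative sizes \(m=6\), \(n=5\), \(k=2\) (none of which come from the notes), and it builds matrices of rank at most \(k\) as products of \(m\times k\) and \(k\times n\) factors.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
m, n, k = 6, 5, 2               # illustrative sizes, not from the notes
p = min(m, n)

A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated sum A_k = sum_{i<=k} sigma_i u_i v_i^T.
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Singular values of A - A_k; NumPy returns them in decreasing order.
tail = np.linalg.svd(A - A_k, compute_uv=False)

ok = True
for _ in range(200):
    # A product of m x k and k x n factors has rank at most k.
    B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
    diff = np.linalg.svd(A - B, compute_uv=False)
    # Compare sigma_i(A - B) with sigma_i(A - A_k) for every i, up to rounding.
    ok = ok and bool(np.all(diff >= tail - 1e-10))

print(ok)
```

The small tolerance only absorbs floating-point error in the trailing zero singular values of \(A-A_{k}\).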

Corollary. Given an \(m\times n\) matrix \(A\) with singular value decomposition \[A = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top},\] where \(p=\min\{m,n\},\) fix \(k\leq p\) and define \[A_{k}: = \sum_{i=1}^{k}\sigma_{i}u_{i}v_{i}^{\top}.\] For every \(m\times n\) matrix \(B\) with rank at most \(k\) we have \[\|A-B\|\geq \|A-A_{k}\|\] for each of the spectral, Frobenius, and nuclear matrix norms.

Proof. We have \(\sigma_{i}(A-B)\geq \sigma_{i}(A-A_{k})\) for all \(i\). Hence,

\[\|A-B\|_{2} = \sigma_{1}(A-B)\geq \sigma_{1}(A-A_{k}) = \|A-A_{k}\|_{2}\]

\[\|A-B\|_{F}^{2} = \sum_{i=1}^{p}\sigma_{i}(A-B)^{2}\geq \sum_{i=1}^{p}\sigma_{i}(A-A_{k})^{2} = \|A-A_{k}\|_{F}^2\]

\[\|A-B\|_{N} = \sum_{i=1}^{p}\sigma_{i}(A-B)\geq \sum_{i=1}^{p}\sigma_{i}(A-A_{k}) = \|A-A_{k}\|_{N}.\ \Box\]
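As a rough illustration of the three norm inequalities, the following sketch compares the truncated matrix with one randomly chosen matrix of rank at most \(k\). It assumes NumPy's `numpy.linalg.norm`, whose `ord` values `2`, `"fro"`, and `"nuc"` give the spectral, Frobenius, and nuclear norms; the sizes and seed are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
m, n, k = 5, 4, 2               # illustrative sizes, not from the notes
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# ord=2 is the spectral norm, "fro" the Frobenius norm, "nuc" the nuclear norm.
norms = {"spectral": 2, "Frobenius": "fro", "nuclear": "nuc"}

# One competitor of rank at most k.
B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

for name, ord_ in norms.items():
    best = np.linalg.norm(A - A_k, ord_)
    other = np.linalg.norm(A - B, ord_)
    print(f"{name}: ||A - A_k|| = {best:.4f} <= ||A - B|| = {other:.4f}")
```

In terms of the singular values of \(A\), the three values of \(\|A-A_{k}\|\) are \(\sigma_{k+1}\), \(\sqrt{\sigma_{k+1}^{2}+\cdots+\sigma_{p}^{2}}\), and \(\sigma_{k+1}+\cdots+\sigma_{p}\).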

Example. Consider the matrix

\[B=\left[\begin{array}{rrrr} 4 & 3 & 2 & 1\\ 4 & -1 & -3 & -2\\ 4 & 1 & 2 & 3\\ 4 & -3 & -1 & -2\end{array}\right]\]

A singular value decomposition of \(B\) (in outer product form) is:

\[B = 8\left(\frac{1}{2}\left[\begin{array}{rrrr} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\end{array}\right]\right) + 4\sqrt{3}\left(\frac{1}{2\sqrt{3}}\left[\begin{array}{rrrr} 0 & 1 & 1 & 1\\ 0 & -1 & -1 & -1\\ 0 & 1 & 1 & 1\\ 0 & -1 & -1 & -1\end{array}\right]\right) \]

\[+ \sqrt{6}\left(\frac{1}{2\sqrt{6}}\left[\begin{array}{rrrr} 0 & 2 & -1 & -1\\ 0 & 2 & -1 & -1\\ 0 & -2 & 1 & 1\\ 0 & -2 & 1 & 1\end{array}\right]\right) + \sqrt{2}\left(\frac{1}{2\sqrt{2}}\left[\begin{array}{rrrr} 0 & 0 & 1 & -1\\ 0 & 0 & -1 & 1\\ 0 & 0 & -1 & 1\\ 0 & 0 & 1 & -1\end{array}\right]\right) \]

Keeping only the first two terms gives

\[B_{2} = 8\left(\frac{1}{2}\left[\begin{array}{rrrr} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\end{array}\right]\right) + 4\sqrt{3}\left(\frac{1}{2\sqrt{3}}\left[\begin{array}{rrrr} 0 & 1 & 1 & 1\\ 0 & -1 & -1 & -1\\ 0 & 1 & 1 & 1\\ 0 & -1 & -1 & -1\end{array}\right]\right) \]

\[=\left[\begin{array}{rrrr} 4 & 2 & 2 & 2\\ 4 & -2 & -2 & -2\\ 4 & 2 & 2 & 2\\ 4 & -2 & -2 & -2\end{array}\right] \]

The singular values of \(B-B_{2}\) are \(\sigma_{1}(B-B_{2}) = \sqrt{6}\), \(\sigma_{2}(B-B_{2}) = \sqrt{2}\), \(\sigma_{3}(B-B_{2}) = \sigma_{4}(B-B_{2}) = 0\)

\[\Rightarrow \|B-B_{2}\|_{2} = \sqrt{6},\quad\|B-B_{2}\|_{F} = \sqrt{8},\quad \|B-B_{2}\|_{N} = \sqrt{6}+\sqrt{2}\]
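The arithmetic in this example can be reproduced with a short NumPy sketch. It assembles the matrix from the singular values \(8, 4\sqrt{3}, \sqrt{6}, \sqrt{2}\) and from singular vectors read off from the rank-one terms. As an assumption made here, the left singular vectors are taken to be \(\tfrac12(1,1,1,1)\), \(\tfrac12(1,-1,1,-1)\), \(\tfrac12(1,1,-1,-1)\), \(\tfrac12(1,-1,-1,1)\), which are orthonormal.

```python
import numpy as np

# Singular values of the example.
sigma = np.array([8.0, 4 * np.sqrt(3), np.sqrt(6), np.sqrt(2)])

# Left singular vectors u_1..u_4 as columns (assumed orthonormal choice).
U = 0.5 * np.array([[1,  1,  1,  1],
                    [1, -1,  1, -1],
                    [1,  1, -1, -1],
                    [1, -1, -1,  1]], dtype=float).T

# Right singular vectors v_1..v_4 as columns, normalized below.
V = np.array([[1, 0,  0,  0],
              [0, 1,  1,  1],
              [0, 2, -1, -1],
              [0, 0,  1, -1]], dtype=float).T
V /= np.linalg.norm(V, axis=0)

# Full sum of four rank-one terms, and the truncation to two terms.
B = (U * sigma) @ V.T
B2 = (U[:, :2] * sigma[:2]) @ V[:, :2].T

print(np.round(B).astype(int))
print(np.round(B2).astype(int))

R = B - B2
print(np.linalg.norm(R, 2) ** 2)      # spectral norm squared
print(np.linalg.norm(R, "fro") ** 2)  # Frobenius norm squared
print(np.linalg.norm(R, "nuc"))       # nuclear norm
```

Up to rounding, the squared spectral and Frobenius norms should come out as 6 and 8, and the nuclear norm as \(\sqrt{6}+\sqrt{2}\).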

Consider an \(m\times n\) matrix

\[A = \begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & & \vdots\\ \vdots & & \ddots & \vdots\\ a_{m1} & \cdots & \cdots & a_{mn}\end{bmatrix}\]

Consider the sum

\[\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2}.\]

If \(X\) is a square matrix, then \(\text{tr}(X)\) denotes the sum of the diagonal entries in \(X\). Note that

\[\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2} = \text{tr}(A^{\top}A)\]

Note that \(\text{tr}(X)=\text{tr}(YXY^{-1})\) for any square matrix \(X\) and any invertible matrix \(Y\) of the same size.
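One way to see this, assuming the standard identity \(\text{tr}(MN)=\text{tr}(NM)\) for square matrices \(M\) and \(N\) of the same size (a fact not proved in these notes), is to take \(M = YX\) and \(N = Y^{-1}\):

\[\text{tr}(YXY^{-1}) = \text{tr}\big((YX)Y^{-1}\big) = \text{tr}\big(Y^{-1}(YX)\big) = \text{tr}(X).\]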

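Note that \(A^{\top}A = Q\Lambda Q^{\top}\), where \(Q\) is an \(n\times n\) orthogonal matrix and \(\Lambda\) is a diagonal matrix whose diagonal entries are \(\sigma_{1}^{2},\ldots,\sigma_{p}^{2}\), followed by \(n-p\) zeros when \(n>p\). Since \(Q^{\top} = Q^{-1}\), the trace is unchanged by conjugating with \(Q\):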

\[\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2} = \text{tr}(A^{\top}A) = \text{tr}(Q\Lambda Q^{\top}) = \text{tr}(\Lambda) = \sum_{i=1}^{p}\sigma_{i}^{2} = \|A\|^{2}_{F}.\]

\[\|A\|_{F} = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2}}\]

If \(w_{1},\ldots,w_{n}\) are the columns of \(A\), then \(\displaystyle{\|A\|_{F} = \sqrt{\sum_{j=1}^{n}\|w_{j}\|^{2}}}\)
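As a small worked check of these formulas, take the \(2\times 2\) matrix below (chosen here only for illustration). The entrywise sum, the trace, the singular values, and the column norms all give the same number:

\[A = \begin{bmatrix} 3 & 0\\ 4 & 5\end{bmatrix},\qquad \sum_{i,j}a_{ij}^{2} = 9+0+16+25 = 50,\qquad A^{\top}A = \begin{bmatrix} 25 & 20\\ 20 & 25\end{bmatrix},\qquad \text{tr}(A^{\top}A) = 50.\]

The eigenvalues of \(A^{\top}A\) are \(45\) and \(5\), so \(\sigma_{1} = 3\sqrt{5}\), \(\sigma_{2} = \sqrt{5}\), and \(\sigma_{1}^{2}+\sigma_{2}^{2} = 50\). The columns \(w_{1} = (3,4)\) and \(w_{2} = (0,5)\) satisfy \(\|w_{1}\|^{2}+\|w_{2}\|^{2} = 25+25 = 50\). In every case \(\|A\|_{F} = \sqrt{50} = 5\sqrt{2}\).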

Example. Recall

\[B=\left[\begin{array}{rrrr} 4 & 3 & 2 & 1\\ 4 & -1 & -3 & -2\\ 4 & 1 & 2 & 3\\ 4 & -3 & -1 & -2\end{array}\right] \quad\text{and}\quad B_{2}=\left[\begin{array}{rrrr} 4 & 2 & 2 & 2\\ 4 & -2 & -2 & -2\\ 4 & 2 & 2 & 2\\ 4 & -2 & -2 & -2\end{array}\right]\]

By the Eckart–Young–Mirsky theorem, if \(C\) is any \(4\times 4\) matrix of rank at most 2, then

\[\|B-C\|_{F}\geq \|B-B_{2}\|_{F} = \sqrt{8}\]

  • Let \(b_{i}\) denote the \(i\)th column of \(B\).
  • Let \(b_{i}^{(2)}\) denote the \(i\)th column of \(B_{2}\).
  • Let \(c_{1},c_{2},c_{3},c_{4}\) be any vectors from a \(2\)-dimensional subspace of \(\R^{4}\).
  • Let \(C\) be the matrix with columns \(c_{1},c_{2},c_{3},c_{4}\), so that \(C\) has rank at most \(2\).

\[\sqrt{\sum_{i=1}^{4}\|b_{i} - c_{i}\|^{2}} = \|B-C\|_{F}\geq \|B-B_{2}\|_{F} = \sqrt{8} = \sqrt{\sum_{i=1}^{4}\|b_{i} - b_{i}^{(2)}\|^{2}}\]
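The column-by-column reading can be tried out numerically. The sketch below assumes NumPy and enters \(B\) with third row \((4,1,2,3)\), the matrix whose singular values are \(8, 4\sqrt{3}, \sqrt{6}, \sqrt{2}\). It draws random 2-dimensional subspaces of \(\R^{4}\), takes each \(c_{i}\) to be the orthogonal projection of \(b_{i}\) onto the subspace (the best choice of \(c_{i}\) within that subspace), and compares the total error with \(\sqrt{8}\). The number of trials and the seed are arbitrary.

```python
import numpy as np

B = np.array([[4,  3,  2,  1],
              [4, -1, -3, -2],
              [4,  1,  2,  3],
              [4, -3, -1, -2]], dtype=float)

# Frobenius error of the rank-2 truncation: sqrt(sigma_3^2 + sigma_4^2).
s = np.linalg.svd(B, compute_uv=False)
best = np.sqrt(np.sum(s[2:] ** 2))

rng = np.random.default_rng(0)  # arbitrary seed
smallest_gap = np.inf
for _ in range(500):
    # Orthonormal basis (columns of Q) of a random 2-dimensional subspace of R^4.
    Q = np.linalg.qr(rng.standard_normal((4, 2)))[0]
    # Columns of C are the projections of the columns of B onto that subspace.
    C = Q @ (Q.T @ B)
    err = np.linalg.norm(B - C, "fro")  # sqrt(sum_i ||b_i - c_i||^2)
    smallest_gap = min(smallest_gap, err - best)

print(best ** 2, smallest_gap >= -1e-10)
```

No sampled subspace should beat the error \(\sqrt{8}\) attained by the columns of \(B_{2}\), up to rounding.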

Linear Algebra Day 29

By John Jasper
