# Day 29: Eckart–Young–Mirsky theorem

Given an $$m\times n$$ matrix $$A$$, if there exist an orthonormal basis $$\{v_{1},\ldots,v_{n}\}$$ for $$\R^{n}$$, an orthonormal basis $$\{u_{1},\ldots,u_{m}\}$$ for $$\R^{m}$$, and nonnegative numbers $$\sigma_{1},\sigma_{2},\ldots,\sigma_{p}$$, where $$p=\min\{m,n\}$$, such that

$A = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top},$

then the above sum is called a singular value decomposition of $$A$$. The numbers $$\sigma_{1},\sigma_{2},\ldots,\sigma_{p}$$ are called the singular values of $$A$$.

If there is an $$n\times n$$ orthogonal matrix $$V$$, an $$m\times m$$ orthogonal matrix $$U$$, and an $$m\times n$$ diagonal matrix $$\Sigma$$ with nonnegative numbers on the diagonal such that

$A = U\Sigma V^{\top},$

then $$U\Sigma V^{\top}$$ is called a singular value decomposition of $$A$$. The numbers on the diagonal of $$\Sigma$$ are called the singular values of $$A$$.
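The two definitions describe the same factorization: the $$u_{i}$$ and $$v_{i}$$ are the columns of $$U$$ and $$V$$. The following is a minimal NumPy sketch that checks both reconstructions numerically. The matrix `A` and the helper name `outer_product_form` are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def outer_product_form(A):
    """Return (U, s, Vt, A_sum), where A_sum rebuilds A as sum_i s[i] * u_i v_i^T."""
    # U is m x m, Vt is n x n, and s holds the p = min(m, n) singular values.
    U, s, Vt = np.linalg.svd(A)
    A_sum = np.zeros_like(A, dtype=float)
    for i in range(len(s)):
        A_sum += s[i] * np.outer(U[:, i], Vt[i, :])
    return U, s, Vt, A_sum

# Illustrative 3 x 2 matrix (hypothetical data).
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])
U, s, Vt, A_sum = outer_product_form(A)

# Matrix form: Sigma is m x n with the singular values on its diagonal.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, A_sum))            # outer-product form
print(np.allclose(A, U @ Sigma @ Vt))   # matrix form
```

Note that `np.linalg.svd` returns $$V^{\top}$$ rather than $$V$$, so the right singular vectors are the rows of `Vt`.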

Let $$\sigma_{i}(A)$$ denote the $$i$$th largest singular value of $$A$$. (Note that if the singular values of $$A$$ are $$5,4,4,3,2,0$$, then $$\sigma_{2}(A) = \sigma_{3}(A) = 4$$.)

Theorem (Eckart–Young–Mirsky theorem). Given an $$m\times n$$ matrix $$A$$ with singular value decomposition $A = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top},$ where $$p=\min\{m,n\}$$ and $$\sigma_{1}\geq\sigma_{2}\geq\cdots\geq\sigma_{p}$$, fix $$k\leq p$$ and define $A_{k} := \sum_{i=1}^{k}\sigma_{i}u_{i}v_{i}^{\top}.$ For every $$m\times n$$ matrix $$B$$ with rank at most $$k$$ we have $\sigma_{i}(A-B)\geq \sigma_{i}(A-A_{k})\quad\text{for all}\quad i\leq p.$

Proof. Note that $A - A_{k} = \sum_{i=k+1}^{p}\sigma_{i}u_{i}v_{i}^{\top}.$

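From this, we can see that the singular values of $$A-A_{k}$$, in nonincreasing order, are $\sigma(A-A_{k}) = (\sigma_{k+1},\sigma_{k+2},\ldots,\sigma_{p},0,\ldots,0).$

If $$k=p$$, then $$A_{k} = A$$ and every singular value of $$A-A_{k}$$ is $$0$$, so the inequality holds trivially. Assume from now on that $$k<p$$.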
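The claim about the singular values of $$A-A_{k}$$ can be checked numerically. This is a small sketch on a random matrix; the helper name `truncate`, the seed, and the sizes are arbitrary choices made for illustration.

```python
import numpy as np

def truncate(A, k):
    """Return A_k, the sum of the first k terms of an SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scaling the first k columns of U by s[:k] forms sum_i s[i] * u_i v_i^T.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice
A = rng.standard_normal((5, 4))
k = 2

s = np.linalg.svd(A, compute_uv=False)
residual = np.linalg.svd(A - truncate(A, k), compute_uv=False)

# Expected pattern: (sigma_{k+1}, ..., sigma_p, 0, ..., 0).
expected = np.concatenate([s[k:], np.zeros(k)])
print(np.allclose(residual, expected))
```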

Proof continued. Let $$B$$ be any $$m\times n$$ matrix with $$\text{rank}(B)\leq k$$. By the rank-nullity theorem, $$N(B)$$ is a subspace of $$\R^{n}$$ with dimension at least $$n-k$$. On the other hand, the first $$k+1$$ right singular vectors of $$A$$ span a $$(k+1)$$-dimensional subspace of $$\R^{n}$$. Since $$(n-k)+(k+1)>n$$, these two subspaces intersect nontrivially, so there is a unit vector

$w\in N(B)\cap\text{span}\{v_{i}\}_{i=1}^{k+1}.$

Since $$w\in\text{span}\{v_{i}\}_{i=1}^{k+1}$$ there are scalars $$\alpha_{1},\ldots,\alpha_{k+1}$$ such that

$w=\sum_{i=1}^{k+1}\alpha_{i}v_{i}.$

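Since $$Bw=0$$, $$v_{i}^{\top}w=\alpha_{i}$$ for $$i\leq k+1$$, and $$v_{i}^{\top}w=0$$ for $$i>k+1$$, we compute:

$\sigma_{1}(A-B)^{2}\geq\|(A-B)w\|^{2} = \|Aw\|^{2} = \left\|\sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top}w\right\|^{2} = \left\|\sum_{i=1}^{k+1}\alpha_{i}\sigma_{i}u_{i}\right\|^{2}$

$= \sum_{i=1}^{k+1}\alpha_{i}^{2}\sigma_{i}^{2}\geq \sum_{i=1}^{k+1}\alpha_{i}^{2}\sigma_{k+1}^{2} = \sigma_{k+1}^2\sum_{i=1}^{k+1}\alpha_{i}^{2} = \sigma_{k+1}^2 = \sigma_{1}(A-A_{k})^2.$

The second line uses the orthonormality of the $$u_{i}$$ and the fact that $$\sum_{i=1}^{k+1}\alpha_{i}^{2} = \|w\|^{2} = 1$$.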

This proves the theorem for $$i=1$$; the other cases are similar.
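The inequality for every index $$i$$ can be spot-checked on random low-rank matrices. This sketch is an illustration, not a proof: the helper names, the seed, the sizes, and the tolerance `tol` are assumptions chosen here. Each $$B$$ is built as a product of an $$m\times k$$ and a $$k\times n$$ factor, which forces its rank to be at most $$k$$.

```python
import numpy as np

def singular_values(M):
    """Singular values of M in nonincreasing order."""
    return np.linalg.svd(M, compute_uv=False)

def truncate(A, k):
    """Return A_k, the sum of the first k terms of an SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)   # fixed seed, arbitrary choice
m, n, k = 6, 4, 2
A = rng.standard_normal((m, n))
best = singular_values(A - truncate(A, k))

tol = 1e-10                      # slack for floating-point rounding
all_ok = True
for _ in range(200):
    B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank <= k
    all_ok = all_ok and bool(np.all(singular_values(A - B) >= best - tol))
print(all_ok)
```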

Corollary. Given an $$m\times n$$ matrix $$A$$ with singular value decomposition $A = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top},$ where $$p=\min\{m,n\},$$ fix $$k\leq p$$ and define $A_{k} := \sum_{i=1}^{k}\sigma_{i}u_{i}v_{i}^{\top}.$ For every $$m\times n$$ matrix $$B$$ with rank at most $$k$$ we have $\|A-B\|\geq \|A-A_{k}\|$ for each of the spectral norm $$\|\cdot\|_{2}$$, the Frobenius norm $$\|\cdot\|_{F}$$, and the nuclear norm $$\|\cdot\|_{N}$$.

Proof. We have $$\sigma_{i}(A-B)\geq \sigma_{i}(A-A_{k})$$ for all $$i$$. Hence,

$\|A-B\|_{2} = \sigma_{1}(A-B)\geq \sigma_{1}(A-A_{k}) = \|A-A_{k}\|_{2}$

$\|A-B\|_{F}^{2} = \sum_{i=1}^{p}\sigma_{i}(A-B)^{2}\geq \sum_{i=1}^{p}\sigma_{i}(A-A_{k})^{2} = \|A-A_{k}\|_{F}^2$

$\|A-B\|_{N} = \sum_{i=1}^{p}\sigma_{i}(A-B)\geq \sum_{i=1}^{p}\sigma_{i}(A-A_{k}) = \|A-A_{k}\|_{N}.\ \Box$
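The three norms in the corollary correspond to the `ord` values `2`, `"fro"`, and `"nuc"` of `np.linalg.norm`. The sketch below compares the truncation error with the error of random matrices of rank at most $$k$$; the helper names, the seed, and the sizes are illustrative assumptions.

```python
import numpy as np

def truncate(A, k):
    """Return A_k, the sum of the first k terms of an SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def three_norms(M):
    """Spectral, Frobenius, and nuclear norms of M, in that order."""
    return np.array([np.linalg.norm(M, 2),
                     np.linalg.norm(M, "fro"),
                     np.linalg.norm(M, "nuc")])

rng = np.random.default_rng(4)   # fixed seed, arbitrary choice
m, n, k = 5, 5, 3
A = rng.standard_normal((m, n))
best = three_norms(A - truncate(A, k))

never_beaten = True
for _ in range(200):
    B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank <= k
    never_beaten = never_beaten and bool(np.all(three_norms(A - B) >= best - 1e-10))
print(never_beaten)
```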

Example. Consider the matrix

$B=\left[\begin{array}{rrrr} 4 & 3 & 2 & 1\\ 4 & -1 & -3 & -2\\ 4 & 1 & 2 & 3\\ 4 & -3 & -1 & -2\end{array}\right]$

A singular value decomposition of $$B$$ (in outer product form) is:

$B = 8\left(\frac{1}{2}\left[\begin{array}{rrrr} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\end{array}\right]\right) + 4\sqrt{3}\left(\frac{1}{2\sqrt{3}}\left[\begin{array}{rrrr} 0 & 1 & 1 & 1\\ 0 & -1 & -1 & -1\\ 0 & 1 & 1 & 1\\ 0 & -1 & -1 & -1\end{array}\right]\right)$

$+ \sqrt{6}\left(\frac{1}{2\sqrt{6}}\left[\begin{array}{rrrr} 0 & 2 & -1 & -1\\ 0 & 2 & -1 & -1\\ 0 & -2 & 1 & 1\\ 0 & -2 & 1 & 1\end{array}\right]\right) + \sqrt{2}\left(\frac{1}{2\sqrt{2}}\left[\begin{array}{rrrr} 0 & 0 & 1 & -1\\ 0 & 0 & -1 & 1\\ 0 & 0 & -1 & 1\\ 0 & 0 & 1 & -1\end{array}\right]\right)$

Keeping only the terms with the two largest singular values, $$8$$ and $$4\sqrt{3}$$, gives the rank $$2$$ approximation

$B_{2} = 8\left(\frac{1}{2}\left[\begin{array}{rrrr} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\end{array}\right]\right) + 4\sqrt{3}\left(\frac{1}{2\sqrt{3}}\left[\begin{array}{rrrr} 0 & 1 & 1 & 1\\ 0 & -1 & -1 & -1\\ 0 & 1 & 1 & 1\\ 0 & -1 & -1 & -1\end{array}\right]\right)$

$=\left[\begin{array}{rrrr} 4 & 2 & 2 & 2\\ 4 & -2 & -2 & -2\\ 4 & 2 & 2 & 2\\ 4 & -2 & -2 & -2\end{array}\right]$

$$\sigma_{1}(B-B_{2}) = \sqrt{6}$$, $$\sigma_{2}(B-B_{2}) = \sqrt{2}$$, $$\sigma_{3}(B-B_{2}) = \sigma_{4}(B-B_{2}) = 0$$

$\Rightarrow \|B-B_{2}\|_{2} = \sqrt{6},\quad\|B-B_{2}\|_{F} = \sqrt{8},\quad \|B-B_{2}\|_{N} = \sqrt{6}+\sqrt{2}$
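The three norm values follow directly from the singular values of $$B-B_{2}$$ listed above. A short sketch of the arithmetic, with variable names chosen here for illustration:

```python
import math

# Singular values of B - B_2 as stated in the example.
residual = [math.sqrt(6), math.sqrt(2), 0.0, 0.0]

spectral = max(residual)                              # largest singular value
frobenius = math.sqrt(sum(s * s for s in residual))   # sqrt(6 + 2) = sqrt(8)
nuclear = sum(residual)                               # sqrt(6) + sqrt(2)

print(spectral, frobenius, nuclear)
```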

Consider an $$m\times n$$ matrix

$A = \begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & & \vdots\\ \vdots & & \ddots & \vdots\\ a_{m1} & \cdots & \cdots & a_{mn}\end{bmatrix}$

Consider the sum

$\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2}.$

If $$X$$ is a square matrix, then $$\text{tr}(X)$$ denotes the sum of the diagonal entries in $$X$$. Note that

$\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2} = \text{tr}(A^{\top}A)$

Note that $$\text{tr}(X)=\text{tr}(YXY^{-1})$$ for any square matrix $$X$$ and any invertible matrix $$Y$$ of the same size.
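A quick numerical illustration of this invariance, using two small matrices made up for the purpose (they are not from the lecture):

```python
import numpy as np

X = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 5.0]])
Y = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])   # det(Y) = 5, so Y is invertible

conjugated = Y @ X @ np.linalg.inv(Y)
print(np.trace(X), np.trace(conjugated))
```

When $$Y$$ is an orthogonal matrix $$Q$$, the inverse is $$Q^{\top}$$, so the same identity gives $$\text{tr}(QXQ^{\top})=\text{tr}(X)$$.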

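Note that $$A^{\top}A = Q\Lambda Q^{\top}$$, where $$Q$$ is an $$n\times n$$ orthogonal matrix and $$\Lambda$$ is an $$n\times n$$ diagonal matrix whose diagonal entries are $$\sigma_{1}^{2},\ldots,\sigma_{p}^{2}$$, followed by $$n-p$$ zeros if $$n>p$$. Since $$Q^{\top}=Q^{-1}$$, we get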

$\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2} = \text{tr}(A^{\top}A) = \text{tr}(Q\Lambda Q^{\top}) = \text{tr}(\Lambda) = \sum_{i=1}^{p}\sigma_{i}^{2} = \|A\|^{2}_{F}.$

$\|A\|_{F} = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2}}$

If $$w_{1},\ldots,w_{n}$$ are the columns of $$A$$, then $$\displaystyle{\|A\|_{F} = \sqrt{\sum_{j=1}^{n}\|w_{j}\|^{2}}}$$
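The four expressions for $$\|A\|_{F}^{2}$$ (entries, trace, singular values, column norms) can be compared numerically. This is a sketch on a random matrix; the seed, the size, and the variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)   # fixed seed, arbitrary choice
A = rng.standard_normal((4, 3))

entrywise = np.sum(A ** 2)                                    # sum of a_ij^2
trace_form = np.trace(A.T @ A)                                # tr(A^T A)
sv_form = np.sum(np.linalg.svd(A, compute_uv=False) ** 2)     # sum of sigma_i^2
column_form = sum(np.linalg.norm(A[:, j]) ** 2
                  for j in range(A.shape[1]))                 # sum of ||w_j||^2

print(entrywise, trace_form, sv_form, column_form)
```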

Example. Recall

$B=\left[\begin{array}{rrrr} 4 & 3 & 2 & 1\\ 4 & -1 & -3 & -2\\ 4 & 1 & 2 & 3\\ 4 & -3 & -1 & -2\end{array}\right] \quad\text{and}\quad B_{2}=\left[\begin{array}{rrrr} 4 & 2 & 2 & 2\\ 4 & -2 & -2 & -2\\ 4 & 2 & 2 & 2\\ 4 & -2 & -2 & -2\end{array}\right]$



By the Eckart–Young–Mirsky theorem, if $$C$$ is any matrix of rank at most $$2$$, then

$\|B-C\|_{F}\geq \|B-B_{2}\|_{F} = \sqrt{8}$

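• Let $$b_{i}$$ denote the $$i$$th column of $$B$$.
• Let $$b_{i}^{(2)}$$ denote the $$i$$th column of $$B_{2}$$.
• Let $$c_{1},c_{2},c_{3},c_{4}$$ be any vectors from a $$2$$-dimensional subspace of $$\R^{4}$$.
• Let $$C$$ be the matrix with columns $$c_{1},c_{2},c_{3},c_{4}$$.

Then $$C$$ has rank at most $$2$$, so

$\sqrt{\sum_{i=1}^{4}\|b_{i} - c_{i}\|^{2}} = \|B-C\|_{F}\geq \|B-B_{2}\|_{F} = \sqrt{8} = \sqrt{\sum_{i=1}^{4}\|b_{i} - b_{i}^{(2)}\|^{2}}$

In other words, among all choices of four vectors from a common $$2$$-dimensional subspace, the columns of $$B_{2}$$ minimize the sum of squared distances to the columns of $$B$$.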
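The column interpretation can be tried on other data. The sketch below uses a hypothetical random data matrix `X` (not the matrix $$B$$ from the example) and compares the rank $$2$$ truncation with orthogonal projections of the columns onto randomly chosen $$2$$-dimensional subspaces. For a fixed subspace, the orthogonal projection gives the closest vectors within it, so it is the strongest competitor that subspace has to offer.

```python
import numpy as np

rng = np.random.default_rng(3)    # fixed seed, arbitrary choice
X = rng.standard_normal((4, 6))   # six vectors in R^4, stored as columns
k = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # columns lie in span{u_1, u_2}
best_error = np.sqrt(np.sum(np.linalg.norm(X - X_k, axis=0) ** 2))

never_beaten = True
for _ in range(200):
    # Orthonormal basis of a random k-dimensional subspace of R^4.
    Q = np.linalg.qr(rng.standard_normal((4, k)))[0]
    C = Q @ (Q.T @ X)                  # project every column onto that subspace
    error = np.sqrt(np.sum(np.linalg.norm(X - C, axis=0) ** 2))
    never_beaten = never_beaten and bool(error >= best_error - 1e-10)
print(never_beaten)
```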

By John Jasper

• 421