Day 27:

Proof of Singular Value Decomposition

Theorem (Singular Value Decomposition - part 1). If \(B\) is an \(m\times n\) matrix and \(p=\min\{m,n\}\), then there are orthonormal bases \(\{u_{1},\ldots,u_{m}\}\) for \(\mathbb{R}^{m}\) and \(\{v_{1},\ldots,v_{n}\}\) for \(\mathbb{R}^{n}\), and nonnegative scalars \(\sigma_{1},\ldots,\sigma_{p}\) such that

\[Bv_{i} = \begin{cases} \sigma_{i}u_{i} & i\leq p\\ 0, & i>p.\end{cases}\]

 

Notes:

  • The numbers \(\sigma_{1},\sigma_{2},\ldots,\sigma_{p}\) are called the singular values of \(B\).
  • The sequence of singular values is denoted with the symbol \(\sigma(B)\).
  • The vectors \(v_{1},v_{2},\ldots,v_{n}\) are called the right singular vectors of \(B\).
  • The vectors \(u_{1},u_{2},\ldots,u_{m}\) are called the left singular vectors of \(B\).

Proof. Let \(v_{1},\ldots,v_{n}\) be an orthonormal basis of eigenvectors of \(B^{\top}B\) with associated eigenvalues \(\lambda_{1}\geq \lambda_{2}\geq\ldots\geq \lambda_{n}\geq 0.\)

Let \(r\) be the rank of \(B\). Since \(\operatorname{rank}(B^{\top}B) = \operatorname{rank}(B)\), we have \(\lambda_{r}>0\) and \(\lambda_{i}=0\) for all \(i>r\) (if \(r<n\)).

For each \(i\in\{1,\ldots,r\}\) define \[u_{i} = \frac{Bv_{i}}{\sqrt{\lambda_{i}}}.\] Note that \[u_{i}\cdot u_{j} = \frac{1}{\sqrt{\lambda_{i}\lambda_{j}}}(Bv_{i})^{\top}(Bv_{j}) = \frac{1}{\sqrt{\lambda_{i}\lambda_{j}}}v_{i}^{\top}B^{\top}Bv_{j} = \frac{1}{\sqrt{\lambda_{i}\lambda_{j}}}v_{i}^{\top}\lambda_{j}v_{j}\]

Proof continued. \[u_{i}\cdot u_{j} = \frac{1}{\sqrt{\lambda_{i}\lambda_{j}}}v_{i}^{\top}\lambda_{j}v_{j} = \frac{\lambda_{j}}{\sqrt{\lambda_{i}\lambda_{j}}} v_{i}\cdot v_{j} = \begin{cases} 1 & i=j,\\ 0 & i\neq j\end{cases}\]

From this we see that \(\{u_{1},\ldots,u_{r}\}\) is an orthonormal set. Next, note that \[BB^{\top} u_{i} = BB^{\top}\frac{Bv_{i}}{\sqrt{\lambda_{i}}} = \frac{1}{\sqrt{\lambda_{i}}}BB^{\top}Bv_{i} = \frac{B\lambda_{i}v_{i}}{\sqrt{\lambda_{i}}} = \lambda_{i}u_{i}.\]

Since \(r=\operatorname{rank}(B) = \operatorname{rank}(BB^{\top})\) we see that \(\{u_{1},\ldots,u_{r}\}\) is an orthonormal basis for the column space \(C(BB^{\top})\). We can complete this to an orthonormal basis \(\{u_{1},\ldots,u_{m}\}\) for \(\mathbb{R}^{m}\). By the definition of \(u_{i}\) for \(i=1,\ldots,r\) we have

\[Bv_{i} = \sqrt{\lambda_{i}}u_{i}.\]

For \(i>r\) we have \(\|Bv_{i}\|^{2} = v_{i}^{\top}B^{\top}Bv_{i} = \lambda_{i} = 0\), so \(Bv_{i}=0\). Setting \(\sigma_{i} = \sqrt{\lambda_{i}}\) for \(i=1,\ldots,p\) (so that \(\sigma_{i}=0\) for \(i>r\)) completes the proof. \(\Box\)
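The construction in this proof can be carried out numerically. Below is a minimal sketch, assuming NumPy is available, using the matrix \(B\) from the example that follows: we diagonalize \(B^{\top}B\), take square roots of the eigenvalues, and form \(u_{i} = Bv_{i}/\sqrt{\lambda_{i}}\).

```python
import numpy as np

# The matrix from the example below.
B = np.array([[7., 3., 7., 3.],
              [3., 7., 3., 7.]])

# eigh returns eigenvalues in ascending order; reverse so that
# lam_1 >= lam_2 >= ... >= lam_n, matching the proof.
lam, V = np.linalg.eigh(B.T @ B)
lam, V = lam[::-1], V[:, ::-1]

r = np.linalg.matrix_rank(B)
sigma = np.sqrt(np.clip(lam[:r], 0, None))  # singular values sqrt(lam_i)

# u_i = B v_i / sqrt(lam_i) for i = 1, ..., r (columns of U)
U = B @ V[:, :r] / sigma

# {u_1, ..., u_r} is orthonormal, as the proof shows.
assert np.allclose(U.T @ U, np.eye(r))
```

The singular values obtained this way agree with those computed by a library SVD routine.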

Example. Consider the matrix

\[B = \begin{bmatrix} 7 & 3 & 7 & 3\\ 3 & 7 & 3 & 7\end{bmatrix}\]

We compute

\[B^{\top}B = \begin{bmatrix} 58 & 42 & 58 & 42\\ 42 & 58 & 42 & 58\\ 58 & 42 & 58 & 42\\ 42 & 58 & 42 & 58\end{bmatrix}\]

Since this matrix is symmetric (and positive semidefinite, but not positive definite), it has a spectral decomposition:

\[B^{\top}B = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\end{bmatrix}\begin{bmatrix} 200 & 0 & 0 & 0\\ 0 & 32 & 0 & 0 \\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}\frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\end{bmatrix}\]
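This factorization can be checked by multiplying it out; a quick numerical check, assuming NumPy is available:

```python
import numpy as np

# Eigenvector matrix (here symmetric, so Q = Q^T) and eigenvalue matrix.
Q = 0.5 * np.array([[1.,  1.,  1.,  1.],
                    [1., -1.,  1., -1.],
                    [1.,  1., -1., -1.],
                    [1., -1., -1.,  1.]])
D = np.diag([200., 32., 0., 0.])

BtB = np.array([[58., 42., 58., 42.],
                [42., 58., 42., 58.],
                [58., 42., 58., 42.],
                [42., 58., 42., 58.]])

# Q is orthogonal, and Q D Q^T reproduces B^T B.
assert np.allclose(Q @ Q.T, np.eye(4))
assert np.allclose(Q @ D @ Q.T, BtB)
```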

Note that the spectral decomposition is not unique. Indeed, the same matrix \(B^{\top}B\) also has the spectral decomposition

\[B^{\top}B = \frac{1}{2}\begin{bmatrix} 1 & 1 & \sqrt{2} & 0\\ 1 & -1 & 0 & \sqrt{2}\\ 1 & 1 & -\sqrt{2} & 0\\ 1 & -1 & 0 & -\sqrt{2}\end{bmatrix}\begin{bmatrix} 200 & 0 & 0 & 0\\ 0 & 32 & 0 & 0 \\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}\frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & -1 & 1 & -1\\ \sqrt{2} & 0 & -\sqrt{2} & 0\\ 0 & \sqrt{2} & 0 & -\sqrt{2}\end{bmatrix}\]


Example continued. Let

\[V = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\end{bmatrix}\]

and let \(v_{i}\) denote the \(i\)th column of \(V\). Since the columns of \(V\) form an orthonormal basis of eigenvectors of \(B^{\top}B\), these are the right singular vectors of \(B\).

From the proof, we can see that the singular values of \(B\) are

\[\sigma_{1} = \sqrt{200} = 10\sqrt{2}\quad\text{and}\quad\sigma_{2} = \sqrt{32} = 4\sqrt{2},\]

and the left singular vectors are

\[u_{1} = \frac{1}{10\sqrt{2}}Bv_{1} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\ 1\end{bmatrix}\quad\text{and}\quad u_{2} = \frac{1}{4\sqrt{2}}Bv_{2} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\ -1\end{bmatrix}\]
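The formulas \(u_{i} = Bv_{i}/\sigma_{i}\) for this example can be confirmed numerically; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

B = np.array([[7., 3., 7., 3.],
              [3., 7., 3., 7.]])
v1 = np.array([1., 1., 1., 1.]) / 2
v2 = np.array([1., -1., 1., -1.]) / 2

# u_i = B v_i / sigma_i with sigma_1 = 10 sqrt(2), sigma_2 = 4 sqrt(2)
u1 = B @ v1 / (10 * np.sqrt(2))
u2 = B @ v2 / (4 * np.sqrt(2))

assert np.allclose(u1, np.array([1., 1.]) / np.sqrt(2))
assert np.allclose(u2, np.array([1., -1.]) / np.sqrt(2))

# A library SVD routine returns the same singular values, in descending order.
s = np.linalg.svd(B, compute_uv=False)
assert np.allclose(s, [10 * np.sqrt(2), 4 * np.sqrt(2)])
```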

Theorem (Singular Value Decomposition - Outer product form). If \(B\) is an \(m\times n\) matrix and \(p=\min\{m,n\}\), then there are orthonormal bases \(\{u_{1},\ldots,u_{m}\}\) for \(\mathbb{R}^{m}\) and \(\{v_{1},\ldots,v_{n}\}\) for \(\mathbb{R}^{n}\), and nonnegative scalars \(\sigma_{1},\ldots,\sigma_{p}\) such that

\[B = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top}.\]

Proof. From the previous theorem we have orthonormal bases \(\{u_{1},\ldots,u_{m}\}\) for \(\mathbb{R}^{m}\) and \(\{v_{1},\ldots,v_{n}\}\) for \(\mathbb{R}^{n}\) and nonnegative scalars \(\sigma_{1},\ldots,\sigma_{p}\) such that \(Bv_{i}=\sigma_{i}u_{i}\) for all \(i\leq p\) and \(Bv_{i} = 0\) for \(i>p\).

Define the matrix

\[C: = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top}\]

Note that for \(i_{0}\leq p\), since \(v_{1},\ldots,v_{n}\) is orthonormal

\[Cv_{i_{0}} = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top}v_{i_{0}} = \sigma_{i_{0}}u_{i_{0}} = Bv_{i_{0}}\]

Proof continued. If \(i_{0}>p\) then \(v_{i_{0}}\) is orthogonal to \(v_{i}\) for all \(i\leq p\) and hence

\[Cv_{i_{0}} = 0 = Bv_{i_{0}}.\]

Thus, we see that \(Cv_{i} = Bv_{i}\) for all \(i\in\{1,2,\ldots,n\}\).

Next, suppose that there is a vector \(v\in\mathbb{R}^{n}\) such that \(Cv\neq Bv\). This implies that \((C-B)v \neq 0\). Since \(v_{1},v_{2},\ldots,v_{n}\) is a basis for \(\mathbb{R}^{n}\), there are scalars \(\alpha_{1},\alpha_{2},\ldots,\alpha_{n}\) such that

\[v = \alpha_{1}v_{1} + \alpha_{2}v_{2} + \cdots + \alpha_{n}v_{n}.\]

Finally, we multiply by \(C-B\) on both sides and we see that

\[0\neq (C-B)v = \alpha_{1}(C-B)v_{1} + \alpha_{2}(C-B)v_{2} + \cdots + \alpha_{n}(C-B)v_{n}\]

\[= \alpha_{1}0 + \alpha_{2}0 + \cdots + \alpha_{n}0 = 0\]

This contradiction shows that \((C-B)v=0\) for all \(v\in\mathbb{R}^{n}\), and hence \(C=B\). \(\Box\)

Example continued. Recall the matrix

\[B = \begin{bmatrix} 7 & 3 & 7 & 3\\ 3 & 7 & 3 & 7\end{bmatrix}\]

 

We already saw that the right singular vectors of \(B\) are \[v_{1} = \frac{1}{2}\left[\begin{array}{r} 1\\ 1\\ 1\\ 1\end{array}\right],\ v_{2} = \frac{1}{2}\left[\begin{array}{r} 1\\ -1\\ 1\\ -1\end{array}\right],\ v_{3} = \frac{1}{2}\left[\begin{array}{r} 1\\ 1\\ -1\\ -1\end{array}\right],\ v_{4} = \frac{1}{2}\left[\begin{array}{r} 1\\ -1\\ -1\\ 1\end{array}\right],\] the singular values of \(B\) are \[\sigma_{1} = \sqrt{200} = 10\sqrt{2}\quad\text{and}\quad\sigma_{2} = \sqrt{32} = 4\sqrt{2},\] and the left singular vectors are \[u_{1} = \frac{1}{10\sqrt{2}}Bv_{1} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\ 1\end{bmatrix}\quad\text{and}\quad u_{2} = \frac{1}{4\sqrt{2}}Bv_{2} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\ -1\end{bmatrix}.\]

Hence, the outer product form of the singular value decomposition of \(B\) is \[B = \sigma_{1}u_{1}v_{1}^{\top} + \sigma_{2}u_{2}v_{2}^{\top}=\begin{bmatrix} 5 & 5 & 5 & 5\\ 5 & 5 & 5 & 5\end{bmatrix} + \begin{bmatrix} 2 & -2 & 2 & -2\\ -2 & 2 & -2 & 2\end{bmatrix}.\]
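The two outer products sum back to \(B\); a minimal numerical check, assuming NumPy is available:

```python
import numpy as np

B = np.array([[7., 3., 7., 3.],
              [3., 7., 3., 7.]])
u1 = np.array([1., 1.]) / np.sqrt(2)
u2 = np.array([1., -1.]) / np.sqrt(2)
v1 = np.array([1., 1., 1., 1.]) / 2
v2 = np.array([1., -1., 1., -1.]) / 2

term1 = 10 * np.sqrt(2) * np.outer(u1, v1)  # the all-fives matrix
term2 = 4 * np.sqrt(2) * np.outer(u2, v2)   # the +/- twos matrix

assert np.allclose(term1, 5 * np.ones((2, 4)))
assert np.allclose(term1 + term2, B)
```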

Theorem (Singular Value Decomposition - Matrix Form). If \(B\) is an \(m\times n\) matrix, then there is an \(m\times m\) orthogonal matrix \(R\), an \(n\times n\) orthogonal matrix \(Q\), and an \(m\times n\) diagonal matrix \(\Sigma\) such that

\[B = R\Sigma Q^{\top}.\]

Proof. By the outer product form of the singular value decomposition there are orthonormal bases \(\{u_{1},\ldots,u_{m}\}\) for \(\mathbb{R}^{m}\) and \(\{v_{1},\ldots,v_{n}\}\) for \(\mathbb{R}^{n}\), and nonnegative scalars \(\sigma_{1},\ldots,\sigma_{p}\) such that

\[B = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top}.\]

Let \(R\) be the matrix with columns \(u_{1},\ldots,u_{m}\), and let \(Q\) be the matrix with columns \(v_{1},\ldots,v_{n}\). Since the columns of \(R\) and \(Q\) are orthonormal bases, these matrices are orthogonal.

Let \(\Sigma\) be the \(m\times n\) diagonal matrix with entries \(\sigma_{1},\sigma_{2},\ldots,\sigma_{p}\) on the diagonal and all other entries equal to zero.

Proof continued. 

\[R\Sigma Q^{\top} =\begin{bmatrix} \vert & \vert & & \vert\\ u_{1} & u_{2} & \cdots & u_{m}\\ \vert & \vert & & \vert\end{bmatrix}\begin{bmatrix} \sigma_{1} & & & & & &\\ & \sigma_{2} & & & & &\\ & & \ddots & & & & \\ & & & \sigma_{p} & & & \\ & & & & 0 & &\\ & & & & & \ddots & \\ & & & & & & 0\end{bmatrix}\begin{bmatrix} - & v_{1}^{\top} & - \\ - & v_{2}^{\top} & - \\ & \vdots & \\ - & v_{n}^{\top} & -\end{bmatrix}\]

\[=\begin{bmatrix} \vert & \vert & & \vert\\ u_{1} & u_{2} & \cdots & u_{m}\\ \vert & \vert & & \vert\end{bmatrix}\begin{bmatrix} - & \sigma_{1} v_{1}^{\top} & - \\ - & \sigma_{2}v_{2}^{\top} & - \\ & \vdots & \\ - & \sigma_{p}v_{p}^{\top} & - \\ - & 0 & -\\ & \vdots & \\ - & 0 & -\end{bmatrix} = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top} = B.\]

 

\(\Box\)
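The matrix form is exactly what library routines compute. A minimal sketch, assuming NumPy is available; note that `numpy.linalg.svd` returns \(Q^{\top}\) (often called `Vh`) directly as its third output:

```python
import numpy as np

# An arbitrary 5 x 3 matrix for illustration.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))

# full_matrices=True gives square orthogonal factors: R is 5x5, Qt is 3x3;
# s holds the p = min(m, n) singular values in descending order.
R, s, Qt = np.linalg.svd(B, full_matrices=True)

# Build the 5x3 "diagonal" matrix Sigma from s.
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)

assert np.allclose(R @ R.T, np.eye(5))    # R is orthogonal
assert np.allclose(Qt @ Qt.T, np.eye(3))  # Q is orthogonal
assert np.allclose(R @ Sigma @ Qt, B)     # B = R Sigma Q^T
```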

Continuing the last Example. We had

 

  

\[B = 10\sqrt{2}u_{1}v_{1}^{\top} + 4\sqrt{2}u_{2}v_{2}^{\top}=\begin{bmatrix} 5 & 5 & 5 & 5\\ 5 & 5 & 5 & 5\end{bmatrix} + \left[\begin{array}{rrrr} 2 & -2 & 2 & -2\\ -2 & 2 & -2 & 2\end{array}\right]\]

\(v_{1} = \frac{1}{2}\begin{bmatrix} 1\\ 1\\ 1\\ 1\end{bmatrix},\ v_{2} = \frac{1}{2}\begin{bmatrix} \phantom{-}1\\ -1\\ \phantom{-}1\\ -1\end{bmatrix},\ v_{3} = \frac{1}{2}\begin{bmatrix} \phantom{-}1\\ \phantom{-}1\\ -1\\ -1\end{bmatrix},\ v_{4} = \frac{1}{2}\begin{bmatrix} \phantom{-}1\\ -1\\ -1\\ \phantom{-}1\end{bmatrix}\)

and

\(u_{1} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1\\ 1\end{bmatrix},\ u_{2} = \frac{1}{\sqrt{2}}\begin{bmatrix} \phantom{-}1\\ -1\end{bmatrix}\)

The matrix form of the singular value decomposition of \(B\) is

\[B = \begin{bmatrix} 7 & 3 & 7 & 3\\ 3 & 7 & 3 & 7\end{bmatrix} = \left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & \phantom{-}1\\ 1 & -1\end{bmatrix}\right)\begin{bmatrix} 10\sqrt{2} & 0 & 0 & 0\\ 0 & 4\sqrt{2} & 0 & 0\end{bmatrix}\left(\frac{1}{2}\left[\begin{array}{rrrr} 1 & 1 & 1 & 1\\ 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\end{array}\right]\right)\]

Example 2. Let

\[B = \begin{bmatrix} 2 & -1\\ 2 & 1\end{bmatrix}.\]

Then

\[B^{\top}B = \begin{bmatrix} 8 & 0\\ 0 & 2\end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix} 8 & 0\\ 0 & 2\end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix},\]

so the right singular vectors are

\[v_{1} = \begin{bmatrix}1\\ 0\end{bmatrix}\quad\text{and}\quad v_{2} = \begin{bmatrix} 0\\ 1\end{bmatrix},\]

the singular values are \(\sigma_{1} = \sqrt{8} = 2\sqrt{2}\) and \(\sigma_{2} = \sqrt{2}\), and the left singular vectors are

\[u_{1} = \frac{Bv_{1}}{2\sqrt{2}} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\ 1\end{bmatrix}\quad\text{and}\quad u_{2} = \frac{Bv_{2}}{\sqrt{2}} = \frac{1}{\sqrt{2}}\begin{bmatrix} -1\\ 1\end{bmatrix}\]

The outer product form of the singular value decomposition is

\[B = 2\sqrt{2}\left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 0\\ 1 & 0\end{bmatrix}\right) + \sqrt{2}\left(\frac{1}{\sqrt{2}}\begin{bmatrix} 0 & -1\\ 0 & 1\end{bmatrix}\right),\]

and the matrix form is

\[B = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1\\ 1 & 1\end{bmatrix}\begin{bmatrix} 2\sqrt{2} & 0\\ 0 & \sqrt{2}\end{bmatrix}\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}.\]
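Both forms for this example can be verified numerically; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

B = np.array([[2., -1.],
              [2.,  1.]])

# Outer product form: sigma_1 u1 v1^T + sigma_2 u2 v2^T
B_outer = (2 * np.sqrt(2) * (np.array([[1., 0.], [1., 0.]]) / np.sqrt(2))
           + np.sqrt(2) * (np.array([[0., -1.], [0., 1.]]) / np.sqrt(2)))
assert np.allclose(B_outer, B)

# Matrix form: R Sigma Q^T with Q = I
R = np.array([[1., -1.],
              [1.,  1.]]) / np.sqrt(2)
Sigma = np.diag([2 * np.sqrt(2), np.sqrt(2)])
Qt = np.eye(2)
assert np.allclose(R @ Sigma @ Qt, B)
```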

Example 3. Let

\[B = \begin{bmatrix} 1 & 1\\ 1 & -1\\ 1 & 1\end{bmatrix},\]

then

\[B^{\top}B = \begin{bmatrix} 3 & 1\\ 1 & 3\end{bmatrix} = \left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}\right)\begin{bmatrix}4 & 0\\ 0 & 2\end{bmatrix} \left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}\right)\]

Note that 

\[v_{1} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1\\ 1\end{bmatrix}\quad\text{and}\quad v_{2} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1\\ -1\end{bmatrix}\]

are a basis of right singular vectors of \(B\). The singular values of \(B\) are \(2\) and \(\sqrt{2}\). Two of the left singular vectors are

\[u_{1} = \frac{Bv_{1}}{2} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1\\ 0\\ 1\end{bmatrix}\quad\text{and}\quad u_{2} = \frac{Bv_{2}}{\sqrt{2}} = \begin{bmatrix} 0\\ 1\\ 0\end{bmatrix}.\]

However, \(\{u_{1},u_{2}\}\) is not a basis of left singular vectors. We must complete it to a basis!

Example 3 continued. We need an orthonormal basis for \(\operatorname{span}\{u_{1},u_{2}\}^{\bot}\). If we form the matrix

\[X = \begin{bmatrix} u_{1}^{\top}\\ u_{2}^{\top}\end{bmatrix}\]

Then \(\operatorname{span}\{u_{1},u_{2}\}^{\bot} = N(X).\)

We find a basis for \(N(X)\). Then, if necessary, use Gram-Schmidt to find an orthonormal basis for \(N(X)\). In this case we find that

\[u_{3} = \frac{1}{\sqrt{2}}\begin{bmatrix} -1\\ 0\\ 1\end{bmatrix}\]

spans \(N(X) = \operatorname{span}\{u_{1},u_{2}\}^{\bot}\), that is, \(\{u_{3}\}\) is an orthonormal basis for \(N(X)\). Thus \(\{u_{1},u_{2},u_{3}\}\) is an orthonormal basis of left singular vectors of \(B\), and we have the singular value decomposition
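The notes complete the basis via a null-space computation followed by Gram-Schmidt; an alternative sketch (assuming NumPy is available) uses a complete QR factorization, which extends orthonormal columns to a full orthonormal basis in one step:

```python
import numpy as np

u1 = np.array([1., 0., 1.]) / np.sqrt(2)
u2 = np.array([0., 1., 0.])

# mode='complete' returns a square 3x3 orthogonal Q whose first two
# columns span the same space as [u1 u2]; the last column spans N(X).
Q, _ = np.linalg.qr(np.column_stack([u1, u2]), mode='complete')
u3 = Q[:, 2]

# u3 is a unit vector orthogonal to u1 and u2 (its sign is arbitrary).
assert np.isclose(np.linalg.norm(u3), 1)
assert np.isclose(u3 @ u1, 0) and np.isclose(u3 @ u2, 0)
```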

\[B = 2u_{1}v_{1}^{\top} + \sqrt{2}u_{2}v_{2}^{\top} \]

Example 3 continued. In matrix form, the singular value decomposition of \(B\) is

\[B = \left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 0 & -1\\ 0 & \sqrt{2} & 0\\ 1 & 0 & 1\end{bmatrix}\right)\begin{bmatrix} 2 & 0\\ 0 & \sqrt{2}\\ 0 & 0\end{bmatrix}\left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}\right)\]
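This rectangular case (with \(\Sigma\) taller than it is wide) checks out numerically as well; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

B = np.array([[1.,  1.],
              [1., -1.],
              [1.,  1.]])

# Columns of R are u1, u2, u3; Sigma is 3x2; Qt has rows v1^T, v2^T.
R = np.array([[1., 0.,          -1.],
              [0., np.sqrt(2.),  0.],
              [1., 0.,           1.]]) / np.sqrt(2)
Sigma = np.array([[2., 0.],
                  [0., np.sqrt(2.)],
                  [0., 0.]])
Qt = np.array([[1.,  1.],
               [1., -1.]]) / np.sqrt(2)

assert np.allclose(R @ R.T, np.eye(3))  # R is orthogonal
assert np.allclose(R @ Sigma @ Qt, B)   # B = R Sigma Q^T
```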

Example continued. Let \(\displaystyle{B = \begin{bmatrix} 2 & -1\\ 2 & 1\end{bmatrix}.}\)

[Figure: an animation in which the blue vectors range over all vectors of length \(1\), and each red vector is the image \(B(\text{blue vector})\); the unit circle is carried to an ellipse.]

Copy of Linear Algebra Day 27

By John Jasper