# Day 30:

PCA and the Pseudoinverse

Theorem. Given an $$m\times n$$ matrix $$A$$ with singular value decomposition

$A = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top},$ where $$p=\min\{m,n\},$$ fix $$k\leq p$$ and define $A_{k}: = \sum_{i=1}^{k}\sigma_{i}u_{i}v_{i}^{\top}.$ For every $$m\times n$$ matrix $$B$$ with rank at most $$k$$ we have $\|A-B\|_{F}\geq \|A-A_{k}\|_{F}.$

Theorem. If $$w_{1},\ldots,w_{n}$$ are the columns of $$A$$, then $\|A\|_{F} = \sqrt{\sum_{j=1}^{n}\|w_{j}\|^{2}}$

Suppose $$A\in\mathbb{R}^{m\times n}$$ has singular value decomposition $A = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top},$ where $$p=\min\{m,n\}.$$ In matrix form:

$A = \left[\begin{array}{cccc} \vert & \vert & & \vert\\ u_{1} & u_{2} & \cdots & u_{m}\\ \vert & \vert & & \vert\end{array}\right] \left[\begin{array}{cccccc} \sigma_{1} & 0 & \cdots & 0 & 0 & \cdots\\ 0 & \sigma_{2} & & 0 & 0 & \cdots\\ \vdots & & \ddots & \vdots & \vdots & \\ 0 & 0 & \cdots & \sigma_{m} & 0 & \cdots\end{array}\right]\left[\begin{array}{ccc} - & v_{1}^{\top} & -\\ - & v_{2}^{\top} & -\\ & \vdots & \\ - & v_{n}^{\top} & -\\ \end{array}\right]$

$A = \left[\begin{array}{cccc} \vert & \vert & & \vert\\ u_{1} & u_{2} & \cdots & u_{m}\\ \vert & \vert & & \vert\end{array}\right] \left[\begin{array}{cccc} \sigma_{1} & 0 & \cdots & 0\\ 0 & \sigma_{2} & & 0\\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{m}\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots\end{array}\right]\left[\begin{array}{ccc} - & v_{1}^{\top} & -\\ - & v_{2}^{\top} & -\\ & \vdots & \\ - & v_{n}^{\top} & -\\ \end{array}\right]$

If $$m<n$$

If $$n<m$$

If $$k\leq p$$ and $A_{k} = \sum_{i=1}^{k}\sigma_{i}u_{i}v_{i}^{\top},$  In matrix form:

$A = \left[\begin{array}{cccc} \vert & \vert & & \vert\\ u_{1} & u_{2} & \cdots & u_{m}\\ \vert & \vert & & \vert\end{array}\right] \left[\begin{array}{cccccc} \sigma_{1} & 0 & \cdots & 0 & 0 & \cdots\\ 0 & \sigma_{2} & & 0 & 0 & \cdots\\ \vdots & & \ddots & \vdots & \vdots & \\ 0 & 0 & \cdots & \sigma_{k} & 0 & \cdots\\ 0 & 0 & \cdots & 0 & 0 & \\ \vdots & \vdots & & \vdots & & \ddots\end{array}\right]\left[\begin{array}{ccc} - & v_{1}^{\top} & -\\ - & v_{2}^{\top} & -\\ & \vdots & \\ - & v_{n}^{\top} & -\\ \end{array}\right]$

Corollary. Given an $$m\times n$$ matrix $$A$$ with singular value decomposition

$A = \sum_{i=1}^{p}\sigma_{i}u_{i}v_{i}^{\top},$ where $$p=\min\{m,n\},$$ fix $$k\leq p$$ and define $A_{k}: = \sum_{i=1}^{k}\sigma_{i}u_{i}v_{i}^{\top}.$

Let $$a_{1},\ldots,a_{n}$$ be the columns of $$A$$, let $$a_{1}^{(k)},\ldots,a_{n}^{(k)}$$ be the columns of $$A_{k}$$. If $$b_{1},\ldots,b_{n}$$ are all vectors in some $$k$$ dimensional subspace of $$\R^{m}$$, then $\sum_{i=1}^{n}\|a_{i} - a_{i}^{(k)}\|^{2}\leq \sum_{i=1}^{n}\|a_{i} - b_{i}\|^{2}$

$A^{\top} = \begin{bmatrix} 1.6356 & 1.2432\\ 3.5324 & 0.9194\\ 5.1171 & 2.1886\\ 1.7070 & -0.2575\\ 2.1051 & 2.0781\\ 3.4407 & 2.3865\\ 2.5796 & 2.2113\\ 2.0085 & 0.3319\\ 3.1858 & 1.6924\\ 3.2266 & 1.9745\\ 2.3962 & 1.7579\\ 5.1645 & 2.7584\\ 2.5640 & 1.3116\\ 3.9581 & 2.4168\\ 1.6782 & 1.7841\\ 1.7983 & 0.5214\\ 0.5665 & 0.5206\\ 3.3292 & 2.1693\\ 1.9416 & 1.1841\\ 1.3375 & 1.4968\\\end{bmatrix}$

$A_{1}^{\top} = \begin{bmatrix} 1.7676 & 1.0128\\ 3.0559 & 1.7509\\ 4.7964 & 2.7482\\ 1.1741 & 0.6727\\ 2.4813 & 1.4217\\ 3.6197 & 2.0740\\ 2.8960 & 1.6593\\ 1.6552 & 0.9484\\ 3.1284 & 1.7925\\ 3.2808 & 1.8798\\ 2.5622 & 1.4681\\ 5.0780 & 2.9095\\ 2.4961 & 1.4301\\ 4.0224 & 2.3047\\ 2.0330 & 1.1649\\ 1.5787 & 0.9046\\ 0.6510 & 0.3730\\ 3.4421 & 1.9722\\ 1.9725 & 1.1301\\ 1.6525 & 0.9468 \end{bmatrix}$

Example.

$$\|A-A_{k}\|_{F} =2.3577...$$

$A^{\top} = \begin{bmatrix} 1.0684 & 2.8744\\ 0.8506 & 3.1363\\ 2.2565 & 1.6282\\ 2.1041 & 1.4684\\ 2.9667 & 0.8386\\ 1.4346 & 3.0533\\ 1.6221 & 2.5084\\ 1.7665 & 1.9735\\ 0.5197 & 3.3750\\ 1.0541 & 2.7291\\ 0.8986 & 3.3379\\ 1.9947 & 2.0902\\ 3.2943 & 1.1735\\ 2.2895 & 1.4871\\ 1.9775 & 1.9606\\ 2.8699 & 1.1791\\ 2.3043 & 1.9548\\ 2.5457 & 1.9132\\ 1.7523 & 1.8961\\ 1.9964 & 2.1832 \end{bmatrix}$

Example.

$A^{\top} = \begin{bmatrix} 1.0684 & 2.8744\\ 0.8506 & 3.1363\\ 2.2565 & 1.6282\\ 2.1041 & 1.4684\\ 2.9667 & 0.8386\\ 1.4346 & 3.0533\\ 1.6221 & 2.5084\\ 1.7665 & 1.9735\\ 0.5197 & 3.3750\\ 1.0541 & 2.7291\\ 0.8986 & 3.3379\\ 1.9947 & 2.0902\\ 3.2943 & 1.1735\\ 2.2895 & 1.4871\\ 1.9775 & 1.9606\\ 2.8699 & 1.1791\\ 2.3043 & 1.9548\\ 2.5457 & 1.9132\\ 1.7523 & 1.8961\\ 1.9964 & 2.1832 \end{bmatrix}$

Example.

How do we do this?

### Principal Component Analysis

Definition. A subset $$X\subset \R^{n}$$ is called an affine subspace if there is a subspace $$V\subset \R^{n}$$ and a vector $$w\in\R^{n}$$ so that

$X = \{w+v: v\in V\}$

Let $$x_{1},\ldots,x_{n}$$ be a collection of vectors in $$\R^{m}.$$ Form the matrix

$A = \begin{bmatrix} | & | & | & & |\\ x_{1} & x_{2} & x_{3} & \cdots & x_{n}\\ | & | & | & & |\end{bmatrix}$

Define the vector

$\overline{x} = \frac{1}{n}\left(x_{1}+x_{2}+\cdots+x_{n}\right)$

Form the matrix

$\tilde{A} = \begin{bmatrix} | & | & | & & |\\ (x_{1}-\overline{x}) & (x_{2}-\overline{x}) & (x_{3}-\overline{x}) & \cdots & (x_{n}-\overline{x})\\ | & | & | & & |\end{bmatrix}$

### Principal Component Analysis

Form the matrix

$\tilde{A}_{k} = \begin{bmatrix} | & | & | & & |\\ y_{1} & y_{2} & y_{3} & \cdots & y_{n}\\ | & | & | & & |\end{bmatrix}$

The vectors $$y_{1},\ldots,y_{n}$$ are in a $$k$$-dimensional subspace $$V$$, and

$\sum_{i=1}^{n}\|(x_{i}-\overline{x})-y_{i}\|^{2}$

is smallest among all collections of vectors $$y_{1},\ldots,y_{n}$$ that live in a $$k$$-dimensional subspace. Note that

$\sum_{i=1}^{n}\|(x_{i}-\overline{x})-y_{i}\|^{2} = \sum_{i=1}^{n}\|x_{i}-(\overline{x}+y_{i})\|^{2}$

and the vectors $$\overline{x}+y_{1},\ldots,\overline{x}+y_{n}$$ all live in the affine subspace

$X = \{\overline{x}+v : v\in V\}.$

Example.

Example.

Example.

Definition. Given an $$m\times n$$ matrix $$A$$ with singular value decomposition $A=U\Sigma V^{\top},$ that is, $$U$$ and $$V$$ are orthogonal matrices, and $$\Sigma$$ is an $$m\times n$$ diagonal matrix with nonnegative entries $$\sigma_{1}\geq \sigma_{2}\geq \cdots\geq \sigma_{p}$$ on the diagonal, we define the pseudoinverse of $$A$$ $A^{+} = V\Sigma^{+}U^{\top}$ where $$\Sigma^{+}$$ is the $$n\times m$$ diagonal matrix with $$i$$th diagonal entry $$1/\sigma_{i}$$ if $$\sigma_{i}\neq 0$$ and $$0$$ if $$\sigma_{i}=0$$.

Example. Consider $A = \begin{bmatrix} 2 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}, \quad\text{then}\quad A^{+} = \begin{bmatrix} 1/2 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$

Example. Consider $A = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}.$

Then the singular value decomposition of $$A$$ is

$= \begin{bmatrix} 1 & 0 & 0\\ 0 & 1/2 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}.$

$A= \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix} 2 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}\begin{bmatrix} 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}.$

Hence, the pseudo inverse of $$A$$ is

$A^{+}= \begin{bmatrix} 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}\begin{bmatrix} 1/2 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}\begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}.$

Example. Consider $A = \begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix}.$

Then the singular value decomposition of $$A$$ is

$= \begin{bmatrix}1/4 & 1/4\\ 1/4 & 1/4\end{bmatrix}.$

$A= \left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}\right)\begin{bmatrix} 2 & 0 \\ 0 & 0\end{bmatrix}\left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}\right).$

Hence, the pseudo inverse of $$A$$ is

$A^{+}= \left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}\right)\begin{bmatrix} 1/2 & 0 \\ 0 & 0\end{bmatrix}\left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}\right).$

Example. Consider $B = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1\end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\[1ex] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{bmatrix}\begin{bmatrix} \sqrt{3} & 0 & 0\\ 0 & 1 & 0\end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}}\\[1ex] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0\\[1ex] \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}}\end{bmatrix}.$

Hence, the pseudo inverse of $$B$$ is

$B^{+}=\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}}\\[1ex] \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}}\\[1ex] \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}}\end{bmatrix}\begin{bmatrix} 1/\sqrt{3} & 0\\ 0 & 1\\ 0 & 0\end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\[1ex] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{bmatrix}.$

$= \frac{1}{3}\begin{bmatrix} 2 & -1\\ -1 & 2\\ 1 & 1\end{bmatrix}$

Note: $BB^{+} = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix} \quad\text{but}\quad B^{+}B = \frac{1}{3}\begin{bmatrix} 2 & -1 & 1\\ -1 & 2 & 1\\ 1 & 1 & 2\end{bmatrix}$

Let $$A$$ be an $$m\times n$$ matrix with singular value decomposition

$A=U\Sigma V^{\top}.$

Hence

$A^{+} = V\Sigma^{+}U^{\top}.$

Let $$\operatorname{rank}(A)=r$$, then

$\Sigma = \begin{bmatrix} D & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\quad\text{and}\quad \Sigma^{+} = \begin{bmatrix} D^{-1} & \mathbf{0}\\ \mathbf{0}\ & \mathbf{0}\end{bmatrix}$

where

$D = \begin{bmatrix}\sigma_{1} & 0 & \cdots & 0\\ 0 & \sigma_{2} & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 &\cdots & \cdots & \sigma_{r} \end{bmatrix}\quad \text{and}\quad D^{-1} = \begin{bmatrix}1/\sigma_{1} & 0 & \cdots & 0\\ 0 & 1/\sigma_{2} & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 &\cdots & \cdots & 1/\sigma_{r} \end{bmatrix}$

and $$\sigma_{1}\geq \sigma_{2}\geq\ldots\geq \sigma_{r}>0$$

Note: These could be empty matrices.

Using this we have

$AA^{+}=U\Sigma V^{\top}V\Sigma^{+}U^{\top} = U\Sigma\Sigma^{+}U^{\top} = U \begin{bmatrix} D & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\begin{bmatrix} D^{-1} & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}U^{\top}$

$= U \begin{bmatrix} I & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}U^{\top} = U(\,:\,,1\colon\! r)U(\,:\,,1\colon\! r)^{\top}$

Orthogonal projection onto $$C(A)$$.

$A^{+}A=V\Sigma^{+}U^{\top} U\Sigma V^{\top}= V\Sigma^{+}\Sigma V^{\top} = V \begin{bmatrix} D^{-1} & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\begin{bmatrix} D & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}V^{\top}$

$= V \begin{bmatrix} I & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}V^{\top} = V(\,:\,,1\colon\! r)V(\,:\,,1\colon\! r)^{\top}$

Orthogonal projection onto ???.

If $$A$$ has singular value decomposition

$A=U\Sigma V^{\top},$

then

$A^{\top} = (U\Sigma V^{\top})^{\top} = V\Sigma^{\top} U^{\top}.$

From this we see:

1. The singular values of $$A$$ and $$A^{\top}$$ are the same.
2. The vector $$v$$ is a right singular vector of $$A$$ if and only if $$v$$ is a left singular vector of $$A^{\top}$$.
3. The vector $$u$$ is a left singular vector of $$A$$ if and only if $$u$$ is a right singular vector of $$A^{\top}$$.
4. If $$A$$ is rank $$r$$, then the first $$r$$ columns of $$V$$ are an orthonormal basis for $$C(A^{\top})$$.

$A^{+}A = V(\,:\,,1\colon\! r)V(\,:\,,1\colon\! r)^{\top}$

Orthogonal projection onto $$C(A^{\top})$$.

Example. Consider the equation

$\begin{bmatrix}1 & 1\\ 1 & 2\\ 1 & 3\end{bmatrix}\begin{bmatrix}x_{1}\\ x_{2}\end{bmatrix} = \begin{bmatrix}1\\ 4\\ 9\end{bmatrix}.$

This is an example of an overdetermined system (more equations than unknowns). Note that this equation has no solution since

$\text{rref}\left(\begin{bmatrix}1 & 1 & 1\\ 1 & 2 & 4\\ 1 & 3 & 9\end{bmatrix}\right) = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$

What if we want to find the vector $$\hat{x} = [x_{1}\ x_{2}]^{\top}$$ that is closest to being a solution, that is, $$\hat{x}$$ is the vector so that $$\|A\hat{x}-b\|$$ is as small as possible. This is called the least squares solution to the above equation.

Now, consider a general matrix equation:

$Ax=b.$

• The closest point in $$C(A)$$ to $$b$$ is $$AA^{+}b$$, since $$AA^{+}$$ is the orthogonal projection onto $$C(A)$$.
• Take $$\hat{x} = A^{+}b$$, then we have $A\hat{x} = AA^{+}b$
• So $$\hat{x} = A^{+}b$$ is a least squares solution to $$Ax=b$$.

Note: If $$Ax=b$$ has a solution, that is, $$b\in C(A)$$, then $$\hat{x}$$ is a solution to $$Ax=b$$, since $$AA^{+}b=b$$.

Example. Consider $A= \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1\end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}}\\[1ex] \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}}\\[1ex] \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}}\end{bmatrix}\begin{bmatrix} \sqrt{3} & 0\\ 0 & 1\\ 0 & 0\end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\[1ex] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{bmatrix}.$

Then, $A^{+} = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1\end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\[1ex] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{bmatrix}\begin{bmatrix} 1/\sqrt{3} & 0 & 0\\ 0 & 1 & 0\end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}}\\[1ex] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0\\[1ex] \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}}\end{bmatrix}.$

$= \frac{1}{3}\begin{bmatrix} 2 & -1 & 1\\ -1 & 2 & 1\end{bmatrix}$

If we take $$b=[1\ \ 1\ \ 1]^{\top}$$, then the equation $$Ax=b$$ has no solution. However, the least squares solution is

$\hat{x} = A^{+}\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}2/3\\ 2/3\end{bmatrix}.$

By John Jasper

• 328