CS6015: Linear Algebra and Random Processes
Lecture 21: Principal Component Analysis (the math)
Learning Objectives
What is PCA?
What are some applications of PCA?
Recap of Wishlist
Represent the data using fewer dimensions such that
the data has high variance along these dimensions
the covariance between any two dimensions is low
the basis vectors are orthonormal
[Figure: a data point \begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} shown with the standard basis \mathbf{u_1} = \begin{bmatrix}1\\0\end{bmatrix}, \mathbf{u_2} = \begin{bmatrix}0\\1\end{bmatrix} and with a new basis \mathbf{v_1}, \mathbf{v_2}]
We will keep the wishlist aside for now and just build some background first (mostly recap)
Projecting onto one dimension
\mathbf{X} =
\begin{bmatrix}
x_{11}&x_{12}&x_{13}&x_{14}& \dots &x_{1n} \\
x_{21}&x_{22}&x_{23}&x_{24}& \dots &x_{2n} \\
x_{31}&x_{32}&x_{33}&x_{34}& \dots &x_{3n} \\
\vdots &\vdots &\vdots &\vdots & &\vdots \\
x_{m1}&x_{m2}&x_{m3}&x_{m4}& \dots &x_{mn} \\
\end{bmatrix}
(the rows of \mathbf{X} are the data points \mathbf{x_1}^\top, \mathbf{x_2}^\top, \mathbf{x_3}^\top, \dots, \mathbf{x_m}^\top)
\mathbf{x_1} = x_{11}\mathbf{u_1} + x_{12}\mathbf{u_2} + \dots + x_{1n}\mathbf{u_n}
Standard Basis Vectors
(unit norms)
\begin{bmatrix}
\uparrow\\
\mathbf{v_1}\\
\downarrow
\end{bmatrix}
New Basis Vector
(unit norm)
\mathbf{x_1} = \hat{x}_{11}\mathbf{v_1}, \qquad \hat{x}_{11} = \frac{\mathbf{x_1}^\top \mathbf{v_1}}{\mathbf{v_1}^\top \mathbf{v_1}} = \mathbf{x_1}^\top \mathbf{v_1}
\mathbf{x_2} = \hat{x}_{21}\mathbf{v_1}, \qquad \hat{x}_{21} = \frac{\mathbf{x_2}^\top \mathbf{v_1}}{\mathbf{v_1}^\top \mathbf{v_1}} = \mathbf{x_2}^\top \mathbf{v_1}
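As a sanity check, here is a minimal NumPy sketch of this projection (the matrix X, the direction v1, and the toy numbers are illustrative, not from the lecture):

```python
import numpy as np

# toy data: m = 4 points in n = 3 dimensions (illustrative values)
X = np.array([[3.3, 3.0, 1.0],
              [2.1, 2.0, 0.5],
              [4.0, 3.9, 1.2],
              [1.0, 1.1, 0.2]])

v1 = np.array([1.0, 1.0, 0.5])
v1 = v1 / np.linalg.norm(v1)   # make v1 unit norm so that x^T v1 is the projection coefficient

x_hat = X @ v1                 # one scalar per data point: x_i1_hat = x_i^T v1
print(x_hat)                   # the 1-dimensional representation of the data
```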
Projecting onto two dimensions
\mathbf{X} (the same m \times n data matrix as above, with rows \mathbf{x_1}^\top, \mathbf{x_2}^\top, \dots, \mathbf{x_m}^\top)
\mathbf{V} =
\begin{bmatrix}
\uparrow & \uparrow \\
\mathbf{v_1} & \mathbf{v_2} \\
\downarrow & \downarrow
\end{bmatrix}
new basis vectors
(unit norm)
\mathbf{x_1} = \hat{x}_{11}\mathbf{v_1}+\hat{x}_{12}\mathbf{v_2}, \qquad \hat{x}_{11} = \mathbf{x_1}^\top \mathbf{v_1}
\mathbf{x_2} = \hat{x}_{21}\mathbf{v_1}+\hat{x}_{22}\mathbf{v_2}, \qquad \hat{x}_{21} = \mathbf{x_2}^\top \mathbf{v_1}
\hat{\mathbf{X}} =
\begin{bmatrix}
\hat{x}_{11}&\hat{x}_{12}\\
\hat{x}_{21}&\hat{x}_{22}\\
\hat{x}_{31}&\hat{x}_{32}\\
\vdots &\vdots\\
\hat{x}_{m1}&\hat{x}_{m2}\\
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{x_{1}}^\top \mathbf{v_{1}}&\mathbf{x_{1}}^\top \mathbf{v_{2}} \\
\mathbf{x_{2}}^\top \mathbf{v_{1}}&\mathbf{x_{2}}^\top \mathbf{v_{2}} \\
\mathbf{x_{3}}^\top \mathbf{v_{1}}&\mathbf{x_{3}}^\top \mathbf{v_{2}} \\
\vdots &\vdots \\
\mathbf{x_{m}}^\top \mathbf{v_{1}}&\mathbf{x_{m}}^\top \mathbf{v_{2}} \\
\end{bmatrix}
= \mathbf{X}\mathbf{V}
where \mathbf{V} = \begin{bmatrix}\mathbf{v_1} & \mathbf{v_2}\end{bmatrix}, and the second-coordinate entries are \hat{x}_{12} = \mathbf{x_1}^\top \mathbf{v_2}, \ \hat{x}_{22} = \mathbf{x_2}^\top \mathbf{v_2}, and so on.
Projecting onto k dimensions
\hat{\mathbf{X}} =
\begin{bmatrix}
\hat{x}_{11}&\hat{x}_{12}& \dots &\hat{x}_{1k} \\
\hat{x}_{21}&\hat{x}_{22}& \dots &\hat{x}_{2k} \\
\hat{x}_{31}&\hat{x}_{32}& \dots &\hat{x}_{3k} \\
\vdots &\vdots & &\vdots \\
\hat{x}_{m1}&\hat{x}_{m2}& \dots &\hat{x}_{mk} \\
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{x_{1}}^\top \mathbf{v_{1}}&\mathbf{x_{1}}^\top \mathbf{v_{2}}&\dots&\mathbf{x_{1}}^\top \mathbf{v_{k}} \\
\mathbf{x_{2}}^\top \mathbf{v_{1}}&\mathbf{x_{2}}^\top \mathbf{v_{2}}&\dots&\mathbf{x_{2}}^\top \mathbf{v_{k}}\\
\mathbf{x_{3}}^\top \mathbf{v_{1}}&\mathbf{x_{3}}^\top \mathbf{v_{2}}&\dots&\mathbf{x_{3}}^\top \mathbf{v_{k}} \\
\vdots &\vdots & &\vdots \\
\mathbf{x_{m}}^\top \mathbf{v_{1}}&\mathbf{x_{m}}^\top \mathbf{v_{2}}&\dots&\mathbf{x_{m}}^\top \mathbf{v_{k}} \\
\end{bmatrix}
= \mathbf{X}\mathbf{V}
where \mathbf{X} is the m \times n data matrix and
\mathbf{V} =
\begin{bmatrix}
\uparrow & \uparrow & \cdots & \uparrow \\
\mathbf{v_1} & \mathbf{v_2} & \cdots & \mathbf{v_k} \\
\downarrow & \downarrow & \cdots & \downarrow
\end{bmatrix}
contains the k new basis vectors (unit norm) as its columns.
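A minimal NumPy sketch of this projection (the names and the toy data below are illustrative; any n x k matrix with orthonormal columns can play the role of V):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 5, 2
X = rng.normal(size=(m, n))                    # m data points in n dimensions

# build some n x k matrix with orthonormal columns to act as the new basis V;
# here we take the Q factor of a QR decomposition of a random matrix
V, _ = np.linalg.qr(rng.normal(size=(n, k)))

X_hat = X @ V                                  # m x k matrix of new coordinates
print(X_hat.shape)                             # (100, 2)
```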
We want to find a \mathbf{V} such that
the columns of \mathbf{V} are orthonormal
the columns of \hat{\mathbf{X}} have high variance
the columns of \hat{\mathbf{X}} have low covariance (ideally 0)
What is the new covariance matrix?
\hat{X} = XV
\hat{\Sigma} = \frac{1}{m}\hat{X}^\top\hat{X}
\hat{\Sigma} = \frac{1}{m}(XV)^\top(XV)
\hat{\Sigma} = V^\top\left(\frac{1}{m}X^\top X\right)V
(here the data is assumed to be mean-centered, so that \frac{1}{m}X^\top X is the covariance matrix of the original data)
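A quick numerical check of this identity (illustrative data; the data is centered first so that the covariance interpretation above applies):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 200, 4, 2
X = rng.normal(size=(m, n))
X = X - X.mean(axis=0)                          # mean-center the data

V, _ = np.linalg.qr(rng.normal(size=(n, k)))    # some orthonormal columns
X_hat = X @ V

sigma_hat = (X_hat.T @ X_hat) / m               # (1/m) X_hat^T X_hat
same = V.T @ ((X.T @ X) / m) @ V                # V^T (1/m X^T X) V
print(np.allclose(sigma_hat, same))             # True
```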
What do we want?
\hat{\Sigma}_{ij} = Cov(i,j)
\text{if } i \neq j: \quad \hat{\Sigma}_{ij} = 0 \quad \text{(low covariance)}
\text{if } i = j: \quad \hat{\Sigma}_{ii} = \sigma^2_i \neq 0 \quad \text{(high variance)}
We want \hat{\Sigma} to be diagonal
We are looking for orthogonal vectors which will diagonalise \frac{1}{m}X^\top X :-)
These would be the eigenvectors of X^\top X
(Note that the eigenvectors of cA are the same as the eigenvectors of A)
The eigenbasis of X^\top X
\hat{\Sigma} = V^\top\left(\frac{1}{m}X^\top X\right)V = D
We have found a V such that
the columns of V are orthonormal (they are eigenvectors of a symmetric matrix)
the columns of \hat{X} have zero covariance (\hat{\Sigma} = D is diagonal)
The right basis to use is the eigenbasis of X^\top X
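A minimal sketch of this step (illustrative data). numpy.linalg.eigh is used rather than eig because the matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors; the check confirms the transformed covariance is diagonal:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 300, 4
X = rng.normal(size=(m, n))
X = X - X.mean(axis=0)                       # mean-center

S = (X.T @ X) / m                            # covariance matrix (symmetric)
eigvals, V = np.linalg.eigh(S)               # orthonormal eigenvectors of S

sigma_hat = V.T @ S @ V                      # should be (numerically) diagonal
off_diag = sigma_hat - np.diag(np.diag(sigma_hat))
print(np.allclose(off_diag, 0))              # True: the new coordinates are uncorrelated
```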
What about the variance of the columns of \hat{X}?
(orthonormal basis ✓, zero covariance ✓ — the high-variance requirement is still to be checked)
What is the variance of the columns of \hat{X}?
The columns of \hat{X} = XV are \hat{X}_1, \hat{X}_2, \dots, \hat{X}_k, and the i-th column is
\hat{X}_i = X\mathbf{v_i}
The variance of the i-th column is
\sigma^2_i = \frac{1}{m}\hat{X}_i^\top\hat{X}_i
= \frac{1}{m}(X\mathbf{v_i})^\top X\mathbf{v_i}
= \frac{1}{m}\mathbf{v_i}^\top X^\top X\mathbf{v_i}
= \frac{1}{m}\mathbf{v_i}^\top \lambda_i\mathbf{v_i} \qquad (\because X^\top X\mathbf{v_i} = \lambda_i\mathbf{v_i})
= \frac{1}{m}\lambda_i \qquad (\because \mathbf{v_i}^\top\mathbf{v_i}=1)
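A quick numerical check of this result (illustrative data): the variance of each projected column equals the corresponding eigenvalue divided by m.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 500, 3
X = rng.normal(size=(m, n))
X = X - X.mean(axis=0)                        # mean-center

eigvals, V = np.linalg.eigh(X.T @ X)          # eigenvalues / eigenvectors of X^T X
X_hat = X @ V

col_var = (X_hat ** 2).sum(axis=0) / m        # (1/m) X_hat_i^T X_hat_i for each column
print(np.allclose(col_var, eigvals / m))      # True: variance of column i is lambda_i / m
```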
Retain only those eigenvectors which have a large eigenvalue (high variance)
The full story
(How would you do this in practice? A minimal sketch follows this list.)
Compute the n eigenvectors of X^\top X (we know that n such vectors exist since X^\top X is a symmetric matrix)
Sort them according to the corresponding eigenvalues
Retain only those eigenvectors corresponding to the top-k eigenvalues (these are called the principal components)
Project the data onto these k eigenvectors
Heuristics: k = 50, 100, or choose k such that \lambda_k/\lambda_{max} > t
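A minimal NumPy sketch of this recipe (function and variable names are illustrative; the data is mean-centered first, as assumed throughout the derivation):

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k eigenvectors of X^T X."""
    X = X - X.mean(axis=0)                       # mean-center the data
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvectors of the symmetric matrix X^T X
    order = np.argsort(eigvals)[::-1]            # sort by eigenvalue, largest first
    V = eigvecs[:, order[:k]]                    # retain the top-k principal components
    return X @ V, eigvals[order]                 # projected data (m x k), sorted eigenvalues

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
X_hat, lam = pca_project(X, k=3)
print(X_hat.shape)                               # (100, 3)
print(lam / lam[0])                              # the lambda_k / lambda_max ratios used in the heuristic
```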
Reconstruction Error
Suppose
\mathbf{x} = \begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} = \begin{bmatrix}3.3\\3\end{bmatrix}
(one data point)
\mathbf{x} = 3.3\,\mathbf{u_1} + 3\,\mathbf{u_2}, \qquad \mathbf{u_1} = \begin{bmatrix}1\\0\end{bmatrix}, \ \mathbf{u_2} = \begin{bmatrix}0\\1\end{bmatrix}
Let the new basis vectors (unit norm) be
\mathbf{v_1} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}\end{bmatrix}, \qquad
\mathbf{v_2} = \frac{1}{\sqrt{2}}\begin{bmatrix}-1\\1\end{bmatrix} = \begin{bmatrix}-\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}\end{bmatrix}
Then
\mathbf{x} = b_{11}\mathbf{v_1} + b_{12}\mathbf{v_2}, \qquad b_{11} = \mathbf{x}^\top\mathbf{v_1} = \frac{6.3}{\sqrt{2}}, \quad b_{12} = \mathbf{x}^\top\mathbf{v_2} = -\frac{0.3}{\sqrt{2}}
\frac{6.3}{\sqrt{2}}\mathbf{v_1} - \frac{0.3}{\sqrt{2}}\mathbf{v_2} = \begin{bmatrix}3.3\\3\end{bmatrix} = \mathbf{x}
If we use all the n eigenvectors, we get an exact reconstruction of the data.
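A quick numerical check of this worked example:

```python
import numpy as np

x = np.array([3.3, 3.0])
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2)

b11 = x @ v1                   # 6.3 / sqrt(2)
b12 = x @ v2                   # -0.3 / sqrt(2)
print(b11 * v1 + b12 * v2)     # [3.3, 3.0] -> exact reconstruction using both basis vectors
print(b11 * v1)                # [3.15, 3.15] -> approximation if v2 is thrown away
```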
But suppose we are going to use fewer eigenvectors and throw away \mathbf{v_2}. Using only \mathbf{v_1}:
b_{11} = \mathbf{x}^\top\mathbf{v_1} = \frac{6.3}{\sqrt{2}}
\hat{\mathbf{x}} = b_{11}\mathbf{v_1} = \frac{6.3}{\sqrt{2}}\mathbf{v_1} = \begin{bmatrix}3.15\\3.15\end{bmatrix} \neq \mathbf{x}
\mathbf{x} = \begin{bmatrix}3.3\\3\end{bmatrix} \text{ (original } \mathbf{x}\text{)}, \qquad \hat{\mathbf{x}} = \begin{bmatrix}3.15\\3.15\end{bmatrix} \text{ (reconstructed from fewer eigenvectors)}
\mathbf{x}-\hat{\mathbf{x}} is the reconstruction error vector, and
(\mathbf{x}-\hat{\mathbf{x}})^\top(\mathbf{x}-\hat{\mathbf{x}}) is its squared length.
In general, if
\mathbf{x_i} = \sum_{j=1}^{n} b_{ij}\mathbf{v_j} \quad \text{(the original } \mathbf{x_i}\text{, reconstructed exactly from all } n \text{ eigenvectors)}
\hat{\mathbf{x}}_i = \sum_{j=1}^{k} b_{ij}\mathbf{v_j} \quad \text{(reconstructed only from the top-}k\text{ eigenvectors)}
then the goal is to find the basis \mathbf{V} that minimizes the total reconstruction error
\min \sum_{i=1}^{m} (\mathbf{x_i}-\hat{\mathbf{x}}_i)^\top(\mathbf{x_i}-\hat{\mathbf{x}}_i)
Solving this optimization problem corresponds to choosing the eigenbasis and discarding the eigenvectors corresponding to the smallest eigenvalues.
PCA thus minimizes the reconstruction error.
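A small sketch (illustrative data) comparing the truncated reconstruction with the original data; the total squared error equals the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 200, 6, 2
X = rng.normal(size=(m, n)) @ rng.normal(size=(n, n))   # correlated toy data
X = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
V_k = eigvecs[:, order[:k]]                  # top-k eigenvectors (principal components)

X_hat = (X @ V_k) @ V_k.T                    # project to k dimensions, then map back to n
err = np.sum((X - X_hat) ** 2)               # total squared reconstruction error
print(np.isclose(err, eigvals[order[k:]].sum()))   # True: error = sum of discarded eigenvalues
```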
Learning Objectives (achieved)
What is PCA?
What are some applications of PCA?
By Mitesh Khapra