CS6015: Linear Algebra and Random Processes
Lecture 20: Principal Component Analysis (the wishlist)
Learning Objectives
A quick recap of mean, variance and covariance
What is the covariance matrix?
What is the motivation for PCA?
What is the wishlist for representing data using fewer dimensions?
The Eigenstory (where are we?)
[Concept map: How to compute eigenvalues? (the characteristic equation det(A − λI) = 0) · What are the possible values? · What are the eigenvalues of some special matrices? · What is the relation between the eigenvalues of related matrices? · What do eigenvalues reveal about a matrix? · What are some applications in which eigenvalues play an important role? (powers of A, steady state, diagonalisation, optimisation, PCA)]
Detour: Mean
Consider data collected at m sites (rows), with n variables recorded at each site (columns: salinity, pressure, density, depth, temperature, ...):
X = \begin{bmatrix}
a_{11}&a_{12}&a_{13}&a_{14}&a_{15}&\cdots&a_{1n}\\
a_{21}&a_{22}&a_{23}&a_{24}&a_{25}&\cdots&a_{2n}\\
a_{31}&a_{32}&a_{33}&a_{34}&a_{35}&\cdots&a_{3n}\\
\vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&a_{m4}&a_{m5}&\cdots&a_{mn}
\end{bmatrix}
\mu_1 = \frac{1}{m}\sum_{i=1}^m a_{i1}
(mean salinity across all locations)
\mu_2 = \frac{1}{m}\sum_{i=1}^m a_{i2}
(mean pressure across all locations)
\mu_j = \frac{1}{m}\sum_{i=1}^m a_{ij}
It is common practice to subtract the mean from each column so that the data is zero-centred.
Subtracting \mu_j from every entry of column j (a_{ij} \to a_{ij} - \mu_j for i = 1, \dots, m) gives the new mean
\hat{\mu}_j = \frac{1}{m}\sum_{i=1}^m (a_{ij} - \mu_j)
= \frac{1}{m}\sum_{i=1}^m a_{ij} - \frac{1}{m}\sum_{i=1}^m \mu_j
= \mu_j - \frac{1}{m} m \mu_j = 0
The data is now zero-centred (i.e. every column has mean 0).
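A minimal pure-Python sketch of the centring step, assuming the data is stored as a list of rows (one row per site, one column per variable). The numbers and the helper names `column_means` and `zero_centre` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy data: m = 4 sites (rows) x n = 3 variables (columns),
# e.g. salinity, pressure, density. The values are made up for illustration.
X = [
    [35.0, 10.0, 1.02],
    [34.0, 12.0, 1.03],
    [36.0, 11.0, 1.01],
    [35.0, 15.0, 1.04],
]


def column_means(X):
    """Return [mu_1, ..., mu_n] with mu_j = (1/m) * sum_i a_ij."""
    m, n = len(X), len(X[0])
    return [sum(X[i][j] for i in range(m)) / m for j in range(n)]


def zero_centre(X):
    """Subtract mu_j from every entry a_ij of column j."""
    mu = column_means(X)
    return [[a - mu[j] for j, a in enumerate(row)] for row in X]


Xc = zero_centre(X)
print(column_means(X))   # original column means
print(column_means(Xc))  # each new mean is 0 (up to floating-point round-off)
```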
For the rest of the discussion we will assume that the data is always zero-centred
If it is not, we can always make it zero-centred by subtracting the mean
Detour: Variance
\sigma^2_1 = \frac{1}{m}\sum_{i=1}^m (a_{i1} - \mu_1)^2
(variance in salinity across all locations)
Since the data is zero-centred (\mu_1 = 0),
\sigma^2_1 = \frac{1}{m}\sum_{i=1}^m a_{i1}^2
\sigma^2_2 = \frac{1}{m}\sum_{i=1}^m a_{i2}^2
(variance in pressure across all locations)
\sigma^2_j = \frac{1}{m}\sum_{i=1}^m a_{ij}^2
Writing \mathbf{x}_j for the j-th column of X, i.e. X = [\mathbf{x}_1 \; \mathbf{x}_2 \; \cdots \; \mathbf{x}_n], the variance becomes a dot product:
\sigma^2_j = \frac{1}{m}\sum_{i=1}^m a_{ij}^2 = \frac{1}{m}\mathbf{x}_j^\top \mathbf{x}_j
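To check numerically that the sum-of-squares form and the dot-product form agree, here is a small sketch on hypothetical zero-centred data (each column below sums to 0). The function names are illustrative, not from the lecture, and columns are 0-indexed in the code.

```python
def dot(u, v):
    """Dot product u^T v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def variance_sum(X, j):
    """sigma_j^2 = (1/m) * sum_i a_ij^2 (valid because X is zero-centred)."""
    return sum(row[j] ** 2 for row in X) / len(X)


def variance_dot(X, j):
    """sigma_j^2 = (1/m) * x_j^T x_j, where x_j is column j of X."""
    x_j = [row[j] for row in X]
    return dot(x_j, x_j) / len(X)


# Hypothetical zero-centred data: m = 4 rows, n = 2 columns, each column sums to 0.
Xc = [
    [0.0, -2.0],
    [-1.0, 0.0],
    [1.0, -1.0],
    [0.0, 3.0],
]

for j in range(len(Xc[0])):
    print(j, variance_sum(Xc, j), variance_dot(Xc, j))
```

For this data both forms give 0.5 for column 0 and 3.5 for column 1.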
Detour: Covariance
Cov(\mathbf{x}_1,\mathbf{x}_2) = \frac{1}{m}\sum_{k=1}^m (a_{k1} - \mu_1)(a_{k2} - \mu_2)
Since the data is zero-centred (\mu_1 = \mu_2 = 0),
Cov(\mathbf{x}_1,\mathbf{x}_2) = \frac{1}{m}\sum_{k=1}^m a_{k1}a_{k2}
and, in general,
Cov(\mathbf{x}_i,\mathbf{x}_j) = \frac{1}{m}\sum_{k=1}^m a_{ki}a_{kj}
In terms of the columns \mathbf{x}_i and \mathbf{x}_j, the covariance is again a dot product:
Cov(\mathbf{x}_i,\mathbf{x}_j) = \frac{1}{m}\sum_{k=1}^m a_{ki}a_{kj} = \frac{1}{m}\mathbf{x}_i^\top \mathbf{x}_j
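The same dot-product view extends to covariance. A sketch on hypothetical zero-centred data; `covariance` is an illustrative helper name and columns are 0-indexed in the code:

```python
def covariance(X, i, j):
    """Cov(x_i, x_j) = (1/m) * x_i^T x_j for columns i and j of a zero-centred X."""
    x_i = [row[i] for row in X]
    x_j = [row[j] for row in X]
    return sum(a * b for a, b in zip(x_i, x_j)) / len(X)


# Hypothetical zero-centred data: each column sums to 0.
Xc = [
    [0.0, -2.0],
    [-1.0, 0.0],
    [1.0, -1.0],
    [0.0, 3.0],
]

print(covariance(Xc, 0, 1))  # (0*-2 + -1*0 + 1*-1 + 0*3) / 4 = -0.25
print(covariance(Xc, 1, 0))  # same value: Cov is symmetric in its arguments
print(covariance(Xc, 0, 0))  # Cov(x_0, x_0) is the variance of column 0
```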
Puzzle: What is the matrix \frac{1}{m}X^\top X?
\frac{1}{m}X^\top X = \frac{1}{m}
\begin{bmatrix}
\mathbf{x}_1^\top\\
\mathbf{x}_2^\top\\
\vdots\\
\mathbf{x}_n^\top
\end{bmatrix}
\begin{bmatrix}
\mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n
\end{bmatrix}
= \Sigma, \quad \Sigma_{ij} = ?
Here X^\top is n \times m (its rows are the columns of X) and X is m \times n, so \Sigma is n \times n, with
\Sigma_{ij} = \frac{1}{m}\mathbf{x}_i^\top\mathbf{x}_j =
\begin{cases}
Cov(\mathbf{x}_i,\mathbf{x}_j) & \text{if } i \neq j\\
\sigma^2_i & \text{if } i = j
\end{cases}
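\Sigma is called the covariance matrix.
It is a symmetric matrix, since \Sigma_{ij} = \frac{1}{m}\mathbf{x}_i^\top\mathbf{x}_j = \frac{1}{m}\mathbf{x}_j^\top\mathbf{x}_i = \Sigma_{ji}.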
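A short sketch that builds \Sigma = \frac{1}{m}X^\top X entry by entry for a hypothetical zero-centred 4 × 3 matrix and checks the two properties above: the diagonal holds the variances and the matrix is symmetric. Pure Python keeps it dependency-free; with a NumPy array one would typically write `X.T @ X / m` instead.

```python
def covariance_matrix(X):
    """Return Sigma = (1/m) X^T X for a zero-centred m x n matrix X (list of rows)."""
    m, n = len(X), len(X[0])
    # Sigma[i][j] = (1/m) * (column i) . (column j)
    return [
        [sum(X[k][i] * X[k][j] for k in range(m)) / m for j in range(n)]
        for i in range(n)
    ]


# Hypothetical zero-centred data: m = 4 rows, n = 3 columns, each column sums to 0.
Xc = [
    [0.0, -2.0, 1.0],
    [-1.0, 0.0, 1.0],
    [1.0, -1.0, -1.0],
    [0.0, 3.0, -1.0],
]

Sigma = covariance_matrix(Xc)
for row in Sigma:
    print(row)

n = len(Sigma)
variances = [sum(r[j] ** 2 for r in Xc) / len(Xc) for j in range(n)]
print(all(Sigma[i][i] == variances[i] for i in range(n)))                   # diagonal = variances
print(all(Sigma[i][j] == Sigma[j][i] for i in range(n) for j in range(n)))  # symmetric
```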
We are now ready to start a discussion on PCA!
The standard basis
Consider a toy version of the data with only two variables, salinity and pressure, measured at m sites:
\begin{bmatrix}
x_{11}&x_{12}\\
x_{21}&x_{22}\\
x_{31}&x_{32}\\
\vdots&\vdots\\
x_{m1}&x_{m2}
\end{bmatrix}
Note the change in notation on this slide: using the ML notation, \mathbf{x}_i^\top now refers to one row of the data, so the rows above are \mathbf{x}_1^\top, \mathbf{x}_2^\top, \mathbf{x}_3^\top, \dots, \mathbf{x}_m^\top.
In the standard basis
\mathbf{u}_1 = \begin{bmatrix}1\\0\end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix}0\\1\end{bmatrix}
each data point is written as
\begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} = x_{11}\begin{bmatrix}1\\0\end{bmatrix} + x_{12}\begin{bmatrix}0\\1\end{bmatrix}
What if we choose a different basis?
\begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} = b_{11}\mathbf{v}_1 + b_{12}\mathbf{v}_2
For this data b_{12} \approx 0, therefore
\begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} \approx b_{11}\mathbf{v}_1
It seems that the same data, which was originally represented using 2 dimensions, can now be represented (approximately) using just one dimension by making a smarter choice of basis!
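A sketch of this change of basis in 2-D, assuming a hypothetical zero-centred dataset whose points lie close to the line x_2 = x_1, and an assumed new basis \mathbf{v}_1 = \frac{1}{\sqrt{2}}(1, 1), \mathbf{v}_2 = \frac{1}{\sqrt{2}}(-1, 1). The coefficients b_{i1}, b_{i2} are obtained by solving the 2 × 2 system [\mathbf{v}_1 \; \mathbf{v}_2]\,\mathbf{b} = \mathbf{x}_i with Cramer's rule, which works for any pair of linearly independent basis vectors. The helper name `coords_in_basis` is illustrative.

```python
import math

# Hypothetical zero-centred data points (salinity, pressure) close to the line x2 = x1.
points = [(-2.0, -1.9), (-1.0, -1.1), (1.0, 0.9), (2.0, 2.1)]

# Assumed new basis: v1 along the line x2 = x1, v2 perpendicular to it.
s = 1 / math.sqrt(2)
v1 = (s, s)
v2 = (-s, s)


def coords_in_basis(x, v1, v2):
    """Solve x = b1*v1 + b2*v2 for (b1, b2) using Cramer's rule.

    Works for any two linearly independent vectors v1, v2 (det != 0)."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    b1 = (x[0] * v2[1] - v2[0] * x[1]) / det
    b2 = (v1[0] * x[1] - x[0] * v1[1]) / det
    return b1, b2


results = []
for x in points:
    b1, b2 = coords_in_basis(x, v1, v2)
    approx = (b1 * v1[0], b1 * v1[1])  # keep only the v1 component: x ~ b1 * v1
    err = math.dist(x, approx)         # how far x is from its 1-D approximation
    results.append((b1, b2, err))
    print(x, round(b1, 3), round(b2, 3), round(err, 3))
```

In this toy case every |b_{i2}| is about 0.07 while every |b_{i1}| is above 1, so dropping the \mathbf{v}_2 term moves each point only slightly.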
[Figure: the data points in the salinity-pressure plane, shown with the standard basis \mathbf{u}_1, \mathbf{u}_2 and the new basis \mathbf{v}_1, \mathbf{v}_2]
The bigger question
Now consider the full data matrix, with m sites and n variables (salinity, pressure, density, depth, temperature, ...):
X = \begin{bmatrix}
x_{11}&x_{12}&x_{13}&x_{14}&x_{15}&\cdots&x_{1n}\\
x_{21}&x_{22}&x_{23}&x_{24}&x_{25}&\cdots&x_{2n}\\
x_{31}&x_{32}&x_{33}&x_{34}&x_{35}&\cdots&x_{3n}\\
\vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\
x_{m1}&x_{m2}&x_{m3}&x_{m4}&x_{m5}&\cdots&x_{mn}
\end{bmatrix}
Can we represent the data using fewer dimensions by choosing a different basis?
OR
Can we project the data onto a smaller subspace?
Yes, we can! We will see how!
Let us first dig a bit deeper into our toy example:
Why do we not care about \mathbf{v}_2? (or why do we think we can represent the data using fewer dimensions?)
What is being projected, where is it being projected and how is it being projected?

Why do we not care about \mathbf{v}_2?
Because the data has very little variance along this dimension: since the data is zero-centred, the coefficient along \mathbf{v}_2 is close to 0 for every data point, so
\begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} = b_{11}\mathbf{v}_1 + b_{12}\mathbf{v}_2 \approx b_{11}\mathbf{v}_1
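This can be checked numerically. For an orthonormal basis, the coordinate of a point \mathbf{x} along a basis vector \mathbf{v} is the dot product \mathbf{x}^\top\mathbf{v}; the sketch below computes the variance of these coordinates on hypothetical toy data, comparing the standard basis with an assumed basis \mathbf{v}_1 = \frac{1}{\sqrt{2}}(1,1), \mathbf{v}_2 = \frac{1}{\sqrt{2}}(-1,1). The helper `variance_along` is illustrative.

```python
import math

# Hypothetical zero-centred data points close to the line x2 = x1.
points = [(-2.0, -1.9), (-1.0, -1.1), (1.0, 0.9), (2.0, 2.1)]

s = 1 / math.sqrt(2)
u1, u2 = (1.0, 0.0), (0.0, 1.0)  # standard basis
v1, v2 = (s, s), (-s, s)         # assumed orthonormal basis aligned with the data


def variance_along(points, v):
    """Variance of the coordinates x^T v (their mean is 0 because the data is zero-centred)."""
    coords = [x[0] * v[0] + x[1] * v[1] for x in points]
    return sum(c * c for c in coords) / len(coords)


var_u1, var_u2 = variance_along(points, u1), variance_along(points, u2)
var_v1, var_v2 = variance_along(points, v1), variance_along(points, v2)
print("standard basis:", round(var_u1, 4), round(var_u2, 4))
print("new basis:     ", round(var_v1, 4), round(var_v2, 4))
```

For this data the variance is split almost evenly in the standard basis (2.5 and 2.51), but in the new basis nearly all of it lies along \mathbf{v}_1 (5.005 versus 0.005). The total is the same in both bases, as expected for an orthonormal change of basis.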
Wishlist: Represent the data using fewer dimensions such that the data has high variance along these dimensions
Projection: What, where and how?
What is being projected?
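A data point, e.g. \mathbf{x}_1 (one row of the data).
Where is it being projected?
Onto \mathbf{v}_1 and \mathbf{v}_2.
How is it being projected?
Since \mathbf{v}_1, \mathbf{v}_2 are orthonormal, the coefficients are simply dot products:
b_{11} = \mathbf{x}_1^\top\mathbf{v}_1, \quad b_{12} = \mathbf{x}_1^\top\mathbf{v}_2
(Taking the dot product of \mathbf{x}_1 = b_{11}\mathbf{v}_1 + b_{12}\mathbf{v}_2 with \mathbf{v}_1 gives \mathbf{x}_1^\top\mathbf{v}_1 = b_{11}\mathbf{v}_1^\top\mathbf{v}_1 + b_{12}\mathbf{v}_2^\top\mathbf{v}_1 = b_{11}, because \mathbf{v}_1^\top\mathbf{v}_1 = 1 and \mathbf{v}_2^\top\mathbf{v}_1 = 0; the same argument with \mathbf{v}_2 gives b_{12}.)
Wishlist: Represent the data using fewer orthonormal basis vectors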
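A sketch contrasting the orthonormal case with a non-orthonormal one. The point \mathbf{x}_1 and both bases are hypothetical, as are the helper names. With the orthonormal basis, the dot products b_k = \mathbf{x}_1^\top\mathbf{v}_k rebuild \mathbf{x}_1 (up to round-off); with an independent but non-orthonormal basis the same recipe does not.

```python
import math


def dot(u, v):
    """Dot product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def project_coeffs(x, basis):
    """b_k = x^T v_k for each basis vector v_k (correct only for an orthonormal basis)."""
    return [dot(x, v) for v in basis]


def reconstruct(coeffs, basis):
    """Return sum_k b_k * v_k."""
    dim = len(basis[0])
    return [sum(b * v[d] for b, v in zip(coeffs, basis)) for d in range(dim)]


s = 1 / math.sqrt(2)
orthonormal = [(s, s), (-s, s)]     # assumed v1, v2
skewed = [(1.0, 0.0), (1.0, 1.0)]   # independent but NOT orthonormal

x1 = (1.0, 0.9)  # hypothetical data point

b_ok = project_coeffs(x1, orthonormal)
b_bad = project_coeffs(x1, skewed)
print(b_ok, reconstruct(b_ok, orthonormal))  # rebuilds x1 up to round-off
print(b_bad, reconstruct(b_bad, skewed))     # does not rebuild x1
```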
Is there anything else that we desire?

Is z adding any new information beyond what is already contained in y?
The two columns have a high covariance (when one increases, the other also increases).
Wishlist: The covariance between the columns in the new orthonormal basis should be low, ideally 0
(i.e. the columns should be uncorrelated; for zero-centred data this means the dot product of any two columns is close to 0, so the columns are nearly orthogonal)
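A sketch of this wishlist item on hypothetical toy data where the second column z closely tracks the first column y. In the standard basis the two columns have a large covariance; after re-expressing each row in an assumed orthonormal basis \mathbf{v}_1 = \frac{1}{\sqrt{2}}(1,1), \mathbf{v}_2 = \frac{1}{\sqrt{2}}(-1,1), the covariance between the new columns is close to 0. This basis is picked by hand for illustration only.

```python
import math

# Hypothetical zero-centred data: columns y and z, where z closely tracks y.
y = [-2.0, -1.0, 1.0, 2.0]
z = [-1.9, -1.1, 0.9, 2.1]


def cov(a, b):
    """Cov = (1/m) * a^T b for two zero-centred columns a and b."""
    return sum(p * q for p, q in zip(a, b)) / len(a)


s = 1 / math.sqrt(2)
v1, v2 = (s, s), (-s, s)  # assumed orthonormal basis, chosen by hand

# New columns: coordinates of every row (y_i, z_i) along v1 and along v2.
c1 = [yi * v1[0] + zi * v1[1] for yi, zi in zip(y, z)]
c2 = [yi * v2[0] + zi * v2[1] for yi, zi in zip(y, z)]

cov_old = cov(y, z)
cov_new = cov(c1, c2)
print("Cov(y, z)   =", round(cov_old, 4))
print("Cov(c1, c2) =", round(cov_new, 4))
```

Here the covariance drops from 2.5 to about 0.005. A hand-picked basis will not, in general, drive it exactly to 0.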
Summary of wishlist
Represent the data using fewer dimensions such that
the data has high variance along these dimensions
the covariance between any two dimensions is low
the basis vectors are orthonormal
Learning Objectives
A quick recap of mean, variance and covariance
What is the covariance matrix?
What is the motivation for PCA?
What is the wishlist for representing data using fewer dimensions?
(achieved)
By Mitesh Khapra