CS6015: Linear Algebra and Random Processes

Lecture 20: Principal Component Analysis (the wishlist)

Learning Objectives

A quick recap of mean, variance and covariance

What is the covariance matrix?

What is the motivation for PCA?

What is the wishlist for representing data using fewer dimensions?

The Eigenstory

(Concept map: the questions asked about eigenvalues so far, with the topics linked to each)

How to compute eigenvalues?

\(\det(A - \lambda I) = 0\) (characteristic equation)

What are the possible values?

real, imaginary, distinct, repeating

What are the eigenvalues of some special matrices?

Identity, Projection, Reflection, Markov, Rotation, Singular, Orthogonal, Rank one, Symmetric, Permutation

What is the relation between the eigenvalues of related matrices?

\(A^\top\), \(A^{-1}\), \(AB\), \(A^\top A\), \(A+B\), \(U\), \(R\), \(A^2\), \(A + kI\)

What do eigenvalues reveal about a matrix?

trace, determinant, invertibility, rank, nullspace, columnspace

What are some applications in which eigenvalues play an important role?

powers of A, steady state, PCA, optimisation, diagonalisation

Other annotations on the map: (basis), (Markov matrices), (positive definite matrices), positive pivots, (independent eigenvectors), (orthogonal eigenvectors), (symmetric), (desirable), (where are we?), HW5

distinct values \(\implies\) independent eigenvectors

Detour: Mean

Data: \(m\) rows, one per location (Site 1, Site 2, Site 3, ..., Site m), and \(n\) columns, one per variable (Salinity, Pressure, Density, Depth, Temp., ...)

\begin{bmatrix} a_{11}&a_{12}&a_{13}&a_{14}&a_{15}&\cdots&a_{1n}\\ a_{21}&a_{22}&a_{23}&a_{24}&a_{25}&\cdots&a_{2n}\\ a_{31}&a_{32}&a_{33}&a_{34}&a_{35}&\cdots&a_{3n}\\ \cdots&\cdots&\cdots&\cdots&\cdots&\cdots&\cdots\\ \cdots&\cdots&\cdots&\cdots&\cdots&\cdots&\cdots\\ a_{m1}&a_{m2}&a_{m3}&a_{m4}&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}

\mu_1 = \frac{1}{m}\sum_{i=1}^m{a_{i1}}

(mean salinity across all locations)

\mu_2 = \frac{1}{m}\sum_{i=1}^m{a_{i2}}

(mean pressure across all locations)

\mu_j = \frac{1}{m}\sum_{i=1}^m{a_{ij}}

It is customary/common to subtract the mean from each column and make the data 0-centred

Subtract \(\mu_j\) from every entry of column \(j\), for \(j = 1, \dots, n\)

New mean

\hat{\mu}_j = \frac{1}{m}\sum_{i=1}^m({a_{ij}} - \mu_j)

= \frac{1}{m}\sum_{i=1}^m a_{ij} - \frac{1}{m}\sum_{i=1}^m\mu_j

= \mu_j - \frac{1}{m}m \mu_j = 0

The data is now zero-centred (i.e. the mean is 0)
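The centring step can be sketched in a few lines of plain Python. The measurements below are made up for illustration (4 sites, 3 variables), and the helper names `column_means` and `zero_centre` are hypothetical, not part of the lecture.

```python
# Hypothetical measurements: 4 sites (rows) x 3 variables (columns).
data = [
    [35.0, 10.0, 1.0],
    [34.0, 20.0, 2.0],
    [36.0, 30.0, 3.0],
    [35.0, 40.0, 6.0],
]

def column_means(rows):
    """Return mu_j = (1/m) * sum_i a_ij for every column j."""
    m = len(rows)
    return [sum(col) / m for col in zip(*rows)]

def zero_centre(rows):
    """Subtract each column's mean from every entry of that column."""
    mu = column_means(rows)
    return [[a - mu_j for a, mu_j in zip(row, mu)] for row in rows]

mu = column_means(data)          # means of the original columns
centred = zero_centre(data)      # zero-centred copy of the data
new_mu = column_means(centred)   # means after centring

print("original means:", mu)
print("means after centring:", new_mu)
```

After centring, every column mean is 0, which is what the derivation above shows in general.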


For the rest of the discussion we will assume that the data is always zero-centred

If it is not, we can always make it zero-centred by subtracting the mean


Detour: Variance

\sigma^2_1 = \frac{1}{m}\sum_{i=1}^m{(a_{i1} - \mu_1)^2}

(variance in salinity across all locations)

\(\because\) the data is zero-centred,

\sigma^2_1 = \frac{1}{m}\sum_{i=1}^m{a_{i1}^2}

\sigma^2_2 = \frac{1}{m}\sum_{i=1}^m{a_{i2}^2}

(variance in pressure across all locations)

\sigma^2_j = \frac{1}{m}\sum_{i=1}^m{a_{ij}^2}

Let \(\mathbf{x_j}\) denote column \(j\) of the data (\(\mathbf{x_1}, \mathbf{x_2}, \dots, \mathbf{x_n}\)). Then

\sigma^2_j = \frac{1}{m}\sum_{i=1}^m{a_{ij}^2} =\frac{1}{m}\mathbf{x}_j^\top \mathbf{x}_j
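As a quick numerical check of the dot-product form of the variance, here is a minimal sketch in plain Python. The column `x_j` is a made-up zero-centred example, and `variance_zero_centred` is a hypothetical helper name. The standard-library function `statistics.pvariance` (population variance, dividing by m) is used only as a cross-check.

```python
import statistics

# Hypothetical zero-centred column x_j (its entries sum to 0).
x_j = [-15.0, -5.0, 5.0, 15.0]

def dot(u, v):
    """Inner product u^T v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def variance_zero_centred(x):
    """sigma^2 = (1/m) * x^T x. Valid only when the mean of x is 0."""
    return dot(x, x) / len(x)

var_dot = variance_zero_centred(x_j)   # (1/m) x_j^T x_j
var_ref = statistics.pvariance(x_j)    # (1/m) sum (a_ij - mu_j)^2

print(var_dot, var_ref)
```

Both values agree because the column has mean 0. Note that the slides divide by m; some libraries divide by m - 1 by default, which gives a slightly different number.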

Detour: Covariance

Cov(\mathbf{x_1},\mathbf{x_2}) = \frac{1}{m}\sum_{k=1}^m(a_{k1} - \mu_1)(a_{k2} - \mu_2)

\(\because\) the data is zero-centred,

Cov(\mathbf{x_1},\mathbf{x_2}) = \frac{1}{m}\sum_{k=1}^m a_{k1}a_{k2}

Here \(\mathbf{x_i}\) denotes column \(i\) of the data (\(\mathbf{x_1}, \mathbf{x_2}, \dots, \mathbf{x_n}\)). In general,

Cov(\mathbf{x_i},\mathbf{x_j}) = \frac{1}{m}\sum_{k=1}^m a_{ki}a_{kj} =\frac{1}{m}\mathbf{x}_i^\top \mathbf{x}_j

Puzzle: What is the matrix \(\frac{1}{m}X^\top X\) ?

\frac{1}{m}X^\top X = \frac{1}{m} \begin{bmatrix} a_{11}&a_{21}&a_{31}&a_{41}&\cdots&a_{m1}\\ a_{12}&a_{22}&a_{32}&a_{42}&\cdots&a_{m2}\\ a_{13}&a_{23}&a_{33}&a_{43}&\cdots&a_{m3}\\ a_{14}&a_{24}&a_{34}&a_{44}&\cdots&a_{m4}\\ \cdots&\cdots&\cdots&\cdots&\cdots&\cdots\\ a_{1n}&a_{2n}&a_{3n}&a_{4n}&\cdots&a_{mn}\\ \end{bmatrix} \begin{bmatrix} a_{11}&a_{12}&a_{13}&a_{14}&\cdots&a_{1n}\\ a_{21}&a_{22}&a_{23}&a_{24}&\cdots&a_{2n}\\ a_{31}&a_{32}&a_{33}&a_{34}&\cdots&a_{3n}\\ \cdots&\cdots&\cdots&\cdots&\cdots&\cdots\\ \cdots&\cdots&\cdots&\cdots&\cdots&\cdots\\ a_{m1}&a_{m2}&a_{m3}&a_{m4}&\cdots&a_{mn}\\ \end{bmatrix} = \Sigma

Row \(i\) of \(X^\top\) is \(\mathbf{x_i}^\top\) and column \(j\) of \(X\) is \(\mathbf{x_j}\), so

\Sigma_{ij} = \frac{1}{m}\mathbf{x_i}^\top\mathbf{x_j} = \begin{cases} Cov(\mathbf{x_i},\mathbf{x_j}) & if~~i\neq j\\ \sigma^2_i & if~~i= j \end{cases}

\(\Sigma\) is the Covariance Matrix (symmetric matrix)
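The answer to the puzzle can be checked numerically. The sketch below builds \(\frac{1}{m}X^\top X\) for a small made-up zero-centred data matrix using plain Python lists; the helpers `transpose`, `matmul` and `covariance_matrix` are hypothetical names written for this illustration.

```python
# Hypothetical zero-centred data: 4 sites (rows) x 3 variables (columns).
X = [
    [0.0, -15.0, -2.0],
    [-1.0, -5.0, -1.0],
    [1.0, 5.0, 0.0],
    [0.0, 15.0, 3.0],
]

def transpose(M):
    """Return M^T as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Plain matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def covariance_matrix(X):
    """Sigma = (1/m) X^T X, assuming every column of X has mean 0."""
    m = len(X)
    XtX = matmul(transpose(X), X)
    return [[entry / m for entry in row] for row in XtX]

Sigma = covariance_matrix(X)
for row in Sigma:
    print(row)
```

The diagonal entries are the variances of the individual columns, the off-diagonal entries are the pairwise covariances, and `Sigma` equals its own transpose.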

We are now ready to start a discussion on PCA!

The standard basis

Two variables (Salinity, Pressure) measured at \(m\) locations (Site 1, Site 2, Site 3, ..., Site m); row \(i\) of the data is \(\mathbf{x_i}^\top\)

\begin{bmatrix} x_{11}&x_{12}\\ x_{21}&x_{22}\\ x_{31}&x_{32}\\ \cdots&\cdots\\ x_{m1}&x_{m2}\\ \end{bmatrix} = \begin{bmatrix} \mathbf{x_1}^\top\\ \mathbf{x_2}^\top\\ \mathbf{x_3}^\top\\ \cdots\\ \mathbf{x_m}^\top\\ \end{bmatrix}

Note the change in notation on this slide: we are now referring to one row in the data as \(\mathbf{x}\) (using the ML notation)

\mathbf{u_1} = \begin{bmatrix}1\\0\end{bmatrix}

\mathbf{u_2} = \begin{bmatrix}0\\1\end{bmatrix}

\mathbf{x_1} = \begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} =x_{11}\begin{bmatrix}1\\0\end{bmatrix}+x_{12}\begin{bmatrix}0\\1\end{bmatrix}

What if we choose a different basis?

\begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} =b_{11}\mathbf{v_1}+\underbrace{b_{12}\mathbf{v_2}}_{\approx 0}

\therefore\begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} \approx b_{11}\mathbf{v_1}

It seems that the same data which was originally represented using 2 dimensions can now be represented using one dimension by making a smarter choice for the basis!
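To make this concrete, here is a small sketch in plain Python. The slide gives no numbers, so the points (chosen to lie close to the line y = x) and the basis vectors `v1`, `v2` are assumptions made for illustration. The coefficients are computed with dot products, which works here because the assumed `v1` and `v2` are orthonormal.

```python
import math

# Hypothetical 2-D data points (rows of the data) lying close to y = x.
points = [(1.0, 1.1), (2.0, 1.9), (-1.0, -1.0), (-2.0, -2.0)]

# Assumed new basis: v1 along the line y = x, v2 perpendicular to it.
s = 1.0 / math.sqrt(2.0)
v1 = (s, s)
v2 = (-s, s)

def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def coordinates(x):
    """Coefficients (b1, b2) such that x = b1*v1 + b2*v2."""
    return dot(x, v1), dot(x, v2)

def approximate(x):
    """Keep only the v1 component: x is approximated by b1*v1."""
    b1, _ = coordinates(x)
    return (b1 * v1[0], b1 * v1[1])

coords = [coordinates(x) for x in points]
errors = [math.dist(x, approximate(x)) for x in points]

for x, (b1, b2), err in zip(points, coords, errors):
    print(x, "b1 =", round(b1, 3), "b2 =", round(b2, 3), "error =", round(err, 3))
```

For these points the second coefficient is close to 0 for every point, so storing only the first coefficient loses very little. The error of the one-dimensional approximation is exactly the size of the dropped coefficient.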


The bigger question

\(m\) locations (Site 1, Site 2, Site 3, ..., Site m) and \(n\) variables (Salinity, Pressure, Density, Depth, Temp., ...)

\begin{bmatrix} x_{11}&x_{12}&x_{13}&x_{14}&x_{15}&\cdots&x_{1n}\\ x_{21}&x_{22}&x_{23}&x_{24}&x_{25}&\cdots&x_{2n}\\ x_{31}&x_{32}&x_{33}&x_{34}&x_{35}&\cdots&x_{3n}\\ \cdots&\cdots&\cdots&\cdots&\cdots&\cdots&\cdots\\ \cdots&\cdots&\cdots&\cdots&\cdots&\cdots&\cdots\\ x_{m1}&x_{m2}&x_{m3}&x_{m4}&x_{m5}&\cdots&x_{mn}\\ \end{bmatrix}

Can we represent the data using fewer dimensions by choosing a different basis?

OR

Can we project the data onto a smaller subspace?

Yes, we can!

We will see how!

Let us first dig a bit deeper into our toy example

Why do we not care about \(\mathbf{v_2}\)? (or why do we think we can represent the data using fewer dimensions)

What is being projected, where is it being projected and how is it being projected?

Is there something else that we desire?

Why do we not care about \(\mathbf{v_2}\)?

Because the data has very little variance along this dimension

Wishlist: Represent the data using fewer dimensions such that the data has high variance along these dimensions
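The sketch below compares the variance of a small zero-centred data set along four directions: the standard basis and a rotated orthonormal basis. The points and the vectors `v1`, `v2` are made up for illustration, and `variance_along` is a hypothetical helper. Since the data is zero-centred, the variance along a unit vector d is \(\frac{1}{m}\sum_i (\mathbf{x_i}^\top \mathbf{d})^2\).

```python
import math

# Hypothetical zero-centred 2-D data (each column sums to 0).
points = [(2.0, 2.1), (1.0, 0.9), (-1.0, -0.9), (-2.0, -2.1)]

def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def variance_along(direction, rows):
    """(1/m) * sum_i (x_i^T d)^2 for zero-centred rows and a unit vector d."""
    m = len(rows)
    return sum(dot(x, direction) ** 2 for x in rows) / m

s = 1.0 / math.sqrt(2.0)
u1, u2 = (1.0, 0.0), (0.0, 1.0)   # standard basis
v1, v2 = (s, s), (-s, s)          # assumed rotated orthonormal basis

var_u1 = variance_along(u1, points)
var_u2 = variance_along(u2, points)
var_v1 = variance_along(v1, points)
var_v2 = variance_along(v2, points)

print("standard basis:", var_u1, var_u2)
print("rotated basis: ", var_v1, var_v2)
```

In the standard basis the variance is split roughly evenly between the two directions. In the rotated basis almost all of it lies along `v1` and very little along `v2`, while the total stays the same. This is the situation the wishlist item asks for.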

Projection: What, where and how?

\begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} =b_{11}\mathbf{v_1}+\underbrace{b_{12}\mathbf{v_2}}_{\approx 0}

\therefore\begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} \approx b_{11}\mathbf{v_1}

What is being projected?

\(\mathbf{x_1}\)

Where is it being projected?

on \(\mathbf{v_1}\) and \(\mathbf{v_2}\)

How is it being projected?

b_{11} = \mathbf{x_1}^\top\mathbf{v_1}

b_{12} = \mathbf{x_1}^\top\mathbf{v_2}

(since \(\mathbf{v_1}\), \(\mathbf{v_2}\) are orthonormal)

Wishlist: Represent the data using fewer orthonormal basis vectors
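The step from orthonormality to the dot-product formula is short enough to write out. Starting from the expansion of \(\mathbf{x_1}\) in the basis \(\mathbf{v_1}, \mathbf{v_2}\) and multiplying on the left by \(\mathbf{v_1}^\top\):

```latex
\begin{aligned}
\mathbf{x_1} &= b_{11}\mathbf{v_1} + b_{12}\mathbf{v_2} \\
\mathbf{v_1}^\top\mathbf{x_1}
  &= b_{11}\,\mathbf{v_1}^\top\mathbf{v_1} + b_{12}\,\mathbf{v_1}^\top\mathbf{v_2} \\
  &= b_{11}\cdot 1 + b_{12}\cdot 0 \\
  &= b_{11}
\end{aligned}
```

Here \(\mathbf{v_1}^\top\mathbf{v_1} = 1\) (unit length) and \(\mathbf{v_1}^\top\mathbf{v_2} = 0\) (orthogonality). Since \(\mathbf{v_1}^\top\mathbf{x_1}\) is a scalar, it equals \(\mathbf{x_1}^\top\mathbf{v_1}\). The same argument with \(\mathbf{v_2}\) shows that the coefficient of \(\mathbf{v_2}\) is \(\mathbf{x_1}^\top\mathbf{v_2}\). Without orthonormality, finding the coefficients would require solving a linear system instead of taking two dot products.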

Is there anything else that we desire?