CS6910: Fundamentals of Deep Learning

Lecture 6: Eigenvalues, Eigenvectors, Eigenvalue Decomposition, Principal Component Analysis, Singular Value Decomposition

Mitesh M. Khapra

Department of Computer Science and Engineering, IIT Madras


Learning Objectives

  • At the end of this lecture, the student will be able to identify eigenvalues and eigenvectors of matrices

  • The student will master the fundamentals of Principal Component Analysis and Singular Value Decomposition


Module 6.1: Eigenvalues and Eigenvectors

What happens when a matrix hits a vector?

The vector gets transformed into a new vector (it strays from its path)

The vector may also get scaled (elongated or shortened) in the process.

For a given square matrix A, there exist special vectors which refuse to stray from their path.

These vectors are called eigenvectors.

More formally,

\(Ax = \lambda x\) [direction remains the same; the scalar \(\lambda\) is the eigenvalue corresponding to the eigenvector \(x\)]

The vector will only get scaled but will not change its direction.
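(A minimal numerical check of this, using NumPy and an illustrative matrix that is not from the lecture: for each eigenvector, \(Ax\) is just a scaled copy of \(x\).)

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])           # an illustrative symmetric matrix (not from the lecture)
    eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are the eigenvectors
    for lam, x in zip(eigvals, eigvecs.T):
        # A @ x points in the same direction as x, scaled by lam
        assert np.allclose(A @ x, lam * x)
    print(eigvals)                       # the eigenvalues, e.g. [3. 1.]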

So what is so special about eigenvectors?

Why are they always in the limelight?

It turns out that several properties of matrices can be analyzed based on their eigenvalues (for example, see spectral graph theory)

We will now see two cases where eigenvalues/vectors will help us in this course

Let us assume that on day \(0\), \(k_1\) students eat Chinese food, and \(k_2\) students eat Mexican food. (Of course, no one eats in the mess!)

On each subsequent day \(i\), a fraction \(p\) of the students who ate Chinese food on day \((i-1)\) continue to eat Chinese food on day \(i\), and \((1-p)\) shift to Mexican food.

Similarly, a fraction \(q\) of students who ate Mexican food on day \((i-1)\) continue to eat Mexican food on day \(i\), and \((1-q)\) shift to Chinese food.

The number of customers in the two restaurants is thus given by the following series:

v_{(0)},\ Mv_{(0)},\ M^2v_{(0)},\ M^3v_{(0)},\ \dots

where

v_{(0)}=\begin{bmatrix} k_1 \\ k_2 \end{bmatrix}, \quad M=\begin{bmatrix} p & 1-q\\ 1-p & q \end{bmatrix}

v_{(1)}=Mv_{(0)}=\begin{bmatrix} pk_1 + (1-q)k_2\\ (1-p)k_1+qk_2 \end{bmatrix}

v_{(2)}=Mv_{(1)}=M^2v_{(0)}

In general, \(v_{(n)}=M^nv_{(0)}\)

(Figure: transition diagram between the Chinese and Mexican restaurants, with self-loop probabilities \(p\) and \(q\), cross-over probabilities \(1-p\) and \(1-q\), and initial customer counts \(k_1\) and \(k_2\).)

This is a problem for the two restaurant owners.

The number of patrons is changing constantly.

Or is it? Will the system eventually reach a steady state? (i.e. will the number of customers in the two restaurants become constant over time?)

It turns out that it will!

Let's see how.


 Definition





 

Let \(\lambda_1, \lambda_2,...,\lambda_n\) be the eigenvalues of an \(n \times n\) matrix \(A\). \(\lambda_1\) is called the dominant eigenvalue of \(A\) if

      \(|\lambda_1| > |\lambda_i|\), \(i=2,...,n\)

 

 Definition

A matrix \(M\) is called a stochastic matrix if all its entries are non-negative and the sum of the elements in each column is equal to \(1\). (Note that the matrix in our example is a stochastic matrix.)

 Theorem

If \(A\) is an \(n \times n\) square matrix with a dominant eigenvalue, then the sequence of vectors given by \(Av_{(0)},A^2v_{(0)},...,A^nv_{(0)},...\) approaches a multiple of the dominant eigenvector of \(A\).

(the theorem is slightly misstated here for ease of explanation)

 Theorem

The largest (dominant) eigenvalue of a stochastic matrix is \(1\).

Let \(e_d\) be the dominant eigenvector of \(M\) and \(\lambda_d = 1\) the corresponding dominant eigenvalue


Given the previous definitions and theorems, what can you say about the sequence \(Mv_{(0)},M^2v_{(0)},M^3v_{(0)},...?\)

There exists an \(n\) such that

\(v_{(n)}=M^nv_{(0)}=ke_d\) (some multiple of \(e_d\))

Now what happens at time step \((n + 1)\)?

\(v_{(n+1)}=Mv_{(n)}=M(ke_d)=k(Me_d)=k(\lambda_d e_d)=ke_d\)

The population in the two restaurants becomes constant after time step \(n\).
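(A small simulation sketch, with \(p\), \(q\), \(k_1\), \(k_2\) chosen arbitrarily for illustration, showing the sequence \(M^nv_{(0)}\) settling to a multiple of the dominant eigenvector.)

    import numpy as np

    p, q = 0.8, 0.9                      # assumed transition fractions (illustrative values)
    k1, k2 = 300.0, 200.0                # assumed initial customer counts (illustrative values)
    M = np.array([[p, 1 - q],
                  [1 - p, q]])           # stochastic matrix: each column sums to 1
    v = np.array([k1, k2])
    for _ in range(50):
        v = M @ v                        # v_(n) = M^n v_(0)
    print(v)                             # settles to a multiple of the dominant eigenvector

    eigvals, eigvecs = np.linalg.eig(M)
    e_d = eigvecs[:, np.argmax(eigvals.real)]   # dominant eigenvector (eigenvalue 1)
    print(e_d / e_d.sum() * (k1 + k2))          # same steady-state split of the k1 + k2 students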

Now instead of a stochastic matrix let us consider any square matrix \(A\)

Let \(p\) be the time step at which the sequence \(x_0,Ax_0,A^2x_0,...\) approaches a multiple of \(e_d\) (the dominant eigenvector of \(A\))

A^px_0=ke_d

A^{p+1}x_0=A(A^px_0)=kAe_d=k\lambda_de_d

A^{p+2}x_0=A(A^{p+1}x_0)=k\lambda_dAe_d=k\lambda_d^2e_d

\vdots

A^{p+n}x_0=k\lambda_d^ne_d

In general, if \(\lambda_d\) is the dominant eigenvalue of a matrix \(A\), what would  happen to the sequence \(x_0,Ax_0,A^2x_0,...\) if

\(|\lambda_d|>1\) (the sequence will explode)

\(|\lambda_d|<1\) (the sequence will vanish)

\(|\lambda_d|=1\) (the sequence will reach a steady state)

(We will use this in the course at some point)
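(A quick numerical illustration of the three cases, sketched with arbitrary diagonal matrices whose dominant eigenvalues are \(1.5\), \(0.9\) and \(1\) respectively.)

    import numpy as np

    x0 = np.array([1.0, 1.0])
    cases = [(np.diag([1.5, 0.5]), "|lambda_d| > 1"),
             (np.diag([0.9, 0.5]), "|lambda_d| < 1"),
             (np.diag([1.0, 0.5]), "|lambda_d| = 1")]
    for A, label in cases:
        x = x0.copy()
        for _ in range(100):
            x = A @ x                    # x_n = A^n x_0
        print(label, np.linalg.norm(x))  # explodes, vanishes, or reaches a steady state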

Module 6.2: Linear Algebra - Basic Definitions

We will see some more examples where eigenvectors are important, but before that let's revisit some basic definitions from linear algebra.

 Basis




 

A set of vectors \(\in \Reals^n\) is called a basis if they are linearly independent and every vector \(\in \Reals^n\) can be expressed as a linear combination of these vectors.

 Linearly independent vectors






 

A set of \(n\) vectors \(v_1, v_2,...,v_n\) is linearly independent if no vector in the set can be expressed as a linear combination of the remaining \(n-1\) vectors.

In other words, the only solution to

             \(c_1v_1+c_2v_2+...+c_nv_n=0\)  is  \(c_1=c_2=...=c_n=0\) (the \(c_i\)'s are scalars)
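(A quick way to check linear independence numerically, sketched with NumPy on illustrative vectors: stack them as columns and look at the rank of the resulting matrix.)

    import numpy as np

    # Stack the vectors as columns; they are linearly independent exactly
    # when the matrix has full column rank.
    V = np.column_stack([[1.0, 0.0, 1.0],
                         [0.0, 1.0, 1.0],
                         [1.0, 1.0, 2.0]])   # third vector = first + second
    print(np.linalg.matrix_rank(V))          # 2 < 3, so this set is linearly dependent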

For example, consider the space \(\Reals^2\) and the vectors

x=\begin{bmatrix} 1 \\ 0 \\ \end{bmatrix} \quad and \quad y=\begin{bmatrix} 0 \\ 1 \\ \end{bmatrix}

Any vector \(\begin{bmatrix} a \\ b \end{bmatrix} \in \Reals^2\) can be expressed as a linear combination of these two vectors, i.e.,

\begin{bmatrix} a \\ b \\ \end{bmatrix} = a \begin{bmatrix} 1 \\ 0 \\ \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \\ \end{bmatrix}

Further, \(x\) and \(y\) are linearly independent.

(the only solution to \(c_1x + c_2y = 0\) is \(c_1 = c_2 = 0\))

In fact, it turns out that \(x\) and \(y\) are unit vectors in the direction of the co-ordinate axes.

And indeed we are used to representing all vectors in \(\Reals^2\) as a linear combination of these two vectors.

But there is nothing sacrosanct about the particular choice of \(x\) and \(y\).

We could have chosen any 2 linearly independent vectors in \(\Reals^2\) as the basis vectors.

For example, consider the linearly independent vectors \([2,3]^T\) and \([5,7]^T\). See how any vector \([a, b]^T \in \Reals^2\) can be expressed as a linear combination of these two vectors:

\begin{bmatrix} a \\ b \\ \end{bmatrix} = x_1 \begin{bmatrix} 2 \\ 3 \\ \end{bmatrix} + x_2 \begin{bmatrix} 5 \\ 7 \\ \end{bmatrix}

We can find \(x_1\) and \(x_2\) by solving a system of linear equations:

a=2x_1+5x_2
b=3x_1+7x_2

In general, given a set of linearly independent vectors \(u_1, u_2, ...,u_n \in \Reals^n\), we can express any vector \(z \in \Reals^n\) as a linear combination of these vectors.

z=\alpha_1u_1+\alpha_2u_2+...+\alpha_nu_n

\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} =\alpha_1 \begin{bmatrix} u_{11} \\ u_{12} \\ \vdots \\ u_{1n} \end{bmatrix} + \alpha_2 \begin{bmatrix} u_{21} \\ u_{22} \\ \vdots \\ u_{2n} \end{bmatrix} + \dots + \alpha_n \begin{bmatrix} u_{n1} \\ u_{n2} \\ \vdots \\ u_{nn} \end{bmatrix}

(rewriting in matrix form)

\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} =\begin{bmatrix} u_{11} & u_{21} & \dots & u_{n1}\\ u_{12} & u_{22} & \dots & u_{n2}\\ \vdots & \vdots & \ddots & \vdots \\ u_{1n} & u_{2n} & \dots & u_{nn} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix}

We can now find the \(\alpha_i\)'s using Gaussian Elimination (Time Complexity: \(O(n^3)\))
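(In code, this amounts to solving the linear system \(U\alpha = z\). A minimal sketch with NumPy, using a random matrix as an illustrative basis; a random matrix is full rank with probability 1, and np.linalg.solve performs the \(O(n^3)\) elimination.)

    import numpy as np

    n = 4
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n, n))   # columns play the role of the basis vectors u_1, ..., u_n
    z = rng.standard_normal(n)        # the vector we want to express in this basis
    alpha = np.linalg.solve(U, z)     # LU/Gaussian elimination: O(n^3)
    assert np.allclose(U @ alpha, z)  # z = alpha_1 u_1 + ... + alpha_n u_n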

Now let us see what happens if we have an orthonormal basis, i.e.,

\(u_i^T u_j = 0\) \(\forall i \neq j\) and \(u_i^T u_i = \|u_i\|^2 = 1\)

Again we have:

z=\alpha_1u_1+\alpha_2u_2+...+\alpha_nu_n

u_1^Tz=\alpha_1u_1^Tu_1+...+\alpha_nu_1^Tu_n=\alpha_1

We can directly find each \(\alpha_i\) using a dot product between \(z\) and \(u_i\) (time complexity \(O(n)\)).

The total complexity will be \(O(n^2)\).

For example, in \(\Reals^2\), let \(z = \begin{bmatrix} a \\ b \\ \end{bmatrix}\). Then

\alpha_1=|\vec z| \cos \theta = |\vec z| \frac {z^Tu_1}{|\vec z||u_1|}=z^Tu_1

Similarly, \(\alpha_2=z^Tu_2\).

When \(u_1\) and \(u_2\) are unit vectors along the co-ordinate axes,

z=\begin{bmatrix} a \\ b \\ \end{bmatrix} = a \begin{bmatrix} 1 \\ 0 \\ \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \\ \end{bmatrix}
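(With an orthonormal basis the coefficients are just dot products. A minimal sketch, using a QR factorization of a random matrix to get an illustrative orthonormal basis.)

    import numpy as np

    n = 4
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns of Q form an orthonormal basis
    z = rng.standard_normal(n)
    alpha = Q.T @ z                    # each alpha_i = u_i^T z: n dot products, O(n^2) total
    assert np.allclose(Q @ alpha, z)   # reconstruct z from the coefficients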
