CS6910: Fundamentals of Deep Learning
Lecture 6: Eigenvalues, Eigenvectors, Eigenvalue Decomposition, Principal Component Analysis, Singular Value Decomposition
Mitesh M. Khapra
Department of Computer Science and Engineering, IIT Madras
Learning Objectives
- At the end of this lecture, students will be able to compute eigenvalues and eigenvectors of matrices
- Students will master the fundamentals of Principal Component Analysis and Singular Value Decomposition
Module 6.1: Eigenvalues and Eigenvectors
What happens when a matrix hits a vector?
The vector gets transformed into a new vector (it strays from its path)
The vector may also get scaled (elongated or shortened) in the process.
For a given square matrix A, there exist special vectors which refuse to stray from their path.
These vectors are called eigenvectors.
More formally,
\(Ax = \lambda x\) [direction remains the same]
The vector will only get scaled but will not change its direction.
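A minimal NumPy sketch of this definition (the matrix \(A\) below is an arbitrary illustrative choice, not one from the lecture):

```python
import numpy as np

# An arbitrary illustrative square matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and the corresponding
# eigenvectors (one eigenvector per column of `vecs`)
vals, vecs = np.linalg.eig(A)

for lam, x in zip(vals, vecs.T):
    # Ax and lambda*x coincide: the direction is unchanged,
    # the eigenvector is only scaled by its eigenvalue
    print(lam, np.allclose(A @ x, lam * x))   # True for every pair
```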
So what is so special about eigenvectors?
Why are they always in the limelight?
It turns out that several properties of matrices can be analyzed based on their eigenvalues (for example, see spectral graph theory)
We will now see two cases where eigenvalues/vectors will help us in this course
Let us assume that on day \(0\), \(k_1\) students eat Chinese food, and \(k_2\) students eat Mexican food. (Of course, no one eats in the mess!)
On each subsequent day \(i\), a fraction \(p\) of the students who ate Chinese food on day \((i-1)\),
continue to eat Chinese food on day \(i\), and \((1-p)\) shift to Mexican food.
Similarly, a fraction \(q\) of students who ate Mexican food on day \((i-1)\) continue to eat Mexican food on day \(i\), and \((1-q)\) shift to Chinese food.
The number of customers in the two restaurants is thus given by the following series:
\(v_{(1)} = Mv_{(0)}, \quad v_{(2)} = Mv_{(1)} = M^2v_{(0)}, \quad v_{(3)} = Mv_{(2)} = M^3v_{(0)}, \quad \dots\)
where \(v_{(i)}\) holds the number of Chinese and Mexican eaters on day \(i\), \(v_{(0)} = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix}\) and \(M = \begin{bmatrix} p & 1-q \\ 1-p & q \end{bmatrix}\)
In general, \(v_{(n)}=M^nv_{(0)}\)
[Figure: the two restaurants (Chinese, with \(k_1\) initial customers, and Mexican, with \(k_2\)) and the transition fractions \(p\), \(1-p\), \(q\), \(1-q\) between them]
This is a problem for the two restaurant owners.
The number of patrons is changing constantly.
Or is it? Will the system eventually reach a steady state? (i.e. will the number of customers in the two restaurants become constant over time?)
Turns out they will!
Let's see how.
Definition
Let \(\lambda_1, \lambda_2, \dots, \lambda_n\) be the eigenvalues of an \(n \times n\) matrix \(A\). \(\lambda_1\) is called the dominant eigenvalue of \(A\) if
\(|\lambda_1| > |\lambda_i|, \quad i = 2, \dots, n\)
Definition
A matrix \(M\) is called a stochastic matrix if all the entries are positive and the sum of the elements in each column is equal to \(1\).
(Note that the matrix in our example is a stochastic matrix.)
Theorem
If \(A\) is an \(n \times n\) square matrix with a dominant eigenvalue, then the sequence of vectors given by \(Av_{(0)}, A^2v_{(0)}, \dots, A^nv_{(0)}, \dots\) approaches a multiple of the dominant eigenvector of \(A\).
(The theorem is slightly misstated here for ease of explanation.)
Theorem
The largest (dominant) eigenvalue of a stochastic matrix is \(1\).
Let \(e_d\) be the dominant eigenvector of \(M\) and \(\lambda_d = 1\) the corresponding dominant eigenvalue
Given the previous definitions and theorems, what can you say about the sequence \(Mv_{(0)},M^2v_{(0)},M^3v_{(0)},...?\)
There exists an \(n\) such that
\(v_{(n)}=M^nv_{(0)}=ke_d\) (some multiple of \(e_d\))
Now what happens at time step \((n + 1)\)?
\(v_{(n+1)}=Mv_{(n)}=M(ke_d)=k(Me_d)=k(\lambda_d e_d)=ke_d\)
The population in the two restaurants becomes constant after time step \(n\).
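A small simulation of this argument (a hedged sketch; the values \(p = 0.8\), \(q = 0.9\), \(k_1 = 300\), \(k_2 = 200\) are made up for illustration):

```python
import numpy as np

p, q = 0.8, 0.9          # assumed retention fractions (illustrative only)
k1, k2 = 300.0, 200.0    # assumed day-0 customer counts (illustrative only)

# Column-stochastic transition matrix: each column sums to 1
M = np.array([[p,     1 - q],
              [1 - p, q    ]])
v = np.array([k1, k2])   # v_(0)

# Repeatedly apply M: v_(n) = M^n v_(0)
for _ in range(50):
    v = M @ v
print("steady state:", v)   # stops changing after enough days

# The dominant eigenvalue of the stochastic matrix M is 1, and the
# steady state is a multiple of the dominant eigenvector e_d
vals, vecs = np.linalg.eig(M)
d = np.argmax(np.abs(vals))
e_d = vecs[:, d]
print("dominant eigenvalue:", vals[d])                                 # ~1.0
print("v is a multiple of e_d:", np.allclose(v, (v.sum() / e_d.sum()) * e_d))
```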
Now instead of a stochastic matrix let us consider any square matrix \(A\)
Let \(p\) be the time step at which the sequence \(x_0,Ax_0,A^2x_0,...\) approaches a multiple of \(e_d\) (the dominant eigenvector of \(A\))
In general, if \(\lambda_d\) is the dominant eigenvalue of a matrix \(A\), what would happen to the sequence \(x_0, Ax_0, A^2x_0, \dots\) if
\(|\lambda_d| > 1\)? (it will explode)
\(|\lambda_d| < 1\)? (it will vanish)
\(|\lambda_d| = 1\)? (it will reach a steady state)
(We will use this in the course at some point)
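A tiny numerical illustration of these three cases (a sketch; the diagonal matrices below are made-up examples whose dominant eigenvalues are \(1.1\), \(0.9\) and \(1.0\) respectively):

```python
import numpy as np

x0 = np.array([1.0, 1.0])

# Made-up matrices with dominant eigenvalue > 1, < 1 and = 1
cases = {
    "|lambda_d| > 1 (explodes)":      np.diag([1.1, 0.5]),
    "|lambda_d| < 1 (vanishes)":      np.diag([0.9, 0.5]),
    "|lambda_d| = 1 (steady state)":  np.diag([1.0, 0.5]),
}

for name, A in cases.items():
    x = x0.copy()
    for _ in range(200):   # apply A two hundred times
        x = A @ x
    print(name, "-> ||A^200 x0|| =", np.linalg.norm(x))
```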
Module 6.2: Linear Algebra - Basic Definitions
We will see some more examples where eigenvectors are important, but before that let's revisit some basic definitions from linear algebra.
Basis
A set of vectors \(\in \Reals ^n\) is called a basis, if they are linearly independent and every vector \(\in \Reals ^n\) can be expressed as a linear combination of these vectors.
Linearly independent vectors
A set of \(n\) vectors \(v_1, v_2,...,v_n\) is linearly independent if no vector in the set can be expressed as a linear combination of the remaining \(n-1\) vectors.
In other words, the only solution to
\(c_1v_1 + c_2v_2 + \dots + c_nv_n = 0\) is \(c_1 = c_2 = \dots = c_n = 0\) (the \(c_i\)'s are scalars)
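One way to check this condition numerically is via the matrix rank (a minimal sketch; the three vectors below are made-up examples, and full column rank is equivalent to the "only the zero solution" condition above):

```python
import numpy as np

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 3.0])   # note: v3 = v1 + v2

# Stack the vectors as columns; they are linearly independent exactly
# when the matrix has full column rank (only c = 0 solves Vc = 0)
V = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(V) == V.shape[1])   # False: the set is dependent
```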
For example, consider the space \(\Reals^2\).
Now consider the vectors
\(x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\) and \(y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\)
Any vector \([a, b]^T \in \Reals^2\) can be expressed as a linear combination of these two vectors, i.e.,
\(\begin{bmatrix} a \\ b \end{bmatrix} = a\,x + b\,y\)
Further, \(x\) and \(y\) are linearly independent.
(the only solution to \(c_1x + c_2y = 0\) is \(c_1 = c_2 = 0\))
In fact, it turns out that \(x\) and \(y\) are unit vectors in the direction of the co-ordinate axes.
And indeed we are used to representing all vectors in \(\Reals^2\) as a linear combination of these two vectors.
But there is nothing sacrosanct about the particular choice of \(x\) and \(y\).
We could have chosen any 2 linearly independent vectors in \(\Reals^2\) as the basis vectors.
For example, consider the linearly independent vectors \([2, 3]^T\) and \([5, 7]^T\). Any vector \([a, b]^T \in \Reals^2\) can be expressed as a linear combination of these two vectors:
\(\begin{bmatrix} a \\ b \end{bmatrix} = x_1 \begin{bmatrix} 2 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} 5 \\ 7 \end{bmatrix}\)
We can find \(x_1\) and \(x_2\) by solving a system of linear equations.
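For instance, to express the (arbitrarily chosen) vector \([4, 5]^T\) in this basis, a minimal sketch:

```python
import numpy as np

# Basis vectors as the columns of B; we want x = [x1, x2]^T with B x = [a, b]^T
B = np.array([[2.0, 5.0],
              [3.0, 7.0]])
target = np.array([4.0, 5.0])     # illustrative choice of [a, b]^T

x = np.linalg.solve(B, target)    # solves the linear system (Gaussian elimination)
print(x)                          # x1 = -3, x2 = 2
print(np.allclose(x[0] * B[:, 0] + x[1] * B[:, 1], target))   # True
```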
In general, given a set of linearly independent vectors \(u_1, u_2, \dots, u_n \in \Reals^n\), we can express any vector \(z \in \Reals^n\) as a linear combination of these vectors:
\(z = \alpha_1 u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n\)
(rewriting in matrix form)
\(z = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix}\)
We can now find the \(\alpha_i\)'s using Gaussian Elimination (Time Complexity: \(O(n^3)\))
Now let us see what happens if we have an orthonormal basis, i.e.,
\(u_i^T u_j = 0\) \(\forall i \neq j\) and \(u_i^T u_i = \|u_i\|^2 = 1\)
Again we have:
\(z = \alpha_1 u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n\)
\(u_1^T z = \alpha_1 u_1^T u_1 = \alpha_1\) (all the other terms vanish since \(u_1^T u_j = 0\) for \(j \neq 1\))
Similarly, \(\alpha_2 = z^T u_2\), and so on.
We can thus directly find each \(\alpha_i\) using a dot product between \(z\) and \(u_i\) (time complexity \(O(n)\)).
The total complexity will be \(O(n^2)\).
(When \(u_1\) and \(u_2\) are unit vectors along the co-ordinate axes, the \(\alpha_i\)'s are simply the components of \(z\).)
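A minimal sketch contrasting this with the general case (the orthonormal basis below is a made-up example, a 45-degree rotation of the co-ordinate axes):

```python
import numpy as np

# A made-up orthonormal basis of R^2
u1 = np.array([1.0,  1.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0]) / np.sqrt(2)
z  = np.array([3.0, 4.0])          # illustrative vector to express in this basis

# For an orthonormal basis each coefficient is a single O(n) dot product:
# alpha_i = z^T u_i, so all n coefficients cost O(n^2) in total
alpha1, alpha2 = z @ u1, z @ u2

# Reconstruction: z = alpha1 * u1 + alpha2 * u2
print(np.allclose(alpha1 * u1 + alpha2 * u2, z))   # True
```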