CS6015: Linear Algebra and Random Processes
Lecture 11: A tiny bit of ML, vector norms, orthogonal vectors, orthogonal subspaces
Please join on https://iitmadras.webex.com/join/miteshk
Learning Objectives (for today's lecture)
How do vectors and matrices show up in Machine Learning?
What are orthogonal vectors?
How do you compute the norm of a vector?
What do you do when Ax=b does not have a solution (intuition)?
What are orthogonal subspaces?
An example from Machine Learning
How much oil can be recovered from a drilling site? (Rely on past data)
[Table: rows Site 1, Site 2, Site 3, ..., Site m; n input variables per site (Salinity, Pressure, Density, Depth, Temp., ...); output: Quantity]
\begin{bmatrix}
a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\
a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\
a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\
\cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\
\cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\
a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\
\end{bmatrix}
\begin{bmatrix}
b_{1}\\
b_{2}\\
b_{3}\\
\cdots\\
\cdots\\
b_{m}\\
\end{bmatrix}
You know that some relation exists between 'Quantity' and the n variables
But you don't know what \( f \) is!
b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})
So what do you do?
An example from Machine Learning
How much salary should be offered to a candidate? (Rely on past employee data)
[Table: rows Emp. 1, Emp. 2, Emp. 3, ..., Emp. m; n input variables per employee (# of Degrees, # Exp., # Projects, Univ. Rank, # Prog. Lang. known, ...); output: Salary]
\begin{bmatrix}
a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\
a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\
a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\
\cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\
\cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\
a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\
\end{bmatrix}
\begin{bmatrix}
b_{1}\\
b_{2}\\
b_{3}\\
\cdots\\
\cdots\\
b_{m}\\
\end{bmatrix}
You know that some relation exists between 'Salary' and the n variables
But you don't know what \( f \) is!
b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})
So what do you do?
An example from Machine Learning
How many COVID19 cases in the next week in a locality? (Rely on past data)
[Table: rows Loc. 1, Loc. 2, Loc. 3, ..., Loc. m; n input variables per locality (Total Pop., Pop. Density, Avg. Income, Last week's count, # masks sold, ...); output: Cases]
\begin{bmatrix}
a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\
a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\
a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\
\cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\
\cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\
a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\
\end{bmatrix}
\begin{bmatrix}
b_{1}\\
b_{2}\\
b_{3}\\
\cdots\\
\cdots\\
b_{m}\\
\end{bmatrix}
You know that some relation exists between 'Cases' and the n variables
But you don't know what \( f \) is!
b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})
So what do you do?
A typical Machine Learning setup
You know that some relation exists between the 'output' and the 'input' variables
But you don't know what \( f \) is!
b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})
So what do you do?
You make some assumption about \( f \)
What is the simplest assumption you can make? That \(f\) is linear:
b_{1} = a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n
simple, interpretable, less predictive (less accurate)
Can you make more complex assumptions? Yes:
b_{1} = w_kg_k(\cdots w_3g_3(w_2g_2(a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n)))
complex, uninterpretable, more predictive (more accurate)
(psst! psst! Deep Learning)
In this course, \(f\) is linear
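To make the linear assumption concrete, here is a minimal Python/NumPy sketch (the numbers are made up for illustration, not from the lecture): the past data forms a matrix \(A\) with one row per example and one column per input variable, the outputs form a vector \(\mathbf{b}\), and the assumption says \(\mathbf{b} \approx A\mathbf{x}\) for some unknown weights \(\mathbf{x}\).

import numpy as np

# Hypothetical past data: m = 4 examples, n = 3 input variables each
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 0.0, 1.0],
              [4.0, 1.0, 2.0],
              [0.0, 3.0, 5.0]])
b = np.array([14.0, 5.0, 12.0, 19.0])    # observed outputs

# The linear assumption: b is (approximately) A @ x for some weight vector x
x_candidate = np.array([1.0, 2.0, 3.0])  # one possible guess at the weights
print(A @ x_candidate)                   # predictions under this guess
print(b)                                 # compare with the observed outputs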
What's the connection to Linear Algebra
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_{1} \quad (\text{Loc. } 1)
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n = b_{2} \quad (\text{Loc. } 2)
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n = b_{3} \quad (\text{Loc. } 3)
\vdots
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n = b_{m} \quad (\text{Loc. } m)
We are interested in finding the \(x\)'s which best explain the relationship between the \(a\)'s and the \(b\)'s
In other words .....
..... we are interested in solving \( A \mathbf{x} = \mathbf{b} \)
\underbrace{\begin{bmatrix}
a_{11}&a_{12}&a_{13}&a_{14}&a_{15}&\cdots&a_{1n}\\
a_{21}&a_{22}&a_{23}&a_{24}&a_{25}&\cdots&a_{2n}\\
a_{31}&a_{32}&a_{33}&a_{34}&a_{35}&\cdots&a_{3n}\\
\vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&a_{m4}&a_{m5}&\cdots&a_{mn}\\
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix}
x_{1}\\
x_{2}\\
x_{3}\\
\vdots\\
x_{n}\\
\end{bmatrix}}_{\mathbf{x}}
=\underbrace{\begin{bmatrix}
b_{1}\\
b_{2}\\
b_{3}\\
\vdots\\
b_{m}\\
\end{bmatrix}}_{\mathbf{b}}
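If \(A\) happened to be square and invertible, this system could be solved exactly; here is a minimal Python/NumPy sketch on a tiny made-up system (in the ML setting \(m\) is usually much larger than \(n\), and, as the next slides show, an exact solution need not exist):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

x = np.linalg.solve(A, b)     # exact solution of Ax = b (A square, invertible)
print(x)                      # [1. 3.]
print(np.allclose(A @ x, b))  # True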
What could go wrong?
0 solutions
inconsistent equations
For example, two patients with the same heart rate, blood sugar level, weight, and height (inputs) but a different blood cholesterol level (output)
72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 110
72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 120
What can you do?
0 solutions
inconsistent equations
Find the best possible solution by tolerating some noise
72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 110 \rightarrow 115
72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 120 \rightarrow 115
(replace both right-hand sides by the compromise value 115)
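A minimal Python/NumPy sketch of this "best possible solution" idea, using the least-squares solver on the two inconsistent patient equations (only four of the n input variables are kept here; the values are the ones shown above):

import numpy as np

# Two patients with identical inputs but different outputs (110 vs 120)
A = np.array([[72.0, 84.0, 175.0, 78.0],
              [72.0, 84.0, 175.0, 78.0]])
b = np.array([110.0, 120.0])

# No exact solution exists; lstsq returns the best compromise in the least-squares sense
x_hat, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(A @ x_hat)   # ~[115. 115.] -- both predictions settle on the average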
What is the geometric view?
\begin{bmatrix}
a_{11}&a_{12}&a_{13}&\cdots&a_{1n}\\
72&84&175&\cdots&78\\
72&84&175&\cdots&78\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\cdots&a_{mn}\\
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}\\
x_{3}\\
\vdots\\
x_{n}\\
\end{bmatrix}
=\begin{bmatrix}
b_{1}\\
110\\
120\\
\vdots\\
b_{m}\\
\end{bmatrix}
[Figure: \(\mathbf{b}\) lies outside the column space of \(A\); its projection \(\mathbf{p}\) onto the column space has entries \(p_{1}, 115, 115, \cdots, p_{m}\)]
"Project" \(\mathbf{b} \) into the column space of \(A\)
Solve \( A\mathbf{\hat{x}} = \mathbf{p}\)
\( \mathbf{\hat{x}}\) is the best possible approximation
\(\mathbf{b} \) is not in the column space of \(A\)
Hence no solution to \(A\mathbf{x} = \mathbf{b} \)
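As a preview of how the projection can be computed (the derivation comes in later lectures), here is a minimal Python/NumPy sketch using the normal equations \(A^\top A\hat{\mathbf{x}} = A^\top\mathbf{b}\) on a small made-up full-column-rank matrix:

import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])      # its two columns span a plane in R^3
b = np.array([1.0, 2.0, 5.0])   # b does not lie in that plane

# Solve the normal equations A^T A x_hat = A^T b, then p = A x_hat
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x_hat                   # projection of b onto the column space of A
print(x_hat)                    # best possible approximation
print(p)                        # this vector does lie in the column space of A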
What next?
How do we project \(\mathbf{b} \) into the column space of \(A\)?
But first a detour to build some concepts ...
What are orthogonal vectors?
What is the norm of a vector?
What are orthogonal subspaces?
The norm of a vector
Given a vector space \(V\), the norm of a vector is a non-negative valued function \(p: V \rightarrow \mathbb{R} \) with the following properties
p(\mathbf{u} + \mathbf{v}) \leq p(\mathbf{u}) + p(\mathbf{v})
p(a\mathbf{u}) = |a| p(\mathbf{u})
If~p(\mathbf{u}) = 0~then~\mathbf{u}=\mathbf{0}
(triangle inequality)
[Figure: triangle with sides \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{u}+\mathbf{v}\)]
Examples of norms
\(L_1\) norm (\(\ell_1\) norm, taxicab norm):
||\mathbf{x}||_1 = |x_1| + |x_2| + |x_3| + \cdots + |x_n|
\(L_2\) norm (\(\ell_2\) norm, Euclidean norm; most commonly used):
||\mathbf{x}||_2 = (|x_1|^2 + |x_2|^2 + |x_3|^2 + \cdots + |x_n|^2)^{\frac{1}{2}} = \sqrt{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2} = \sqrt{\mathbf{x}^\top\mathbf{x}}
where~\mathbf{x}^\top\mathbf{x} = [x_1~x_2~x_3~\cdots~x_n]\begin{bmatrix}
x_1\\
x_2\\
x_3\\
\vdots\\
x_n
\end{bmatrix}
\(L_p\) norm (\(\ell_p\) norm):
||\mathbf{x}||_p = (|x_1|^p + |x_2|^p + |x_3|^p + \cdots + |x_n|^p)^{\frac{1}{p}} = \left(\sum_{i=1}^{n}|x_i|^p\right)^{\frac{1}{p}}
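These norms are available directly in NumPy; a minimal sketch on an arbitrary vector:

import numpy as np

x = np.array([3.0, -4.0, 1.0])

l1 = np.linalg.norm(x, ord=1)   # |3| + |-4| + |1| = 8
l2 = np.linalg.norm(x, ord=2)   # sqrt(9 + 16 + 1) = sqrt(26)
l5 = np.linalg.norm(x, ord=5)   # (3^5 + 4^5 + 1^5)^(1/5)

print(l1, l2, l5)
print(np.isclose(l2, np.sqrt(x @ x)))   # the L2 norm equals sqrt(x^T x)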
Orthogonal vectors
Vectors \( \mathbf{u} \) and \( \mathbf{v} \) are orthogonal if the angle between them is \(90^\circ\)
Condition for orthogonality
By Pythagoras' Theorem
(||\mathbf{u}+\mathbf{v}||_2)^2 = (||\mathbf{u}||_2)^2 + (||\mathbf{v}||_2)^2
\implies (\mathbf{u}+\mathbf{v})^\top(\mathbf{u}+\mathbf{v}) = \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}
\implies (\mathbf{u}^\top+\mathbf{v}^\top)(\mathbf{u}+\mathbf{v}) = \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}
\implies \mathbf{u}^\top\mathbf{u}+\mathbf{v}^\top\mathbf{u} + \mathbf{u}^\top\mathbf{v}+\mathbf{v}^\top\mathbf{v} = \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}
\implies \mathbf{u}^\top\mathbf{v} = 0
(the dot product of the two vectors will be 0)
[Figure: right triangle with perpendicular sides \(\mathbf{u}\), \(\mathbf{v}\) and hypotenuse \(\mathbf{u}+\mathbf{v}\)]
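A quick numerical check of this condition (a minimal sketch with two arbitrary orthogonal vectors):

import numpy as np

u = np.array([1.0, 2.0, -1.0])
v = np.array([3.0, -1.0, 1.0])

print(u @ v)   # dot product is 0, so u and v are orthogonal
# Pythagoras' theorem then holds for u, v and u + v:
print(np.isclose(np.linalg.norm(u + v)**2,
                 np.linalg.norm(u)**2 + np.linalg.norm(v)**2))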
Orthogonal subspaces
Two subspaces \( S_1 \) and \( S_2 \) are orthogonal if every vector \( \mathbf{u} \in S_1 \) is orthogonal to every vector \( \mathbf{v} \in S_2 \)
(switch to geogebra)
Orthogonal subspaces
\mathcal{C}(A)
\mathcal{N}(A^\top)
inside~\mathbb{R}^m
A = \begin{bmatrix}
\uparrow&\uparrow&\uparrow\\
\mathbf{a_1}&\mathbf{a_2}&\mathbf{a_3}\\
\downarrow&\downarrow&\downarrow\\
\end{bmatrix}
\qquad
A^\top\mathbf{x} = \begin{bmatrix}
\leftarrow \mathbf{a_1}^\top \rightarrow\\
\leftarrow \mathbf{a_2}^\top \rightarrow\\
\leftarrow \mathbf{a_3}^\top \rightarrow\\
\end{bmatrix}
\begin{bmatrix}
x_1\\
x_2\\
x_3
\end{bmatrix}
= \begin{bmatrix}
0\\0\\0
\end{bmatrix}
\mathbf{x} \in \mathcal{N}(A^\top)
\implies \mathbf{a_1}^\top\mathbf{x} = 0
\mathbf{a_2}^\top\mathbf{x} = 0
\mathbf{a_3}^\top\mathbf{x} = 0
\implies (p\mathbf{a_1}+q\mathbf{a_2}+r\mathbf{a_3})^\top\mathbf{x} = 0
(for any scalars \(p, q, r\); every vector in \(\mathcal{C}(A)\) is such a combination, so each is orthogonal to \(\mathbf{x}\))
\implies \mathcal{C}(A) \perp \mathcal{N}(A^\top)
The 4 fundamental subspaces
Inside \(\mathbb{R}^m\): \(\mathcal{C}(A)\) with \(dim=r\) and \(\mathcal{N}(A^\top)\) with \(dim=m-r\)
\(\mathcal{C}(A) \perp \mathcal{N}(A^\top)\) (orthogonal complements)
Inside \(\mathbb{R}^n\): \(\mathcal{C}(A^\top)\) with \(dim=r\) and \(\mathcal{N}(A)\) with \(dim=n-r\)
\(\mathcal{C}(A^\top) \perp \mathcal{N}(A)\) (orthogonal complements)
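The four dimensions are easy to check numerically from the rank; a minimal Python/NumPy sketch (using the matrix from the example on the next slide):

import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [1.0, 3.0, 4.0]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

print("dim C(A)   =", r)       # column space, inside R^m
print("dim N(A^T) =", m - r)   # left null space, inside R^m
print("dim C(A^T) =", r)       # row space, inside R^n
print("dim N(A)   =", n - r)   # null space, inside R^n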
Example: \(\mathcal{C}(A) \perp \mathcal{N}(A^\top)\)
\begin{bmatrix}
1&1&2\\
1&2&3\\
1&3&4\\
\end{bmatrix}
A
\begin{bmatrix}
1&1&2\\
0&1&1\\
0&0&0\\
\end{bmatrix}
U
Basis(\mathcal{C}(A))
=\begin{bmatrix}
1\\
1\\
1\\
\end{bmatrix}
,\begin{bmatrix}
1\\
2\\
3\\
\end{bmatrix}
Basis(\mathcal{N}(A^\top))
=\begin{bmatrix}
1\\
-2\\
1\\
\end{bmatrix}
\begin{bmatrix}
1&1&1\\
1&2&3\\
2&3&4\\
\end{bmatrix}
A^\top
\begin{bmatrix}
1&1&1\\
0&1&2\\
0&0&0\\
\end{bmatrix}
A^\top\mathbf{x} = \mathbf{0}
\mathbf{x}=\begin{bmatrix}
1\\
-2\\
1\\
\end{bmatrix}
special solution
(switch to geogebra)
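A quick numerical check of this example (a minimal Python/NumPy sketch): the special solution of \(A^\top\mathbf{x} = \mathbf{0}\) should be orthogonal to both basis vectors of \(\mathcal{C}(A)\).

import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [1.0, 3.0, 4.0]])

c1, c2 = A[:, 0], A[:, 1]        # basis of C(A): the two pivot columns
x = np.array([1.0, -2.0, 1.0])   # special solution of A^T x = 0

print(A.T @ x)         # [0. 0. 0.] -- x is indeed in N(A^T)
print(c1 @ x, c2 @ x)  # both 0 -- C(A) is orthogonal to N(A^T)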
Practice Problems
For each of the following matrices, verify whether the null space of \( A \) is orthogonal to the column space of \( A^\top \)
\begin{bmatrix}
1&2&3\\
1&1&2\\
2&1&0
\end{bmatrix}
\begin{bmatrix}
1&2&3&3&2&1\\
1&2&2&1&1&2\\
2&0&0&1&2&2
\end{bmatrix}
\begin{bmatrix}
0&2&2\\
1&2&1\\
2&1&3\\
3&1&1\\
-1&1&-2\\
2&1&0\\
\end{bmatrix}
\begin{bmatrix}
1&2&3&3&2&1\\
2&1&0&1&2&2\\
4&5&6&7&6&4
\end{bmatrix}
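A helper along these lines can be used to verify the claim numerically for each matrix (a minimal Python/NumPy sketch; the function names are mine, and the null space basis is obtained from the SVD):

import numpy as np

def nullspace_basis(A, tol=1e-10):
    # Basis of N(A): right singular vectors whose singular values are (near) zero
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vt[rank:].T                 # columns form a basis of N(A)

def check_orthogonal(A):
    N = nullspace_basis(A)             # basis of N(A)
    # rows of A span C(A^T); check every row is orthogonal to every null space vector
    return np.allclose(A @ N, 0)

A1 = np.array([[1.0, 2.0, 3.0],
               [1.0, 1.0, 2.0],
               [2.0, 1.0, 0.0]])
print(check_orthogonal(A1))            # True; try the other practice matrices too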
Learning Objectives
(achieved)
How do vectors and matrices show up in Machine Learning?
What are orthogonal vectors?
How do you compute the norm of a vector?
What do you do when Ax=b does not have a solution (intuition)?
What are orthogonal subspaces?
CS6015: Lecture 11
By Mitesh Khapra