CS6015: Linear Algebra and Random Processes

Lecture 11: A tiny bit of ML, vector norms, orthogonal vectors, orthogonal subspaces

Learning Objectives

How do vectors and matrices show up in Machine Learning?

What are orthogonal vectors?

How do you compute the norm of a vector? 

(for today's lecture)

What do you do when Ax=b does not have a solution (intuition)?

What are orthogonal subspaces?

An example from Machine Learning

How much oil can be recovered from a drilling site? (Rely on past data)

For sites \(1, 2, 3, \ldots, m\) we record \(n\) variables (pressure, density, depth, temperature, salinity, ...) in a matrix, and the quantity of oil recovered in a vector:

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}
\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

You know that some relation exists between 'Quantity' and the n variables

But you don't know what \( f \) is!

b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})

So what do you do?

An example from Machine Learning

How much salary should be offered to a candidate? (Rely on past employee data)

For employees \(1, 2, 3, \ldots, m\) we record \(n\) variables (# of degrees, # years of experience, # projects, university rank, # programming languages known, ...) in a matrix, and the salary in a vector:

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}
\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

You know that some relation exists between 'Salary' and the n variables

But you don't know what \( f \) is!

b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})

So what do you do?

An example from Machine Learning

How many COVID19 cases in the next week in a locality? (Rely on past data)

For localities \(1, 2, 3, \ldots, m\) we record \(n\) variables (total population, population density, average income, last week's count, # masks sold, ...) in a matrix, and the number of cases in a vector:

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}
\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

You know that some relation exists between 'Cases' and the n variables

But you don't know what \( f \) is!

b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})

So what do you do?

A typical Machine Learning setup

You know that some relation exists between the 'output' and the 'input' variables

But you don't know what \( f \) is!

b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})

So what do you do?

You make some assumption about \( f \)

What is the simplest assumption you can make? That \(f\) is linear:

b_{1} = a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n

(simple, interpretable, less predictive (less accurate))

Can you make more complex assumptions?

Yes,

b_{1} = w_kg_k(\cdots w_3g_3(w_2g_2(a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n)))

(complex, uninterpretable, more predictive (more accurate))

(psst! psst! Deep Learning)

In this course we assume \(f\) is linear

What's the connection to Linear Algebra

a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_{1}
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n = b_{2}
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n = b_{3}
\cdots
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n = b_{m}

(one equation per row of past data)

We are interested in finding the \(x\)'s which best explain the relationship between the \(a\)'s and the \(b\)'s

In other words .....

What's the connection to Linear Algebra

We are interested in finding the \(x\)'s which best explain the relationship between the \(a\)'s and the \(b\)'s

In other words .....

..... we are interested in solving \( A \mathbf{x} = \mathbf{b} \)

\underbrace{\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \cdots\\ \cdots\\ x_{n}\\ \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}}_{\mathbf{b}}

What could go wrong?

0 solutions: inconsistent equations

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \cdots\\ \cdots\\ x_{n}\\ \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

For example, two patients with the same heart rate, blood sugar level, weight, height (inputs) but a different blood cholesterol level (output):

72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 110
72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 120

What can you do?

0 solutions: inconsistent equations

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \cdots\\ \cdots\\ x_{n}\\ \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

Find the best possible solution by tolerating some noise: replace both right-hand sides by a compromise value

72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 110 \rightarrow 115
72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 120 \rightarrow 115
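The compromise above can be sketched with a least-squares solve. A minimal NumPy sketch, assuming the two inconsistent patient rows are truncated to just the four variables shown on the slide:

```python
import numpy as np

# Two inconsistent equations: identical inputs, different outputs
# (truncated to 4 variables for the illustration).
A = np.array([[72.0, 84.0, 175.0, 78.0],
              [72.0, 84.0, 175.0, 78.0]])
b = np.array([110.0, 120.0])

# Least squares finds x_hat minimising ||A x - b||_2.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# Both predictions land on the compromise value 115,
# halfway between the conflicting outputs 110 and 120.
print(A @ x_hat)
```

Because the two rows are identical, any \(\mathbf{\hat{x}}\) must predict the same output for both; minimising the squared error forces that shared prediction to the average, 115.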

What is the geometric view?

\underbrace{\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ 72&~&84&~&175&~&\cdots&~&\cdots&\cdots&78\\ 72&~&84&~&175&~&\cdots&~&\cdots&\cdots&78\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \cdots\\ \cdots\\ x_{n}\\ \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} b_{1}\\ 110\\ 120\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}}_{\mathbf{b}}

\(\mathbf{b} \) is not in the column space of \(A\)

Hence no solution to \(A\mathbf{x} = \mathbf{b} \)

"Project" \(\mathbf{b} \) onto the column space of \(A\) to get

\mathbf{p} = \begin{bmatrix} p_{1}\\ 115\\ 115\\ \cdots\\ \cdots\\ p_{m}\\ \end{bmatrix}

Solve \( A\mathbf{\hat{x}} = \mathbf{p}\)

\( \mathbf{\hat{x}}\) is the best possible approximation

What next?

How do we project \(\mathbf{b} \) into the column space of \(A\)?

But first a detour to build some concepts ...
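As a preview of where the detour leads, here is a minimal NumPy sketch of the projection, using the normal-equations formula \(\mathbf{p} = A(A^\top A)^{-1}A^\top\mathbf{b}\) (derived in later lectures, so take it on faith here); the matrix \(A\) and vector \(\mathbf{b}\) are a small made-up example with independent columns:

```python
import numpy as np

# A made-up tall matrix with independent columns, and a b outside C(A).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Project b onto the column space of A:  p = A (A^T A)^{-1} A^T b.
# Solving the "normal equations" A^T A x_hat = A^T b avoids an explicit inverse.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x_hat

print(x_hat)  # the best possible approximation
print(p)      # the projection of b, which does lie in C(A)

# The error b - p is orthogonal to every column of A.
print(A.T @ (b - p))
```

The last line printing (numerically) zeros is exactly the orthogonality picture above: the error \(\mathbf{b}-\mathbf{p}\) is perpendicular to the column space.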

What are orthogonal vectors?

What is the norm of a vector?

What are orthogonal subspaces?

The norm of a vector

Given a vector space \(V\), a norm is a non-negative real-valued function \(p: V \rightarrow \mathbb{R} \) with the following properties

p(\mathbf{u} + \mathbf{v}) \leq p(\mathbf{u}) + p(\mathbf{v})
p(a\mathbf{u}) = |a| p(\mathbf{u})
If~p(\mathbf{u}) = 0~then~\mathbf{u}=\mathbf{0}

(triangle inequality)

(figure: the triangle formed by \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{u}+\mathbf{v}\))

Examples of norm

L_{1}~norm~(\ell_{1}~norm,~taxicab~norm)
||\mathbf{x}||_1 = |x_1| + |x_2| + |x_3| + \cdots + |x_n|

L_{2}~norm~(\ell_{2}~norm,~Euclidean~norm) (most commonly used)
||\mathbf{x}||_2 = (|x_1|^2 + |x_2|^2 + |x_3|^2 + \cdots + |x_n|^2)^{\frac{1}{2}} = \sqrt{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2} = \sqrt{\mathbf{x}^\top\mathbf{x}}

\mathbf{x}^\top\mathbf{x} = [x_1~x_2~x_3~\cdots~x_n] \begin{bmatrix} x_1\\ x_2\\ x_3\\ \cdots\\ x_n \end{bmatrix}

L_{p}~norm~(\ell_{p}~norm)
||\mathbf{x}||_p = (|x_1|^p + |x_2|^p + |x_3|^p + \cdots + |x_n|^p)^{\frac{1}{p}} = \left(\sum_{i=1}^{n}|x_i|^p\right)^{\frac{1}{p}}
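These formulas can be sketched in a few lines of NumPy; `np.linalg.norm` computes the same quantities directly:

```python
import numpy as np

x = np.array([3.0, -4.0])

l1 = np.sum(np.abs(x))                  # |3| + |-4| = 7
l2 = np.sqrt(np.sum(np.abs(x) ** 2))    # sqrt(9 + 16) = 5
l2_dot = np.sqrt(x @ x)                 # same L2 value via x^T x
p = 3
lp = np.sum(np.abs(x) ** p) ** (1 / p)  # general L_p norm

# np.linalg.norm agrees with the hand-rolled versions.
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(lp, np.linalg.norm(x, p))
```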

Orthogonal vectors

Vectors \( \mathbf{u} \) and \( \mathbf{v} \) are orthogonal if the angle between them is \(90^\circ \)

Condition for orthogonality

By Pythagoras' Theorem

(||\mathbf{u}+\mathbf{v}||_2)^2 = (||\mathbf{u}||_2)^2 + (||\mathbf{v}||_2)^2
\implies (\mathbf{u}+\mathbf{v})^\top(\mathbf{u}+\mathbf{v}) = \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}
\implies (\mathbf{u}^\top+\mathbf{v}^\top)(\mathbf{u}+\mathbf{v}) = \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}
\implies \mathbf{u}^\top\mathbf{u}+\mathbf{v}^\top\mathbf{u} + \mathbf{u}^\top\mathbf{v}+\mathbf{v}^\top\mathbf{v} = \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}
\implies 2\,\mathbf{u}^\top\mathbf{v} = 0 \quad (\text{since}~\mathbf{v}^\top\mathbf{u} = \mathbf{u}^\top\mathbf{v})
\implies \mathbf{u}^\top\mathbf{v} = 0

(the dot product of the two vectors will be 0)

(figure: right triangle with legs \(\mathbf{u}\), \(\mathbf{v}\) and hypotenuse \(\mathbf{u}+\mathbf{v}\))
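Both the dot-product test and the Pythagoras identity above are easy to check numerically; a small sketch with made-up vectors:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([-2.0, 1.0])

# Orthogonality test: the dot product is 0.
print(u @ v)  # 0.0

# Pythagoras check: ||u + v||^2 == ||u||^2 + ||v||^2 for orthogonal u, v.
lhs = np.linalg.norm(u + v) ** 2
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2
print(np.isclose(lhs, rhs))  # True
```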

Orthogonal subspaces

Two subspaces \( S_1 \) and \( S_2 \) are orthogonal if every vector \( \mathbf{u} \in S_1 \) is orthogonal to every vector \( \mathbf{v} \in S_2 \)

(switch to geogebra)

Orthogonal subspaces

\mathcal{C}(A)
\mathcal{N}(A^\top)
inside~\mathbb{R}^m
Let \(\mathbf{a_1}, \mathbf{a_2}, \mathbf{a_3}\) be the columns of \(A\):

A = \begin{bmatrix} \uparrow&\uparrow&\uparrow\\ \mathbf{a_1}&\mathbf{a_2}&\mathbf{a_3}\\ \downarrow&\downarrow&\downarrow\\ \end{bmatrix} \qquad A^\top\mathbf{x} = \begin{bmatrix} \leftarrow \mathbf{a_1}^\top \rightarrow\\ \leftarrow \mathbf{a_2}^\top \rightarrow\\ \leftarrow \mathbf{a_3}^\top \rightarrow\\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 0\\0\\0 \end{bmatrix}
\mathbf{x} \in \mathcal{N}(A^\top)
\implies \mathbf{a_1}^\top\mathbf{x} = 0
\mathbf{a_2}^\top\mathbf{x} = 0
\mathbf{a_3}^\top\mathbf{x} = 0
\implies (p\mathbf{a_1}+q\mathbf{a_2}+r\mathbf{a_3})^\top\mathbf{x} = 0
\implies \mathcal{C}(A) \perp \mathcal{N}(A^\top)

The 4 fundamental subspaces

inside~\mathbb{R}^m: \mathcal{C}(A)~(dim = r)~and~\mathcal{N}(A^\top)~(dim = m-r)
inside~\mathbb{R}^n: \mathcal{C}(A^\top)~(dim = r)~and~\mathcal{N}(A)~(dim = n-r)

\mathcal{C}(A) \perp \mathcal{N}(A^\top) (orthogonal complements)
\mathcal{C}(A^\top) \perp \mathcal{N}(A) (orthogonal complements)

Example: \(\mathcal{C}(A) \perp \mathcal{N}(A^\top)\)

A = \begin{bmatrix} 1&1&2\\ 1&2&3\\ 1&3&4\\ \end{bmatrix} \rightarrow U = \begin{bmatrix} 1&1&2\\ 0&1&1\\ 0&0&0\\ \end{bmatrix}

Basis(\mathcal{C}(A)) = \begin{bmatrix} 1\\ 1\\ 1\\ \end{bmatrix}, \begin{bmatrix} 1\\ 2\\ 3\\ \end{bmatrix}

(the pivot columns of \(A\))

A^\top = \begin{bmatrix} 1&1&1\\ 1&2&3\\ 2&3&4\\ \end{bmatrix} \rightarrow \begin{bmatrix} 1&1&1\\ 0&1&2\\ 0&0&0\\ \end{bmatrix}

Solving \(A^\top\mathbf{x} = \mathbf{0}\) gives the special solution \(\mathbf{x}=\begin{bmatrix} 1\\ -2\\ 1\\ \end{bmatrix}\)

Basis(\mathcal{N}(A^\top)) = \begin{bmatrix} 1\\ -2\\ 1\\ \end{bmatrix}

(check: this vector is orthogonal to both basis vectors of \(\mathcal{C}(A)\))
(switch to geogebra)
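The orthogonality in this example can be verified numerically; a short NumPy check of the special solution against the pivot columns:

```python
import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [1.0, 3.0, 4.0]])

# Basis of C(A): the two pivot columns of A.
c1, c2 = A[:, 0], A[:, 1]

# Special solution of A^T x = 0, i.e. a basis vector of N(A^T).
x = np.array([1.0, -2.0, 1.0])

assert np.allclose(A.T @ x, 0)  # x really is in N(A^T)
assert np.isclose(c1 @ x, 0)    # orthogonal to the first pivot column
assert np.isclose(c2 @ x, 0)    # ... and to the second

print("C(A) is orthogonal to N(A^T) for this A")
```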

Practice Problems

For each of the following matrices, verify whether the null space of \( A \) is orthogonal to the column space of \( A^\top \)

\begin{bmatrix} 1&2&3\\ 1&1&2\\ 2&1&0 \end{bmatrix}
\begin{bmatrix} 1&2&3&3&2&1\\ 1&2&2&1&1&2\\ 2&0&0&1&2&2 \end{bmatrix}
\begin{bmatrix} 0&2&2\\ 1&2&1\\ 2&1&3\\ 3&1&1\\ -1&1&-2\\ 2&1&0\\ \end{bmatrix}
\begin{bmatrix} 1&2&3&3&2&1\\ 2&1&0&1&2&2\\ 4&5&6&7&6&4 \end{bmatrix}

Learning Objectives

(achieved)

How do vectors and matrices show up in Machine Learning?

What are orthogonal vectors?

How do you compute the norm of a vector? 

What do you do when Ax=b does not have a solution (intuition)?

What are orthogonal subspaces?