CS6015: Linear Algebra and Random Processes

Lecture 11: A tiny bit of ML, vector norms, orthogonal vectors, orthogonal subspaces

Learning Objectives

How do vectors and matrices show up in Machine Learning?

What are orthogonal vectors?

How do you compute the norm of a vector? 

(for today's lecture)

What do you do when Ax=b does not have a solution (intuition)?

What are orthogonal subspaces?

An example from Machine Learning

How much oil can be recovered from a drilling site? (Rely on past data)

For sites \(1, 2, 3, \ldots, m\) we record \(n\) variables (pressure, density, depth, temperature, salinity, ...) in a matrix, and the quantity of oil recovered in a vector:

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}
\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

You know that some relation exists between 'Quantity' and the n variables

But you don't know what \( f \) is!

b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})

So what do you do?

An example from Machine Learning

How much salary should be offered to a candidate? (Rely on past employee data)

For employees \(1, 2, 3, \ldots, m\) we record \(n\) variables (# of degrees, # years of experience, # projects, university rank, # programming languages known, ...) in a matrix, and the salary in a vector:

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}
\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

You know that some relation exists between 'Salary' and the n variables

But you don't know what \( f \) is!

b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})

So what do you do?

An example from Machine Learning

How many COVID19 cases in the next week in a locality? (Rely on past data)

For localities \(1, 2, 3, \ldots, m\) we record \(n\) variables (total population, population density, average income, last week's count, # masks sold, ...) in a matrix, and the number of cases in a vector:

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}
\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

You know that some relation exists between 'Cases' and the n variables

But you don't know what \( f \) is!

b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})

So what do you do?

A typical Machine Learning setup

You know that some relation exists between the 'output' and the 'input' variables

But you don't know what \( f \) is!

b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})

So what do you do?

You make some assumption about \( f \)

What is the simplest assumption you can make? That \(f\) is linear:

b_{1} = a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n

(simple, interpretable, less predictive (less accurate))

Can you make more complex assumptions?

Yes,

b_{1} = w_kg_k(\cdots w_3g_3(w_2g_2(a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n)))

(complex, uninterpretable, more predictive (more accurate))

(psst! psst! Deep Learning)

In this course we assume \(f\) is linear

What's the connection to Linear Algebra

a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_{1}
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n = b_{2}
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n = b_{3}
\cdots
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n = b_{m}

(one equation per row of past data)

We are interested in finding the \(x\)'s which best explain the relationship between the \(a\)'s and the \(b\)'s

In other words .....

What's the connection to Linear Algebra

We are interested in finding the \(x\)'s which best explain the relationship between the \(a\)'s and the \(b\)'s

In other words .....

..... we are interested in solving \( A \mathbf{x} = \mathbf{b} \)

\underbrace{\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \cdots\\ \cdots\\ x_{n}\\ \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}}_{\mathbf{b}}

What could go wrong?

0 solutions: inconsistent equations

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \cdots\\ \cdots\\ x_{n}\\ \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

For example, two patients with the same heart rate, blood sugar level, weight, height (inputs) but a different blood cholesterol level (output):

72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 110
72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 120

What can you do?

0 solutions: inconsistent equations

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ a_{21}&~&a_{22}&~&a_{23}&~&a_{24}&~&a_{25}&\cdots&a_{2n}\\ a_{31}&~&a_{32}&~&a_{33}&~&a_{34}&~&a_{35}&\cdots&a_{3n}\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \cdots\\ \cdots\\ x_{n}\\ \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

Find the best possible solution by tolerating some noise: replace both right-hand sides by a compromise value

72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 110 \rightarrow 115
72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 120 \rightarrow 115
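The compromise above can be sketched with a least-squares solve. A minimal NumPy sketch, assuming the two inconsistent patient rows are truncated to just the four variables shown on the slide:

```python
import numpy as np

# Two inconsistent equations: identical inputs, different outputs
# (truncated to 4 variables for the illustration).
A = np.array([[72.0, 84.0, 175.0, 78.0],
              [72.0, 84.0, 175.0, 78.0]])
b = np.array([110.0, 120.0])

# Least squares finds x_hat minimising ||A x - b||_2.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# Both predictions land on the compromise value 115,
# halfway between the conflicting outputs 110 and 120.
print(A @ x_hat)
```

Because the two rows are identical, any \(\mathbf{\hat{x}}\) must predict the same output for both; minimising the squared error forces that shared prediction to the average, 115.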

What is the geometric view?

\underbrace{\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ 72&~&84&~&175&~&\cdots&~&\cdots&\cdots&78\\ 72&~&84&~&175&~&\cdots&~&\cdots&\cdots&78\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \cdots\\ \cdots\\ x_{n}\\ \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} b_{1}\\ 110\\ 120\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}}_{\mathbf{b}}

\(\mathbf{b} \) is not in the column space of \(A\)

Hence no solution to \(A\mathbf{x} = \mathbf{b} \)

"Project" \(\mathbf{b} \) onto the column space of \(A\) to get

\mathbf{p} = \begin{bmatrix} p_{1}\\ 115\\ 115\\ \cdots\\ \cdots\\ p_{m}\\ \end{bmatrix}

Solve \( A\mathbf{\hat{x}} = \mathbf{p}\)

\( \mathbf{\hat{x}}\) is the best possible approximation

What next?

How do we project \(\mathbf{b} \) into the column space of \(A\)?

But first a detour to build some concepts ...
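As a preview of where the detour leads, here is a minimal NumPy sketch of the projection, using the normal-equations formula \(\mathbf{p} = A(A^\top A)^{-1}A^\top\mathbf{b}\) (derived in later lectures, so take it on faith here); the matrix \(A\) and vector \(\mathbf{b}\) are a small made-up example with independent columns:

```python
import numpy as np

# A made-up tall matrix with independent columns, and a b outside C(A).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Project b onto the column space of A:  p = A (A^T A)^{-1} A^T b.
# Solving the "normal equations" A^T A x_hat = A^T b avoids an explicit inverse.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x_hat

print(x_hat)  # the best possible approximation
print(p)      # the projection of b, which does lie in C(A)

# The error b - p is orthogonal to every column of A.
print(A.T @ (b - p))
```

The last line printing (numerically) zeros is exactly the orthogonality picture above: the error \(\mathbf{b}-\mathbf{p}\) is perpendicular to the column space.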

What are orthogonal vectors?

What is the norm of a vector?

What are orthogonal subspaces?

The norm of a vector

Given a vector space \(V\), a norm is a non-negative real-valued function \(p: V \rightarrow \mathbb{R} \) with the following properties

p(\mathbf{u} + \mathbf{v}) \leq p(\mathbf{u}) + p(\mathbf{v})
p(a\mathbf{u}) = |a| p(\mathbf{u})
If~p(\mathbf{u}) = 0~then~\mathbf{u}=\mathbf{0}

(triangle inequality)

(figure: the triangle formed by \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{u}+\mathbf{v}\))

Examples of norm

L_{1}~norm~(\ell_{1}~norm,~taxicab~norm)
||\mathbf{x}||_1 = |x_1| + |x_2| + |x_3| + \cdots + |x_n|

L_{2}~norm~(\ell_{2}~norm,~Euclidean~norm) (most commonly used)
||\mathbf{x}||_2 = (|x_1|^2 + |x_2|^2 + |x_3|^2 + \cdots + |x_n|^2)^{\frac{1}{2}} = \sqrt{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2} = \sqrt{\mathbf{x}^\top\mathbf{x}}

\mathbf{x}^\top\mathbf{x} = [x_1~x_2~x_3~\cdots~x_n] \begin{bmatrix} x_1\\ x_2\\ x_3\\ \cdots\\ x_n \end{bmatrix}

L_{p}~norm~(\ell_{p}~norm)
||\mathbf{x}||_p = (|x_1|^p + |x_2|^p + |x_3|^p + \cdots + |x_n|^p)^{\frac{1}{p}} = \left(\sum_{i=1}^{n}|x_i|^p\right)^{\frac{1}{p}}
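These formulas can be sketched in a few lines of NumPy; `np.linalg.norm` computes the same quantities directly:

```python
import numpy as np

x = np.array([3.0, -4.0])

l1 = np.sum(np.abs(x))                  # |3| + |-4| = 7
l2 = np.sqrt(np.sum(np.abs(x) ** 2))    # sqrt(9 + 16) = 5
l2_dot = np.sqrt(x @ x)                 # same L2 value via x^T x
p = 3
lp = np.sum(np.abs(x) ** p) ** (1 / p)  # general L_p norm

# np.linalg.norm agrees with the hand-rolled versions.
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(lp, np.linalg.norm(x, p))
```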

Orthogonal vectors

Vectors \( \mathbf{u} \) and \( \mathbf{v} \) are orthogonal if the angle between them is \(90^\circ \)

Condition for orthogonality

By Pythagoras' Theorem

(||\mathbf{u}+\mathbf{v}||_2)^2 = (||\mathbf{u}||_2)^2 + (||\mathbf{v}||_2)^2
\implies (\mathbf{u}+\mathbf{v})^\top(\mathbf{u}+\mathbf{v}) = \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}
\implies (\mathbf{u}^\top+\mathbf{v}^\top)(\mathbf{u}+\mathbf{v}) = \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}
\implies \mathbf{u}^\top\mathbf{u}+\mathbf{v}^\top\mathbf{u} + \mathbf{u}^\top\mathbf{v}+\mathbf{v}^\top\mathbf{v} = \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}
\implies 2\,\mathbf{u}^\top\mathbf{v} = 0 \quad (\text{since}~\mathbf{v}^\top\mathbf{u} = \mathbf{u}^\top\mathbf{v})
\implies \mathbf{u}^\top\mathbf{v} = 0

(the dot product of the two vectors will be 0)

(figure: right triangle with legs \(\mathbf{u}\), \(\mathbf{v}\) and hypotenuse \(\mathbf{u}+\mathbf{v}\))
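Both the dot-product test and the Pythagoras identity above are easy to check numerically; a small sketch with made-up vectors:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([-2.0, 1.0])

# Orthogonality test: the dot product is 0.
print(u @ v)  # 0.0

# Pythagoras check: ||u + v||^2 == ||u||^2 + ||v||^2 for orthogonal u, v.
lhs = np.linalg.norm(u + v) ** 2
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2
print(np.isclose(lhs, rhs))  # True
```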

Orthogonal subspaces

Two subspaces \( S_1 \) and \( S_2 \) are orthogonal if every vector \( \mathbf{u} \in S_1 \) is orthogonal to every vector \( \mathbf{v} \in S_2 \)

(switch to geogebra)

Orthogonal subspaces

\mathcal{C}(A)
\mathcal{N}(A^\top)
inside~\mathbb{R}^m
Let \(\mathbf{a_1}, \mathbf{a_2}, \mathbf{a_3}\) be the columns of \(A\):

A = \begin{bmatrix} \uparrow&\uparrow&\uparrow\\ \mathbf{a_1}&\mathbf{a_2}&\mathbf{a_3}\\ \downarrow&\downarrow&\downarrow\\ \end{bmatrix} \qquad A^\top\mathbf{x} = \begin{bmatrix} \leftarrow \mathbf{a_1}^\top \rightarrow\\ \leftarrow \mathbf{a_2}^\top \rightarrow\\ \leftarrow \mathbf{a_3}^\top \rightarrow\\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 0\\0\\0 \end{bmatrix}
\mathbf{x} \in \mathcal{N}(A^\top)
\implies \mathbf{a_1}^\top\mathbf{x} = 0
\mathbf{a_2}^\top\mathbf{x} = 0
\mathbf{a_3}^\top\mathbf{x} = 0
\implies (p\mathbf{a_1}+q\mathbf{a_2}+r\mathbf{a_3})^\top\mathbf{x} = 0
\implies \mathcal{C}(A) \perp \mathcal{N}(A^\top)

The 4 fundamental subspaces

inside~\mathbb{R}^m: \mathcal{C}(A)~(dim = r)~and~\mathcal{N}(A^\top)~(dim = m-r)
inside~\mathbb{R}^n: \mathcal{C}(A^\top)~(dim = r)~and~\mathcal{N}(A)~(dim = n-r)

\mathcal{C}(A) \perp \mathcal{N}(A^\top) (orthogonal complements)
\mathcal{C}(A^\top) \perp \mathcal{N}(A) (orthogonal complements)

Example: \(\mathcal{C}(A) \perp \mathcal{N}(A^\top)\)

A = \begin{bmatrix} 1&1&2\\ 1&2&3\\ 1&3&4\\ \end{bmatrix} \rightarrow U = \begin{bmatrix} 1&1&2\\ 0&1&1\\ 0&0&0\\ \end{bmatrix}

Basis(\mathcal{C}(A)) = \begin{bmatrix} 1\\ 1\\ 1\\ \end{bmatrix}, \begin{bmatrix} 1\\ 2\\ 3\\ \end{bmatrix}

(the pivot columns of \(A\))

A^\top = \begin{bmatrix} 1&1&1\\ 1&2&3\\ 2&3&4\\ \end{bmatrix} \rightarrow \begin{bmatrix} 1&1&1\\ 0&1&2\\ 0&0&0\\ \end{bmatrix}

Solving \(A^\top\mathbf{x} = \mathbf{0}\) gives the special solution \(\mathbf{x}=\begin{bmatrix} 1\\ -2\\ 1\\ \end{bmatrix}\)

Basis(\mathcal{N}(A^\top)) = \begin{bmatrix} 1\\ -2\\ 1\\ \end{bmatrix}

(check: this vector is orthogonal to both basis vectors of \(\mathcal{C}(A)\))
(switch to geogebra)
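The orthogonality in this example can be verified numerically; a short NumPy check of the special solution against the pivot columns:

```python
import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [1.0, 3.0, 4.0]])

# Basis of C(A): the two pivot columns of A.
c1, c2 = A[:, 0], A[:, 1]

# Special solution of A^T x = 0, i.e. a basis vector of N(A^T).
x = np.array([1.0, -2.0, 1.0])

assert np.allclose(A.T @ x, 0)  # x really is in N(A^T)
assert np.isclose(c1 @ x, 0)    # orthogonal to the first pivot column
assert np.isclose(c2 @ x, 0)    # ... and to the second

print("C(A) is orthogonal to N(A^T) for this A")
```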

Practice Problems

For each of the following matrices, verify whether the null space of \( A \) is orthogonal to the column space of \( A^\top \)

\begin{bmatrix} 1&2&3\\ 1&1&2\\ 2&1&0 \end{bmatrix}
\begin{bmatrix} 1&2&3&3&2&1\\ 1&2&2&1&1&2\\ 2&0&0&1&2&2 \end{bmatrix}
\begin{bmatrix} 0&2&2\\ 1&2&1\\ 2&1&3\\ 3&1&1\\ -1&1&-2\\ 2&1&0\\ \end{bmatrix}
\begin{bmatrix} 1&2&3&3&2&1\\ 2&1&0&1&2&2\\ 4&5&6&7&6&4 \end{bmatrix}

Learning Objectives

(achieved)

How do vectors and matrices show up in Machine Learning?

What are orthogonal vectors?

How do you compute the norm of a vector? 

What do you do when Ax=b does not have a solution (intuition)?

What are orthogonal subspaces?