# An example from Machine Learning


### How much oil can be recovered from a drilling site?

*(Figure: a table of past data — one row per site, Site 1 through Site m; input columns Pressure, Density, Depth, Temp., Salinity, … (n variables in all); output column Quantity of oil recovered.)*
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \vdots\\ b_{m} \end{bmatrix}$$

### But you don't know what $$f$$ is!

$$b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})$$

# An example from Machine Learning


### How much salary should be offered to a candidate?

*(Figure: a table of past employee data — one row per employee, Emp. 1 through Emp. m; input columns # Exp., # Projects, Univ. Rank, # Prog. Lang. known, # of Degrees, … (n variables in all); output column Salary.)*
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \vdots\\ b_{m} \end{bmatrix}$$

### But you don't know what $$f$$ is!

$$b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})$$

# An example from Machine Learning


### How many COVID19 cases in the next week in a locality?

*(Figure: a table of past data — one row per locality, Loc. 1 through Loc. m; input columns Pop. Density, Avg. Income, Last week's count, # masks sold, Total Pop., … (n variables in all); output column Cases.)*
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \vdots\\ b_{m} \end{bmatrix}$$

### But you don't know what $$f$$ is!

$$b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})$$

# A typical Machine Learning setup

### But you don't know what $$f$$ is!

$$b_{1} = f(a_{11}, a_{12}, a_{13}, \cdots, a_{1n})$$

### Yes, but you can assume a form for $$f$$

A simple choice is a linear function of the inputs:

$$b_{1} = a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n$$

A more expressive choice composes simple functions, as in a neural network:

$$b_{1} = w_k g_k(\cdots w_3 g_3(w_2 g_2(a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n)))$$
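The two candidate forms for $$f$$ can be sketched numerically. The feature values and weights below are invented for illustration, and the nonlinearity $$g_2$$ is assumed (purely as an example) to be a ReLU:

```python
import numpy as np

# Hypothetical inputs a_{11}, ..., a_{1n} for one row of the data (n = 4).
a1 = np.array([72.0, 84.0, 175.0, 78.0])
# Assumed weights x_1, ..., x_n.
x = np.array([0.5, 0.1, 0.2, 0.3])

# Linear form: b_1 = a_{11} x_1 + ... + a_{1n} x_n
b_linear = a1 @ x

# Composed form: b_1 = w_2 * g_2(w_1 * (a_{11} x_1 + ... + a_{1n} x_n)),
# a two-layer version of the nested expression above, with g_2 = ReLU.
g2 = lambda t: np.maximum(t, 0.0)
w1, w2 = 0.01, 3.0
b_composed = w2 * g2(w1 * (a1 @ x))
```

Learning then means choosing the weights so that the predicted $$b_1$$ matches the observed outputs.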

# What's the connection to Linear Algebra?

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_{1} && \text{(Loc. 1)}\\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_{2} && \text{(Loc. 2)}\\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n &= b_{3} && \text{(Loc. 3)}\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n &= b_{m} && \text{(Loc. m)} \end{aligned}$$

One equation per locality.

# What's the connection to Linear Algebra?


### So we are interested in solving $$A \mathbf{x} = \mathbf{b}$$

$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \vdots\\ x_{n} \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \vdots\\ b_{m} \end{bmatrix}}_{\mathbf{b}}$$
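Solving such a system is a one-liner in NumPy. The 3×3 matrix and right-hand side below are invented for illustration; when $$A$$ is square and invertible there is exactly one solution:

```python
import numpy as np

# A small, invented instance of A x = b (values are illustrative only).
A = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [2.0, 1.0, 0.0]])
b = np.array([4.0, 6.0, 3.0])

# np.linalg.solve finds the unique x when A is square and invertible.
x = np.linalg.solve(A, b)

# Check: A x reproduces b.
assert np.allclose(A @ x, b)
```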

# What could go wrong?


### 0 solutions: inconsistent equations

$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \vdots\\ x_{n} \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \vdots\\ b_{m} \end{bmatrix}}_{\mathbf{b}}$$

### For example, two patients with the same heart rate, blood sugar level, weight, and height (inputs) but different blood cholesterol levels (output):

$$72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 110$$

$$72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 120$$

# What can you do?


### 0 solutions: inconsistent equations

$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \vdots\\ x_{n} \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \vdots\\ b_{m} \end{bmatrix}}_{\mathbf{b}}$$

### Find the best possible solution by tolerating some noise

Since the two right-hand sides cannot both hold, settle for the compromise value $$115 = \frac{110 + 120}{2}$$:

$$72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 110 \;\rightarrow\; 115$$

$$72x_1 + 84x_2 + 175x_3 + \cdots + 78x_n = 120 \;\rightarrow\; 115$$
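This "best possible solution" is exactly what least squares computes. A sketch with NumPy, using the two patient rows above (with $$n = 4$$ for concreteness, an assumption): since the rows of $$A$$ are identical, the fitted values split the difference at 115.

```python
import numpy as np

# Two patients with identical inputs but different outputs (110 vs 120):
# no x can satisfy both equations exactly.
A = np.array([[72.0, 84.0, 175.0, 78.0],
              [72.0, 84.0, 175.0, 78.0]])
b = np.array([110.0, 120.0])

# np.linalg.lstsq returns the x minimizing ||A x - b||_2.
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# Both fitted values land on the compromise 115 = (110 + 120) / 2.
fitted = A @ x
```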

# What is the geometric view?

$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ 72 & 84 & 175 & \cdots & 78\\ 72 & 84 & 175 & \cdots & 78\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}}_{A} \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \vdots\\ x_{n} \end{bmatrix} = \underbrace{\begin{bmatrix} b_{1}\\ 110\\ 120\\ \vdots\\ b_{m} \end{bmatrix}}_{\mathbf{b}}$$

Here $$\mathbf{b}$$ lies outside the column space of $$A$$, so $$A\mathbf{x} = \mathbf{b}$$ has no exact solution. Geometrically, the best we can do is replace $$\mathbf{b}$$ by its projection $$\mathbf{p}$$ onto the column space of $$A$$ — the closest vector that *is* reachable:

$$\mathbf{p} = \begin{bmatrix} p_{1}\\ 115\\ 115\\ \vdots\\ p_{m} \end{bmatrix}$$
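The projection can be computed directly. A minimal sketch, assuming a small invented $$A$$ with independent columns, using the standard formula $$\mathbf{p} = A(A^\top A)^{-1}A^\top \mathbf{b}$$:

```python
import numpy as np

# Illustrative 3x2 matrix with independent columns, and a b outside C(A).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([110.0, 120.0, 5.0])

# Projection of b onto the column space of A: p = A (A^T A)^{-1} A^T b.
p = A @ np.linalg.inv(A.T @ A) @ A.T @ b

# The residual b - p is orthogonal to every column of A.
assert np.allclose(A.T @ (b - p), 0.0)
```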

# The norm of a vector

### Given a vector space $$V$$, the norm of a vector is a non-negative valued function $$p: V \rightarrow \mathbb{R}$$ with the following properties

$$p(\mathbf{u} + \mathbf{v}) \leq p(\mathbf{u}) + p(\mathbf{v}) \quad \text{(triangle inequality)}$$

$$p(a\mathbf{u}) = |a|\, p(\mathbf{u})$$

$$\text{If } p(\mathbf{u}) = 0 \text{ then } \mathbf{u} = \mathbf{0}$$

*(Figure: triangle with sides $$\mathbf{u}$$, $$\mathbf{v}$$, and $$\mathbf{u}+\mathbf{v}$$, illustrating the triangle inequality.)*

# Examples of norm

$$L_1$$ norm ($$\ell_1$$ norm, taxicab norm):

$$||\mathbf{x}||_1 = |x_1| + |x_2| + |x_3| + \cdots + |x_n|$$

$$L_2$$ norm ($$\ell_2$$ norm, Euclidean norm — most commonly used):

$$||\mathbf{x}||_2 = (|x_1|^2 + |x_2|^2 + |x_3|^2 + \cdots + |x_n|^2)^{\frac{1}{2}} = \sqrt{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2} = \sqrt{\mathbf{x}^\top\mathbf{x}}$$

where $$\mathbf{x}^\top\mathbf{x} = [x_1~x_2~x_3~\cdots~x_n] \begin{bmatrix} x_1\\ x_2\\ x_3\\ \vdots\\ x_n \end{bmatrix}$$

$$L_p$$ norm ($$\ell_p$$ norm):

$$||\mathbf{x}||_p = (|x_1|^p + |x_2|^p + |x_3|^p + \cdots + |x_n|^p)^{\frac{1}{p}} = \left(\sum_{i=1}^{n}|x_i|^p\right)^{\frac{1}{p}}$$

# Orthogonal vectors

### By Pythagoras' Theorem

$$\begin{aligned} ||\mathbf{u}+\mathbf{v}||_2^2 &= ||\mathbf{u}||_2^2 + ||\mathbf{v}||_2^2\\ \implies (\mathbf{u}+\mathbf{v})^\top(\mathbf{u}+\mathbf{v}) &= \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}\\ \implies (\mathbf{u}^\top+\mathbf{v}^\top)(\mathbf{u}+\mathbf{v}) &= \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}\\ \implies \mathbf{u}^\top\mathbf{u}+\mathbf{v}^\top\mathbf{u} + \mathbf{u}^\top\mathbf{v}+\mathbf{v}^\top\mathbf{v} &= \mathbf{u}^\top\mathbf{u} + \mathbf{v}^\top\mathbf{v}\\ \implies \mathbf{u}^\top\mathbf{v} &= 0 \end{aligned}$$

(the last step uses $$\mathbf{v}^\top\mathbf{u} = \mathbf{u}^\top\mathbf{v}$$)

### (the dot product of the two vectors will be 0)

*(Figure: right triangle with legs $$\mathbf{u}$$, $$\mathbf{v}$$ and hypotenuse $$\mathbf{u}+\mathbf{v}$$.)*
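A quick numeric check of the equivalence, with two invented vectors chosen so that $$\mathbf{u}^\top\mathbf{v} = 0$$:

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 1.0, -2.0])   # chosen so that u . v = 2 + 2 - 4 = 0

# Dot product is zero ...
assert np.isclose(u @ v, 0.0)

# ... and Pythagoras holds exactly: ||u + v||^2 = ||u||^2 + ||v||^2.
lhs = np.linalg.norm(u + v) ** 2
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2
assert np.isclose(lhs, rhs)
```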

# Orthogonal subspaces

Write the columns of $$A$$ as $$\mathbf{a_1}, \mathbf{a_2}, \mathbf{a_3}$$, so the rows of $$A^\top$$ are $$\mathbf{a_1}^\top, \mathbf{a_2}^\top, \mathbf{a_3}^\top$$:

$$A = \begin{bmatrix} \uparrow&\uparrow&\uparrow\\ \mathbf{a_1}&\mathbf{a_2}&\mathbf{a_3}\\ \downarrow&\downarrow&\downarrow\\ \end{bmatrix} \qquad A^\top = \begin{bmatrix} \leftarrow \mathbf{a_1}^\top \rightarrow\\ \leftarrow \mathbf{a_2}^\top \rightarrow\\ \leftarrow \mathbf{a_3}^\top \rightarrow\\ \end{bmatrix}$$

Suppose $$\mathbf{x} \in \mathcal{N}(A^\top)$$, i.e. $$A^\top\mathbf{x} = \mathbf{0}$$. Then:

$$\begin{aligned} &\implies \mathbf{a_1}^\top\mathbf{x} = 0, \quad \mathbf{a_2}^\top\mathbf{x} = 0, \quad \mathbf{a_3}^\top\mathbf{x} = 0\\ &\implies (p\mathbf{a_1}+q\mathbf{a_2}+r\mathbf{a_3})^\top\mathbf{x} = 0 \quad \text{for all } p, q, r\\ &\implies \mathcal{C}(A) \perp \mathcal{N}(A^\top) \quad \text{inside } \mathbb{R}^m \end{aligned}$$

# The 4 fundamental subspaces

| Subspace | Lives inside | Dimension |
|---|---|---|
| Column space $$\mathcal{C}(A)$$ | $$\mathbb{R}^m$$ | $$r$$ |
| Left null space $$\mathcal{N}(A^\top)$$ | $$\mathbb{R}^m$$ | $$m-r$$ |
| Row space $$\mathcal{C}(A^\top)$$ | $$\mathbb{R}^n$$ | $$r$$ |
| Null space $$\mathcal{N}(A)$$ | $$\mathbb{R}^n$$ | $$n-r$$ |

$$\mathcal{C}(A) \perp \mathcal{N}(A^\top) \qquad\qquad \mathcal{C}(A^\top) \perp \mathcal{N}(A)$$
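The dimension counts can be checked numerically via the rank. The 3×4 matrix below is invented so that one row is a combination of the others (hence $$r = 2$$):

```python
import numpy as np

# An illustrative 3x4 matrix (m = 3, n = 4); row 3 = row 1 + row 2, so r = 2.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

# dim C(A) = dim C(A^T) = r;  dim N(A) = n - r;  dim N(A^T) = m - r
dim_N_A = n - r
dim_N_At = m - r
```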

# Example: $$\mathcal{C}(A) \perp \mathcal{N}(A^\top)$$

Row-reduce $$A$$ to $$U$$:

$$A = \begin{bmatrix} 1&1&2\\ 1&2&3\\ 1&3&4\\ \end{bmatrix} \rightarrow U = \begin{bmatrix} 1&1&2\\ 0&1&1\\ 0&0&0\\ \end{bmatrix}$$

The pivots are in columns 1 and 2, so those columns of $$A$$ form a basis of the column space:

$$Basis(\mathcal{C}(A)) = \begin{bmatrix} 1\\ 1\\ 1\\ \end{bmatrix}, \begin{bmatrix} 1\\ 2\\ 3\\ \end{bmatrix}$$

For $$\mathcal{N}(A^\top)$$, row-reduce $$A^\top$$ and solve $$A^\top\mathbf{x} = \mathbf{0}$$:

$$A^\top = \begin{bmatrix} 1&1&1\\ 1&2&3\\ 2&3&4\\ \end{bmatrix} \rightarrow \begin{bmatrix} 1&1&1\\ 0&1&2\\ 0&0&0\\ \end{bmatrix} \implies \mathbf{x} = \begin{bmatrix} 1\\ -2\\ 1\\ \end{bmatrix} \qquad Basis(\mathcal{N}(A^\top)) = \begin{bmatrix} 1\\ -2\\ 1\\ \end{bmatrix}$$

Check: $$(1,-2,1)$$ is orthogonal to both basis vectors of $$\mathcal{C}(A)$$: $$1-2+1 = 0$$ and $$1-4+3 = 0$$.
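The same check in NumPy, using the matrix of this example:

```python
import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [1.0, 3.0, 4.0]])

# Basis of C(A): the two pivot columns of A.
c1, c2 = A[:, 0], A[:, 1]

# Basis vector of N(A^T), from back-substitution: x = (1, -2, 1).
x = np.array([1.0, -2.0, 1.0])
assert np.allclose(A.T @ x, 0.0)       # x really is in N(A^T)

# Every column-space basis vector is orthogonal to x.
assert np.isclose(c1 @ x, 0.0)
assert np.isclose(c2 @ x, 0.0)
```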

# Practice Problems

### For each of the following matrices, verify whether the null space of $$A$$ is orthogonal to the column space of $$A^\top$$

$$\begin{bmatrix} 1&2&3\\ 1&1&2\\ 2&1&0 \end{bmatrix} \qquad \begin{bmatrix} 1&2&3&3&2&1\\ 1&2&2&1&1&2\\ 2&0&0&1&2&2 \end{bmatrix} \qquad \begin{bmatrix} 0&2&2\\ 1&2&1\\ 2&1&3\\ 3&1&1\\ -1&1&-2\\ 2&1&0\\ \end{bmatrix} \qquad \begin{bmatrix} 1&2&3&3&2&1\\ 2&1&0&1&2&2\\ 4&5&6&7&6&4 \end{bmatrix}$$