# Projecting a vector on another vector

\mathbf{a}
\mathbf{b}
\mathbf{p}
\mathbf{e} = \mathbf{b} - \mathbf{p}
=\hat{x}\mathbf{a}
\mathbf{a}^\top\mathbf{e} = 0

### (orthogonal vectors)

\mathbf{a}^\top(\mathbf{b} - \mathbf{p}) = 0
\mathbf{a}^\top(\mathbf{b} - \hat{x}\mathbf{a}) = 0
\mathbf{a}^\top\mathbf{b} - \hat{x}\mathbf{a}^\top\mathbf{a} = 0
\hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}
\mathbf{p} = \hat{x}\mathbf{a} = \mathbf{a}\frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}
\therefore \mathbf{p} = \frac{\mathbf{a}\mathbf{a}^\top}{\mathbf{a}^\top\mathbf{a}} \mathbf{b}
\begin{bmatrix} a_1\\ a_2\\ \cdots\\ a_n \end{bmatrix}
\begin{bmatrix} a_1& a_2& \cdots& a_n \end{bmatrix}
\mathbf{a}
\mathbf{a}^\top
n\times1
1\times n
\mathbf{a}\mathbf{a}^\top~is~a~n\times n~matrix
\therefore \mathbf{p} = P\mathbf{b}

# The Projection matrix

\mathbf{a}
\mathbf{b}
\mathbf{p}
\mathbf{e} = \mathbf{b} - \mathbf{p}
=x\mathbf{a}
P = \frac{1}{\mathbf{a}^\top\mathbf{a}} \mathbf{a}\mathbf{a}^\top
P^\top = P
P^2 =
\frac{1}{\mathbf{a}^\top\mathbf{a}} \mathbf{a}\mathbf{a}^\top \frac{1}{\mathbf{a}^\top\mathbf{a}} \mathbf{a}\mathbf{a}^\top
=(\frac{1}{\mathbf{a}^\top\mathbf{a}})^2 \mathbf{a}\mathbf{a}^\top \mathbf{a}\mathbf{a}^\top
= \frac{1}{\mathbf{a}^\top\mathbf{a}} \mathbf{a}\mathbf{a}^\top
= P

\mathcal{C}(P) =

# Recap: Goal: Project onto a subspace

\begin{bmatrix} a_{11}&~&a_{12}&~&a_{13}&~&a_{14}&~&a_{15}&\cdots&a_{1n}\\ 72&~&84&~&175&~&\cdots&~&\cdots&\cdots&78\\ 72&~&84&~&175&~&\cdots&~&\cdots&\cdots&78\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ \cdots&~&\cdots&~&\cdots&~&\cdots&~&\cdots&\cdots&\cdots\\ a_{m1}&~&a_{m2}&~&a_{m3}&~&a_{m4}&~&a_{m5}&\cdots&a_{mn}\\ \end{bmatrix}
=\begin{bmatrix} b_{1}\\ 110\\ 120\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}
\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ \cdots\\ \cdots\\ x_{n}\\ \end{bmatrix}
A
\mathbf{b}
\mathbf{b}
column space of A
\mathbf{p}
\begin{bmatrix} p_{1}\\ 115\\ 115\\ \cdots\\ \cdots\\ p_{m}\\ \end{bmatrix}
\mathbf{p}

# Projecting onto a subspace

Let~Basis(A) = \mathbf{a}_1, \mathbf{a}_2
\mathbf{b}
column space of A
\mathbf{p}

### How would you express $$\mathbf{p}$$?

p = \mathbf{a}_1\hat{x_1}+\mathbf{a}_2\hat{x_2}
(it will be some linear combination of the columns of A)
Let~A = \begin{bmatrix} \uparrow&\uparrow\\ \mathbf{a}_1&\mathbf{a}_2\\ \downarrow&\downarrow \end{bmatrix}

### Our goal: Solve $$A\mathbf{\hat{x}} = \mathbf{p}$$

(we are looking for a similar neat formula for  )
\mathbf{\hat{x}}
\mathbf{a}_1
\mathbf{a}_2
\mathbf{a}
\mathbf{b}
\mathbf{p}
\mathbf{e} = \mathbf{b} - \mathbf{p}
=\hat{x}\mathbf{a}
\hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}

# Projecting onto a subspace

\mathbf{e} = \mathbf{b} - \mathbf{p}
\mathbf{b}
column space of A
\mathbf{p}
\mathbf{a}_1
\mathbf{a}_2
\mathbf{e} = \mathbf{b} - \mathbf{p}
\mathbf{e} = \mathbf{b} - A\mathbf{\hat{x}}
\mathbf{a}_1^\top(\mathbf{b} - A\mathbf{\hat{x}}) = 0
\mathbf{a}_2^\top(\mathbf{b} - A\mathbf{\hat{x}}) = 0
\begin{bmatrix} \leftarrow&\mathbf{a}_1^\top&\rightarrow\\ \leftarrow&\mathbf{a}_2^\top&\rightarrow\\ \end{bmatrix}(\mathbf{b} - A\mathbf{\hat{x}}) = \begin{bmatrix}0\\0\end{bmatrix}
A^\top(\mathbf{b} - A\mathbf{\hat{x}}) = \mathbf{0}
(\mathbf{b} - A\mathbf{\hat{x}}) \in \mathcal{N}(A^\top)
(\mathbf{b} - A\mathbf{\hat{x}}) \perp \mathcal{C}(A)

# Projecting onto a subspace

\mathbf{b}
column space of A
\mathbf{p}
\mathbf{a}_1
\mathbf{a}_2
\mathbf{e} = \mathbf{b} - \mathbf{p}
\mathbf{a}
\mathbf{b}
\mathbf{p}
\mathbf{e} = \mathbf{b} - \mathbf{p}
=\hat{x}\mathbf{a}
\mathbf{a}^\top(\mathbf{b} - \hat{x}\mathbf{a}) = 0
\hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}
\mathbf{p} = \frac{\mathbf{a}\mathbf{a}^\top}{\mathbf{a}^\top\mathbf{a}} \mathbf{b}

### Recap

A^\top(\mathbf{b} - A\mathbf{\hat{x}}) = \mathbf{0}
A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}
\mathbf{p} = A\mathbf{\hat{x}}
\mathbf{p} = A(A^\top A)^{-1}A^\top\mathbf{b}
P = A(A^\top A)^{-1}A^\top

# The invertibility of $$A^\top A$$

### Just do GE and retain independent columns

(we anyways set the dependent variables to 0 while finding x particular)
(There is another alternative which you will see soon)

# Properties of $$P$$

P = A(A^\top A)^{-1}A^\top
P^\top = P
P^2 = A(A^\top A)^{-1}A^\top
A(A^\top A)^{-1}A^\top
P^2 = A(A^\top A)^{-1}A^\top = P

# Back to our ML example

Yesterday


### How many COVID19 cases would be there tomorrow?

Loc. 1
Loc. 2
Loc. 3
Loc. m
Tomorrow
\begin{bmatrix} a_{11}&~&a_{12}\\ a_{21}&~&a_{22}\\ a_{31}&~&a_{32}\\ \cdots&~&\cdots\\ \cdots&~&\cdots\\ a_{m1}&~&a_{m2}\\ \end{bmatrix}
\begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \cdots\\ \cdots\\ b_{m}\\ \end{bmatrix}

### But you don't know what $$f$$ is!

b_{1} = f(a_{11}, a_{12})

### (a toy example)

Today
(In practice, there would be many more variables but for simplicity and ease of visualisation we consider only 2 variables)
\begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}
=
b = x_1*a_{yesterday} + x_2*a_{today}

# Back to our ML example

Yesterday
Tomorrow
\begin{bmatrix} 1&~&2\\ 3&~&2\\ 2&~&0\\ 3&~&1\\ 2&~&1\\ 3&~&3\\ \end{bmatrix}

### (a toy example)

Today
\begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}
=
\begin{bmatrix} 2\\ 2.25\\ 1.25\\ 2\\ 1.75\\ 2.75\\ \end{bmatrix}
Loc. 1
Loc. 2
Loc. 3
Loc. 6
b = x_1*a_{yesterday} + x_2*a_{today}

### Practice Problem: Perform Gaussian Elimination and check whether the above system of equations has a solution

It does not but still verify it!

# So what do we do?

### $$\mathbf{b}$$ is not in the column space of $$A$$

\begin{bmatrix} 1&~&2\\ 3&~&2\\ 2&~&0\\ 3&~&1\\ 2&~&1\\ 3&~&3\\ \end{bmatrix}
\begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}
=
A
\mathbf{x}
\mathbf{b}

### Recap

A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}
\begin{bmatrix} 2\\ 2.25\\ 1.25\\ 2\\ 1.75\\ 2.75\\ \end{bmatrix}
column space of A
\mathbf{a}_1\in \mathbb{R}^6
\mathbf{a}_2\in \mathbb{R}^6

### $$\mathcal{C}(A)$$ is a 2d plane in a 6 dimensional space

\mathbf{b}\in \mathbb{R}^6
\mathbf{p}

# Finding $$\hat{\mathbf{x}}$$

\begin{bmatrix} 1&~&2\\ 3&~&2\\ 2&~&0\\ 3&~&1\\ 2&~&1\\ 3&~&3\\ \end{bmatrix}
\begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}
=
A^\top
A
\mathbf{\hat{x}}

### Recap

A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
\begin{bmatrix} 1&3&2&3&2&3\\ 2&2&0&1&1&3 \end{bmatrix}
A^\top
\begin{bmatrix} 1&3&2&3&2&3\\ 2&2&0&1&1&3 \end{bmatrix}
\mathbf{b}
\begin{bmatrix} 36&22\\ 22&19 \end{bmatrix}
\begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}
\mathbf{\hat{x}}
=\begin{bmatrix} 29\\ 20.5 \end{bmatrix}
\begin{bmatrix} 2\\ 2.25\\ 1.25\\ 2\\ 1.75\\ 2.75\\ \end{bmatrix}
x_{1}=0.5\\ x_{2}=0.5

# The two geometric views

b = 0.5*a_{yesterday} + 0.5*a_{today}
b
a_{yesterday}
a_{today}
column space of A
\mathbf{a}_1\in \mathbb{R}^6
\mathbf{a}_2\in \mathbb{R}^6
\mathbf{b}\in \mathbb{R}^6
\mathbf{p}

# One method, many names

### (How?)

\min_{\mathbf{p}} ||\mathbf{b} - \mathbf{p}||_2^2
\min_{\mathbf{\hat{x}}} ||\mathbf{b} - A\mathbf{\hat{x}}||_2^2
\begin{bmatrix} b_1\\ b_2\\ b_3\\ \cdots\\ b_m\\ \end{bmatrix}
\begin{bmatrix} p_1\\ p_2\\ p_3\\ \cdots\\ p_m\\ \end{bmatrix}
\begin{bmatrix} b_1-p_1\\ b_2-p_2\\ b_3-p_3\\ \cdots\\ b_m-p_m\\ \end{bmatrix}
\mathbf{b}-\mathbf{p}
\min_{\mathbf{\hat{x}}} (\mathbf{b} - A\mathbf{\hat{x}})^\top(\mathbf{b} - A\mathbf{\hat{x}})
\min_{\mathbf{\hat{x}}} (\mathbf{b}^\top - \mathbf{\hat{x}}^\top A^\top)(\mathbf{b} - A\mathbf{\hat{x}})
\min_{\mathbf{\hat{x}}} (\mathbf{b}^\top - \mathbf{\hat{x}}^\top A^\top)(\mathbf{b} - A\mathbf{\hat{x}})
\min_{\mathbf{\hat{x}}} (\mathbf{b}^\top\mathbf{b} - \mathbf{\hat{x}}^\top A^\top\mathbf{b} - \mathbf{b}^\top A\mathbf{\hat{x}} + \mathbf{x}^\top A^\top A\mathbf{\hat{x}})

### Take derivative w.r.t $$\mathbf{\hat{x}}$$ and set to 0

2A^\top\mathbf{b} - 2A^\top A\mathbf{\hat{x}} = 0
\implies A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}