CS6015: Linear Algebra and Random Processes
Lecture 12: Projecting a vector onto another vector, Projecting a vector on to a subspace, Linear Regression (Least Squares)
Learning Objectives
(for today's lecture)
How do you project a vector onto another vector?
How do you project a vector onto a subspace?
What is the "linear regression" or "least squares" method?
Projecting a vector on another vector
(Figure: \(\mathbf{b}\) is projected onto \(\mathbf{a}\); the projection is \(\mathbf{p} = \hat{x}\mathbf{a}\) and the error is \(\mathbf{e} = \mathbf{b} - \mathbf{p}\))
\mathbf{a}^\top\mathbf{e} = 0
(orthogonal vectors)
\mathbf{a}^\top(\mathbf{b} - \mathbf{p}) = 0
\mathbf{a}^\top(\mathbf{b} - \hat{x}\mathbf{a}) = 0
\mathbf{a}^\top\mathbf{b} - \hat{x}\mathbf{a}^\top\mathbf{a} = 0
\hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}
\mathbf{p} = \hat{x}\mathbf{a} = \mathbf{a}\frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}
\therefore \mathbf{p} = \frac{\mathbf{a}\mathbf{a}^\top}{\mathbf{a}^\top\mathbf{a}} \mathbf{b}
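As a quick sanity check, here is a minimal NumPy sketch of this formula (the vectors \(\mathbf{a}\), \(\mathbf{b}\) below are made up for illustration): compute \(\hat{x}\) and \(\mathbf{p}\), and verify that the error is orthogonal to \(\mathbf{a}\).

```python
import numpy as np

# Made-up vectors in R^3 for illustration (any nonzero a works)
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 1.0, 2.0])

x_hat = (a @ b) / (a @ a)   # x̂ = aᵀb / aᵀa
p = x_hat * a               # p = x̂ a, the projection of b onto a
e = b - p                   # error vector

print(x_hat)    # 1.0
print(p)        # [1. 2. 2.]
print(a @ e)    # 0.0 (up to round-off): e is orthogonal to a
```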
\mathbf{a} = \begin{bmatrix}
a_1\\
a_2\\
\vdots\\
a_n
\end{bmatrix}~is~n\times1
\mathbf{a}^\top = \begin{bmatrix}
a_1&
a_2&
\cdots&
a_n
\end{bmatrix}~is~1\times n
\therefore \mathbf{a}\mathbf{a}^\top~is~an~n\times n~matrix
\therefore \mathbf{p} = P\mathbf{b}
The Projection matrix
(Figure: \(\mathbf{b}\) is projected onto \(\mathbf{a}\); \(\mathbf{p} = \hat{x}\mathbf{a}\), \(\mathbf{e} = \mathbf{b} - \mathbf{p}\))
P = \frac{1}{\mathbf{a}^\top\mathbf{a}} \mathbf{a}\mathbf{a}^\top
P^\top = P
P^2 = \frac{1}{\mathbf{a}^\top\mathbf{a}} \mathbf{a}\mathbf{a}^\top \frac{1}{\mathbf{a}^\top\mathbf{a}} \mathbf{a}\mathbf{a}^\top
=\left(\frac{1}{\mathbf{a}^\top\mathbf{a}}\right)^2 \mathbf{a}(\mathbf{a}^\top\mathbf{a})\mathbf{a}^\top
= \frac{1}{\mathbf{a}^\top\mathbf{a}} \mathbf{a}\mathbf{a}^\top
= P
(multiplying any vector by \( P \) will project that vector onto \(\mathbf{a}\))
\(\mathcal{C}(P) = \) a line: all multiples of \(\mathbf{a}\)
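A minimal NumPy sketch of these properties (the vector \(\mathbf{a}\) below is made up): the matrix \(P\) built from \(\mathbf{a}\) is symmetric, idempotent, and has rank 1.

```python
import numpy as np

# Made-up column vector a (3x1)
a = np.array([[1.0], [2.0], [2.0]])

P = (a @ a.T) / (a.T @ a)            # P = aaᵀ / aᵀa, a 3x3 matrix

print(np.allclose(P, P.T))           # True: Pᵀ = P
print(np.allclose(P @ P, P))         # True: P² = P (projecting twice changes nothing)
print(np.linalg.matrix_rank(P))      # 1: C(P) is a line, all multiples of a
```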
Recap: Goal: Project onto a subspace
\begin{bmatrix}
a_{11}&a_{12}&a_{13}&a_{14}&a_{15}&\cdots&a_{1n}\\
72&84&175&\cdots&\cdots&\cdots&78\\
72&84&175&\cdots&\cdots&\cdots&78\\
\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\
\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&a_{m4}&a_{m5}&\cdots&a_{mn}\\
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}\\
x_{3}\\
\vdots\\
\vdots\\
x_{n}\\
\end{bmatrix}
=\begin{bmatrix}
b_{1}\\
110\\
120\\
\vdots\\
\vdots\\
b_{m}\\
\end{bmatrix}
(A\mathbf{x} = \mathbf{b})
(Figure: \(\mathbf{b}\) lies outside the column space of \(A\); \(\mathbf{p}\) is its projection onto the column space)
\mathbf{p} = \begin{bmatrix}
p_{1}\\
115\\
115\\
\vdots\\
\vdots\\
p_{m}\\
\end{bmatrix}
"Project" \(\mathbf{b} \) into the column space of \(A\)
Solve \( A\mathbf{\hat{x}} = \mathbf{p}\)
\( \mathbf{\hat{x}}\) is the best possible approximation
\(\mathbf{b} \) is not in the column space of \(A\)
Hence no solution to \(A\mathbf{x} = \mathbf{b} \)
Projecting onto a subspace
Let~Basis(\mathcal{C}(A)) = \mathbf{a}_1, \mathbf{a}_2
(Figure: \(\mathbf{b}\) lies outside the column space of \(A\); \(\mathbf{p}\) is its projection)
How would you express \(\mathbf{p}\)?
\mathbf{p} = \hat{x}_1\mathbf{a}_1+\hat{x}_2\mathbf{a}_2
(it will be some linear combination of the columns of A)
Let~A = \begin{bmatrix}
\uparrow&\uparrow\\
\mathbf{a}_1&\mathbf{a}_2\\
\downarrow&\downarrow
\end{bmatrix}
Our goal: Solve \(A\mathbf{\hat{x}} = \mathbf{p}\)
(we are looking for a similar neat formula for \(\mathbf{\hat{x}}\))
(Figure: \(\mathbf{a}_1\) and \(\mathbf{a}_2\) span the column space of \(A\))
Recap (projection onto a vector): \(\mathbf{p} = \hat{x}\mathbf{a}\), \(\mathbf{e} = \mathbf{b} - \mathbf{p}\), \(\hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}\)
Projecting onto a subspace
(Figure: \(\mathbf{b}\) lies outside the column space of \(A\), which is spanned by \(\mathbf{a}_1\) and \(\mathbf{a}_2\); \(\mathbf{p}\) is its projection and \(\mathbf{e} = \mathbf{b} - \mathbf{p}\) is the error)
\mathbf{e} = \mathbf{b} - \mathbf{p}
\mathbf{e} = \mathbf{b} - A\mathbf{\hat{x}}
\mathbf{a}_1^\top(\mathbf{b} - A\mathbf{\hat{x}}) = 0
\mathbf{a}_2^\top(\mathbf{b} - A\mathbf{\hat{x}}) = 0
\begin{bmatrix}
\leftarrow&\mathbf{a}_1^\top&\rightarrow\\
\leftarrow&\mathbf{a}_2^\top&\rightarrow\\
\end{bmatrix}(\mathbf{b} - A\mathbf{\hat{x}}) = \begin{bmatrix}0\\0\end{bmatrix}
A^\top(\mathbf{b} - A\mathbf{\hat{x}}) = \mathbf{0}
(\mathbf{b} - A\mathbf{\hat{x}}) \in \mathcal{N}(A^\top)
(\mathbf{b} - A\mathbf{\hat{x}}) \perp \mathcal{C}(A)
Projecting onto a subspace
(Figure: \(\mathbf{b}\), its projection \(\mathbf{p}\) onto the column space of \(A\) spanned by \(\mathbf{a}_1\) and \(\mathbf{a}_2\), and the error \(\mathbf{e} = \mathbf{b} - \mathbf{p}\))
Recap (projection onto a vector): \(\mathbf{a}^\top(\mathbf{b} - \hat{x}\mathbf{a}) = 0\), \(\hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}\), \(\mathbf{p} = \frac{\mathbf{a}\mathbf{a}^\top}{\mathbf{a}^\top\mathbf{a}} \mathbf{b}\)
A^\top(\mathbf{b} - A\mathbf{\hat{x}}) = \mathbf{0}
A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}
\mathbf{p} = A\mathbf{\hat{x}}
\mathbf{p} = A(A^\top A)^{-1}A^\top\mathbf{b}
P = A(A^\top A)^{-1}A^\top
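A minimal NumPy sketch of these formulas, using a small made-up matrix \(A\) with independent columns and a made-up \(\mathbf{b}\): build \(P = A(A^\top A)^{-1}A^\top\), project, and verify that the error is orthogonal to \(\mathcal{C}(A)\) and that \(P\) is symmetric and idempotent.

```python
import numpy as np

# Made-up example: two independent columns in R^3, and a b outside their span
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # x̂ = (AᵀA)⁻¹Aᵀb
P = A @ np.linalg.inv(A.T @ A) @ A.T        # P = A(AᵀA)⁻¹Aᵀ
p = P @ b                                   # p = Pb = Ax̂

print(np.allclose(p, A @ x_hat))            # True
print(np.allclose(A.T @ (b - p), 0))        # True: error is orthogonal to every column of A
print(np.allclose(P, P.T), np.allclose(P @ P, P))  # True True: Pᵀ = P, P² = P
```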
How do I know this inverse exists?
The invertibility of \(A^\top A\)
Theorem: If \(A\) has \(n\) independent columns then \(A^\top A\) is invertible
Proof: HW4
We can rely on this result because we have assumed \(A\) contains independent columns
What if that is not the case?
Just do Gaussian Elimination (GE) and retain only the independent columns
(we anyway set the free variables to 0 while finding \(\mathbf{x}_{particular}\))
(There is another alternative which you will see soon)
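A small sketch of this failure mode (the matrix below is hypothetical, with its third column equal to the sum of the first two): \(A^\top A\) becomes singular, and dropping the dependent column fixes it.

```python
import numpy as np

# Hypothetical A whose 3rd column = 1st column + 2nd column (dependent columns)
A = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 5.0],
              [2.0, 0.0, 2.0],
              [3.0, 1.0, 4.0]])

print(np.linalg.matrix_rank(A.T @ A))            # 2 < 3: AᵀA is singular, not invertible

A_ind = A[:, :2]                                 # retain only the independent columns
print(np.linalg.matrix_rank(A_ind.T @ A_ind))    # 2 = number of columns: invertible
```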
Properties of \(P\)
P = A(A^\top A)^{-1}A^\top
P^\top = P
P^2 = A(A^\top A)^{-1}A^\top A(A^\top A)^{-1}A^\top
= A(A^\top A)^{-1}(A^\top A)(A^\top A)^{-1}A^\top
= A(A^\top A)^{-1}A^\top = P
HW4: What if \(A\) is a full rank square matrix?
Back to our ML example
(a toy example)
How many COVID19 cases will there be tomorrow?
For each location we know the number of cases Yesterday (column 1 of \(A\)), Today (column 2 of \(A\)), and Tomorrow (\(\mathbf{b}\)):
\begin{bmatrix}
a_{11}&a_{12}\\
a_{21}&a_{22}\\
a_{31}&a_{32}\\
\vdots&\vdots\\
\vdots&\vdots\\
a_{m1}&a_{m2}\\
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}
\end{bmatrix}
=\begin{bmatrix}
b_{1}\\
b_{2}\\
b_{3}\\
\vdots\\
\vdots\\
b_{m}\\
\end{bmatrix}
(rows correspond to Loc. 1, Loc. 2, Loc. 3, \(\ldots\), Loc. m)
You know that some relation exists between the '# of cases' tomorrow and the 2 variables: \(b_{1} = f(a_{11}, a_{12})\)
But you don't know what \( f \) is!
We will just assume that it is linear:
b = x_1 \cdot a_{yesterday} + x_2 \cdot a_{today}
(In practice, there would be many more variables but for simplicity and ease of visualisation we consider only 2 variables)
Back to our ML example
(a toy example)
Columns of \(A\): # of cases Yesterday and Today; \(\mathbf{b}\): # of cases Tomorrow; rows correspond to Loc. 1, Loc. 2, Loc. 3, \(\ldots\), Loc. 6
\begin{bmatrix}
1&2\\
3&2\\
2&0\\
3&1\\
2&1\\
3&3\\
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}
\end{bmatrix}
=\begin{bmatrix}
2\\
2.25\\
1.25\\
2\\
1.75\\
2.75\\
\end{bmatrix}
b = x_1 \cdot a_{yesterday} + x_2 \cdot a_{today}
Does the above system of equations have a solution?
Practice Problem: Perform Gaussian Elimination and check whether the above system of equations has a solution
It does not, but verify it anyway!
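Alongside the Gaussian Elimination exercise, here is a small NumPy sketch of the same check: appending \(\mathbf{b}\) as an extra column increases the rank, so \(\mathbf{b}\) is not a combination of the columns of \(A\) and \(A\mathbf{x} = \mathbf{b}\) has no solution.

```python
import numpy as np

# The toy example from this lecture
A = np.array([[1., 2.], [3., 2.], [2., 0.], [3., 1.], [2., 1.], [3., 3.]])
b = np.array([2., 2.25, 1.25, 2., 1.75, 2.75])

# Ax = b is solvable iff rank([A | b]) == rank(A), i.e. b is in C(A)
rank_A  = np.linalg.matrix_rank(A)                        # 2
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))  # 3

print(rank_A, rank_Ab)   # 2 3 -> b is not in the column space, no exact solution
```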
So what do we do?
\(\mathbf{b}\) is not in the column space of \(A\)
\begin{bmatrix}
1&2\\
3&2\\
2&0\\
3&1\\
2&1\\
3&3\\
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}
\end{bmatrix}
=\begin{bmatrix}
2\\
2.25\\
1.25\\
2\\
1.75\\
2.75\\
\end{bmatrix}
(A\mathbf{x} = \mathbf{b})
Find its projection \(\mathbf{p}\)
Solve \(A\mathbf{\hat{x}} = \mathbf{p}\)
Recap
A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}
\mathbf{a}_1, \mathbf{a}_2 \in \mathbb{R}^6, \quad \mathbf{b}\in \mathbb{R}^6
\( \mathcal{C}(A)\) is a 2d plane in a 6 dimensional space
(Figure: \(\mathbf{b}\) lies outside this plane; \(\mathbf{p}\) is its projection onto the column space of \(A\))
Finding \( \hat{\mathbf{x}}\)
Recap: \(A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}\)
Plugging in the toy example:
\begin{bmatrix}
1&3&2&3&2&3\\
2&2&0&1&1&3
\end{bmatrix}
\begin{bmatrix}
1&2\\
3&2\\
2&0\\
3&1\\
2&1\\
3&3\\
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}
\end{bmatrix}
=
\begin{bmatrix}
1&3&2&3&2&3\\
2&2&0&1&1&3
\end{bmatrix}
\begin{bmatrix}
2\\
2.25\\
1.25\\
2\\
1.75\\
2.75\\
\end{bmatrix}
i.e.
\begin{bmatrix}
36&22\\
22&19
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}
\end{bmatrix}
=\begin{bmatrix}
29\\
20.5
\end{bmatrix}
Solving this \(2\times2\) system:
x_{1}=0.5\\
x_{2}=0.5
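A minimal NumPy sketch of this computation for the toy example: form \(A^\top A\) and \(A^\top\mathbf{b}\), solve the \(2\times2\) system, and recover the projection \(\mathbf{p} = A\mathbf{\hat{x}}\).

```python
import numpy as np

A = np.array([[1., 2.], [3., 2.], [2., 0.], [3., 1.], [2., 1.], [3., 3.]])
b = np.array([2., 2.25, 1.25, 2., 1.75, 2.75])

AtA = A.T @ A                      # [[36. 22.] [22. 19.]]
Atb = A.T @ b                      # [29.  20.5]

x_hat = np.linalg.solve(AtA, Atb)  # solve AᵀA x̂ = Aᵀb
print(x_hat)                       # [0.5 0.5]

p = A @ x_hat                      # projection of b onto C(A)
print(p)                           # [1.5 2.5 1.  2.  1.5 3. ]
```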
The two geometric views
b = 0.5*a_{yesterday} + 0.5*a_{today}
(Figure 1: the plane \(b = 0.5\,a_{yesterday} + 0.5\,a_{today}\), with axes \(a_{yesterday}\), \(a_{today}\), \(b\))
(Figure 2: the column space of \(A\), spanned by \(\mathbf{a}_1, \mathbf{a}_2 \in \mathbb{R}^6\), with \(\mathbf{b}\in \mathbb{R}^6\) and its projection \(\mathbf{p}\))
\(b\) is a function of the inputs
We have assumed this function to be linear
What is the geometric interpretation?
(switch to geogebra)
One method, many names
We are "fitting a line/plane"
We are doing "linear regression"
We are finding the "least squares" solution
(How?)
\min_{\mathbf{p}} ||\mathbf{b} - \mathbf{p}||_2^2
\min_{\mathbf{\hat{x}}} ||\mathbf{b} - A\mathbf{\hat{x}}||_2^2
\mathbf{b}-\mathbf{p} = \begin{bmatrix}
b_1\\
b_2\\
b_3\\
\vdots\\
b_m\\
\end{bmatrix} - \begin{bmatrix}
p_1\\
p_2\\
p_3\\
\vdots\\
p_m\\
\end{bmatrix} = \begin{bmatrix}
b_1-p_1\\
b_2-p_2\\
b_3-p_3\\
\vdots\\
b_m-p_m\\
\end{bmatrix}
(hence "least squares": we are minimising the sum of squared errors \( \sum_{i=1}^{m}(b_i-p_i)^2 \))
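A small numerical illustration of this objective on the toy example (the alternative \(\mathbf{x}\) values below are arbitrary): the squared error at \(\mathbf{\hat{x}} = (0.5, 0.5)\) is smaller than at nearby choices.

```python
import numpy as np

A = np.array([[1., 2.], [3., 2.], [2., 0.], [3., 1.], [2., 1.], [3., 3.]])
b = np.array([2., 2.25, 1.25, 2., 1.75, 2.75])

def squared_error(x):
    """The least-squares objective ||b - Ax||^2."""
    e = b - A @ x
    return e @ e

print(squared_error(np.array([0.5, 0.5])))   # 0.5  (the least-squares solution)
print(squared_error(np.array([0.6, 0.4])))   # 0.61 (worse)
print(squared_error(np.array([0.4, 0.6])))   # 0.61 (worse)
```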
\min_{\mathbf{\hat{x}}} (\mathbf{b} - A\mathbf{\hat{x}})^\top(\mathbf{b} - A\mathbf{\hat{x}})
\min_{\mathbf{\hat{x}}} (\mathbf{b}^\top - \mathbf{\hat{x}}^\top A^\top)(\mathbf{b} - A\mathbf{\hat{x}})
\min_{\mathbf{\hat{x}}} (\mathbf{b}^\top\mathbf{b} - \mathbf{\hat{x}}^\top A^\top\mathbf{b} - \mathbf{b}^\top A\mathbf{\hat{x}} + \mathbf{\hat{x}}^\top A^\top A\mathbf{\hat{x}})
Take derivative w.r.t \( \mathbf{\hat{x}} \) and set to 0
2A^\top\mathbf{b} - 2A^\top A\mathbf{\hat{x}} = 0
\implies A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
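As a final check, a minimal sketch comparing the normal-equation solution with NumPy's built-in least-squares solver (which minimises \(||\mathbf{b} - A\mathbf{\hat{x}}||_2^2\) directly) on the toy example:

```python
import numpy as np

A = np.array([[1., 2.], [3., 2.], [2., 0.], [3., 1.], [2., 1.], [3., 3.]])
b = np.array([2., 2.25, 1.25, 2., 1.75, 2.75])

# Solution of the normal equations AᵀA x̂ = Aᵀb
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# NumPy's least-squares solver minimises ||b - Ax||² directly
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)                           # [0.5 0.5]
print(np.allclose(x_normal, x_lstsq))     # True
```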
Learning Objectives
(achieved)
How do you project a vector onto another vector?
How do you project a vector onto a subspace?
What is "linear regression" or "least squares" method?
CS6015: Lecture 12
By Mitesh Khapra