CS6015: Linear Algebra and Random Processes
Lecture 12: Projecting a vector onto another vector, Projecting a vector onto a subspace, Linear Regression (Least Squares)
Learning Objectives (for today's lecture)
How do you project a vector onto another vector?
How do you project a vector onto a subspace?
What is the "linear regression" or "least squares" method?
Projecting a vector onto another vector
[Figure: \mathbf{b} projected onto \mathbf{a}; the projection is \mathbf{p} = \hat{x}\mathbf{a} and the error is \mathbf{e} = \mathbf{b} - \mathbf{p}]
\mathbf{a}^\top\mathbf{e} = 0 \quad \text{(orthogonal vectors)}
\mathbf{a}^\top(\mathbf{b} - \mathbf{p}) = 0
\mathbf{a}^\top(\mathbf{b} - \hat{x}\mathbf{a}) = 0
\mathbf{a}^\top\mathbf{b} - \hat{x}\mathbf{a}^\top\mathbf{a} = 0
\hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}
\mathbf{p} = \hat{x}\mathbf{a} = \mathbf{a}\frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}
\therefore \mathbf{p} = \frac{\mathbf{a}\mathbf{a}^\top}{\mathbf{a}^\top\mathbf{a}}\,\mathbf{b}
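To make the formula concrete, a small worked example (my addition, not from the slides): take \mathbf{a} = (1, 1)^\top and \mathbf{b} = (1, 2)^\top. Then
\hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}} = \frac{1\cdot 1 + 1\cdot 2}{1^2 + 1^2} = \frac{3}{2}, \qquad
\mathbf{p} = \hat{x}\mathbf{a} = \begin{bmatrix}3/2\\3/2\end{bmatrix}, \qquad
\mathbf{e} = \mathbf{b} - \mathbf{p} = \begin{bmatrix}-1/2\\1/2\end{bmatrix}, \qquad
\mathbf{a}^\top\mathbf{e} = -\tfrac{1}{2} + \tfrac{1}{2} = 0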
\mathbf{a} = \begin{bmatrix}
a_1\\
a_2\\
\vdots\\
a_n
\end{bmatrix} \text{ is } n\times 1, \qquad
\mathbf{a}^\top = \begin{bmatrix}
a_1 & a_2 & \cdots & a_n
\end{bmatrix} \text{ is } 1\times n
\therefore \mathbf{a}\mathbf{a}^\top~\text{is an}~n\times n~\text{matrix}
\therefore \mathbf{p} = P\mathbf{b}
The Projection matrix
[Figure: \mathbf{b} projected onto \mathbf{a}, with \mathbf{p} = \hat{x}\mathbf{a} and \mathbf{e} = \mathbf{b} - \mathbf{p}]
P = \frac{1}{\mathbf{a}^\top\mathbf{a}}\,\mathbf{a}\mathbf{a}^\top
P^\top = P
P^2 = \frac{1}{\mathbf{a}^\top\mathbf{a}}\,\mathbf{a}\mathbf{a}^\top \frac{1}{\mathbf{a}^\top\mathbf{a}}\,\mathbf{a}\mathbf{a}^\top
= \left(\frac{1}{\mathbf{a}^\top\mathbf{a}}\right)^2 \mathbf{a}(\mathbf{a}^\top\mathbf{a})\mathbf{a}^\top
= \frac{1}{\mathbf{a}^\top\mathbf{a}}\,\mathbf{a}\mathbf{a}^\top \quad (\text{since } \mathbf{a}^\top\mathbf{a} \text{ is a scalar})
= P
(multiplying any vector by P will project that vector onto a)
\mathcal{C}(P) = \text{a line: all multiples of } \mathbf{a}
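A minimal numpy sketch (added here; the vectors a and b are illustrative choices, not from the slides) that builds P and checks these properties:

```python
import numpy as np

# Projection onto a single vector: P = a a^T / (a^T a)
a = np.array([[1.0], [1.0]])   # a as an n x 1 column vector
b = np.array([[1.0], [2.0]])

P = (a @ a.T) / (a.T @ a)      # n x n projection matrix
p = P @ b                      # projection of b onto the line through a

print(p.ravel())               # [1.5 1.5]
print(np.allclose(P.T, P))     # True  (P is symmetric)
print(np.allclose(P @ P, P))   # True  (P^2 = P)
print((a.T @ (b - p)).item())  # 0.0   (error is orthogonal to a)
```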
Recap: Goal: Project onto a subspace
\begin{bmatrix}
a_{11}&a_{12}&a_{13}&a_{14}&a_{15}&\cdots&a_{1n}\\
72&84&175&\cdots&\cdots&\cdots&78\\
72&84&175&\cdots&\cdots&\cdots&78\\
\vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&a_{m4}&a_{m5}&\cdots&a_{mn}
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}\\
x_{3}\\
\vdots\\
x_{n}
\end{bmatrix}
=
\begin{bmatrix}
b_{1}\\
110\\
120\\
\vdots\\
b_{m}
\end{bmatrix}
[Figure: \mathbf{b} lies outside the column space of A; its projection onto the column space is \mathbf{p} = (p_1, 115, 115, \dots, p_m)^\top]
"Project" b into the column space of A
Solve Ax^=p
x^ is the best possible approximation
b is not in the column space of A
Hence no solution to Ax=b
Projecting onto a subspace
Let~Basis(\mathcal{C}(A)) = \{\mathbf{a}_1, \mathbf{a}_2\}
[Figure: \mathbf{b} and its projection \mathbf{p} onto the column space of A, spanned by \mathbf{a}_1 and \mathbf{a}_2]
How would you express p?
\mathbf{p} = \mathbf{a}_1\hat{x}_1 + \mathbf{a}_2\hat{x}_2
(it will be some linear combination of the columns of A)
Let~A = \begin{bmatrix}
\uparrow&\uparrow\\
\mathbf{a}_1&\mathbf{a}_2\\
\downarrow&\downarrow
\end{bmatrix}
Our goal: Solve A\mathbf{\hat{x}} = \mathbf{p}
(we are looking for a neat formula for \mathbf{\hat{x}}, similar to the one in the vector case)
Recap (projecting onto a vector): \mathbf{e} = \mathbf{b} - \mathbf{p}, \quad \mathbf{p} = \hat{x}\mathbf{a}, \quad \hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}
Projecting onto a subspace
[Figure: the error \mathbf{e} = \mathbf{b} - \mathbf{p} is orthogonal to the column space of A, spanned by \mathbf{a}_1 and \mathbf{a}_2]
\mathbf{e} = \mathbf{b} - A\mathbf{\hat{x}}
The error must be orthogonal to every basis vector of \mathcal{C}(A):
\mathbf{a}_1^\top(\mathbf{b} - A\mathbf{\hat{x}}) = 0
\mathbf{a}_2^\top(\mathbf{b} - A\mathbf{\hat{x}}) = 0
\begin{bmatrix}
\leftarrow&\mathbf{a}_1^\top&\rightarrow\\
\leftarrow&\mathbf{a}_2^\top&\rightarrow
\end{bmatrix}(\mathbf{b} - A\mathbf{\hat{x}}) = \begin{bmatrix}0\\0\end{bmatrix}
A^\top(\mathbf{b} - A\mathbf{\hat{x}}) = \mathbf{0}
(\mathbf{b} - A\mathbf{\hat{x}}) \in \mathcal{N}(A^\top)
(\mathbf{b} - A\mathbf{\hat{x}}) \perp \mathcal{C}(A)
Projecting onto a subspace
[Figure: \mathbf{b} and its projection \mathbf{p} onto the column space of A, spanned by \mathbf{a}_1 and \mathbf{a}_2, with error \mathbf{e} = \mathbf{b} - \mathbf{p}]
Recap (projecting onto a vector): \mathbf{e} = \mathbf{b} - \mathbf{p}, \quad \mathbf{p} = \hat{x}\mathbf{a}, \quad \mathbf{a}^\top(\mathbf{b} - \hat{x}\mathbf{a}) = 0, \quad \hat{x} = \frac{\mathbf{a}^\top\mathbf{b}}{\mathbf{a}^\top\mathbf{a}}, \quad \mathbf{p} = \frac{\mathbf{a}\mathbf{a}^\top}{\mathbf{a}^\top\mathbf{a}}\,\mathbf{b}
A^\top(\mathbf{b} - A\mathbf{\hat{x}}) = \mathbf{0}
A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}
\mathbf{p} = A\mathbf{\hat{x}} = A(A^\top A)^{-1}A^\top\mathbf{b}
\therefore P = A(A^\top A)^{-1}A^\top
How do I know this inverse exists?
The invertibility of A^\top A
Theorem: If A has n independent columns then A^\top A is invertible
Proof: HW4
We can rely on this result because we have assumed that A has independent columns
What if that is not the case?
Just do Gaussian Elimination and retain the independent columns
(we set the free variables to 0 anyway while finding \mathbf{x}_{particular})
(there is another alternative which you will see soon)
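A small numpy illustration (my addition; the matrix A_dep is a made-up example) of why independence matters: with dependent columns, A^\top A is singular:

```python
import numpy as np

# If the columns of A are dependent, A^T A is singular (not invertible)
A_dep = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [3.0, 6.0]])  # second column = 2 * first column

M = A_dep.T @ A_dep
print(np.linalg.matrix_rank(M))           # 1 (< 2, so M is not invertible)
print(np.isclose(np.linalg.det(M), 0.0))  # True
```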
Properties of P
P = A(A^\top A)^{-1}A^\top
P^\top = P
P^2 = A(A^\top A)^{-1}A^\top A(A^\top A)^{-1}A^\top = A(A^\top A)^{-1}A^\top = P \quad (\text{since } A^\top A(A^\top A)^{-1} = I)
HW4: What if A is a full rank square matrix?
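A short numpy sketch (added; this A is an illustrative choice, not from the slides) that checks both properties of P and the orthogonality of the error for projection onto a subspace:

```python
import numpy as np

# A with 2 independent columns spanning a plane in R^3
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T  # P = A (A^T A)^{-1} A^T
p = P @ b                             # projection of b onto C(A)

print(np.allclose(P.T, P))            # True  (P^T = P)
print(np.allclose(P @ P, P))          # True  (P^2 = P)
print(np.allclose(A.T @ (b - p), 0))  # True  (error is orthogonal to C(A))
```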
Back to our ML example (a toy example)
How many COVID19 cases would there be tomorrow?
(each row is a location: Loc. 1, Loc. 2, Loc. 3, ..., Loc. m; the two columns of A hold the cases Yesterday and Today; \mathbf{b} holds the cases Tomorrow)
\begin{bmatrix}
a_{11}&a_{12}\\
a_{21}&a_{22}\\
a_{31}&a_{32}\\
\vdots&\vdots\\
a_{m1}&a_{m2}
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}
\end{bmatrix}
=
\begin{bmatrix}
b_{1}\\
b_{2}\\
b_{3}\\
\vdots\\
b_{m}
\end{bmatrix}
You know that some relation b_1 = f(a_{11}, a_{12}) exists between the '# of cases' and the 2 variables, but you don't know what f is!
We will just assume that it is linear:
b = x_1 * a_{yesterday} + x_2 * a_{today}
(In practice, there would be many more variables, but for simplicity and ease of visualisation we consider only 2 variables)
Back to our ML example (a toy example)
(rows: Loc. 1, Loc. 2, Loc. 3, ..., Loc. 6; columns of A: cases Yesterday and Today; \mathbf{b}: cases Tomorrow)
\begin{bmatrix}
1&2\\
3&2\\
2&0\\
3&1\\
2&1\\
3&3
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}
\end{bmatrix}
=
\begin{bmatrix}
2\\
2.25\\
1.25\\
2\\
1.75\\
2.75
\end{bmatrix}
b = x_1 * a_{yesterday} + x_2 * a_{today}
Does the above system of equations have a solution?
Practice Problem: Perform Gaussian Elimination and check whether the above system of equations has a solution
It does not, but verify it anyway!
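One quick way to verify it (my addition, using numpy rather than hand elimination): A\mathbf{x}=\mathbf{b} is solvable iff rank(A) = rank([A | b]):

```python
import numpy as np

A = np.array([[1, 2], [3, 2], [2, 0], [3, 1], [2, 1], [3, 3]], dtype=float)
b = np.array([2, 2.25, 1.25, 2, 1.75, 2.75])

# A x = b has a solution iff rank(A) == rank([A | b])
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3, so no solution
```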
So what do we do?
\mathbf{b} is not in the column space of A
A\mathbf{x} = \mathbf{b}: \quad \begin{bmatrix}
1&2\\
3&2\\
2&0\\
3&1\\
2&1\\
3&3
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}
\end{bmatrix}
= \mathbf{b}
Find its projection \mathbf{p} and solve A\mathbf{\hat{x}} = \mathbf{p}
Recap: A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}, \qquad \mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}
[Figure: \mathbf{a}_1 \in \mathbb{R}^6, \mathbf{a}_2 \in \mathbb{R}^6; \mathcal{C}(A) is a 2d plane in a 6 dimensional space; \mathbf{b} = (2, 2.25, 1.25, 2, 1.75, 2.75)^\top \in \mathbb{R}^6 lies outside it, and \mathbf{p} is its projection onto the plane]
Finding \mathbf{\hat{x}}
Recap: A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
A^\top A = \begin{bmatrix}
1&3&2&3&2&3\\
2&2&0&1&1&3
\end{bmatrix}
\begin{bmatrix}
1&2\\
3&2\\
2&0\\
3&1\\
2&1\\
3&3
\end{bmatrix}
= \begin{bmatrix}
36&22\\
22&19
\end{bmatrix}
A^\top\mathbf{b} = \begin{bmatrix}
1&3&2&3&2&3\\
2&2&0&1&1&3
\end{bmatrix}
\begin{bmatrix}
2\\
2.25\\
1.25\\
2\\
1.75\\
2.75
\end{bmatrix}
= \begin{bmatrix}
29\\
20.5
\end{bmatrix}
So we need to solve
\begin{bmatrix}
36&22\\
22&19
\end{bmatrix}
\begin{bmatrix}
x_{1}\\
x_{2}
\end{bmatrix}
= \begin{bmatrix}
29\\
20.5
\end{bmatrix}
Solving this 2\times 2 system gives
\hat{x}_1 = 0.5, \qquad \hat{x}_2 = 0.5
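The same computation in numpy (my addition), reproducing the numbers above:

```python
import numpy as np

A = np.array([[1, 2], [3, 2], [2, 0], [3, 1], [2, 1], [3, 3]], dtype=float)
b = np.array([2, 2.25, 1.25, 2, 1.75, 2.75])

print(A.T @ A)    # [[36. 22.]
                  #  [22. 19.]]
print(A.T @ b)    # [29.  20.5]

x_hat = np.linalg.solve(A.T @ A, A.T @ b)  # solve the normal equations
print(x_hat)      # [0.5 0.5]
print(A @ x_hat)  # p = [1.5 2.5 1.  2.  1.5 3. ]
```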
The two geometric views
b = 0.5 * a_{yesterday} + 0.5 * a_{today}
[Figure, view 1: axes a_{yesterday}, a_{today}, b]
[Figure, view 2: \mathbf{a}_1, \mathbf{a}_2, \mathbf{b} \in \mathbb{R}^6, with projection \mathbf{p} onto the column space of A]
b is a function of the inputs
We have assumed this function to be linear
What is the geometric interpretation?
(switch to GeoGebra)
One method, many names
We are "fitting a line/plane"
We are doing "linear regression"
We are finding the "least squares" solution
(How?)
\min_{\mathbf{p}\in\mathcal{C}(A)} ||\mathbf{b} - \mathbf{p}||_2^2 \;=\; \min_{\mathbf{\hat{x}}} ||\mathbf{b} - A\mathbf{\hat{x}}||_2^2
\mathbf{b} - \mathbf{p} = \begin{bmatrix}
b_1\\
b_2\\
b_3\\
\vdots\\
b_m
\end{bmatrix} - \begin{bmatrix}
p_1\\
p_2\\
p_3\\
\vdots\\
p_m
\end{bmatrix} = \begin{bmatrix}
b_1-p_1\\
b_2-p_2\\
b_3-p_3\\
\vdots\\
b_m-p_m
\end{bmatrix}
(minimising ||\mathbf{b}-\mathbf{p}||_2^2 means minimising the sum of the squares of these differences, hence "least squares")
\min_{\mathbf{\hat{x}}} (\mathbf{b} - A\mathbf{\hat{x}})^\top(\mathbf{b} - A\mathbf{\hat{x}})
= \min_{\mathbf{\hat{x}}} (\mathbf{b}^\top - \mathbf{\hat{x}}^\top A^\top)(\mathbf{b} - A\mathbf{\hat{x}})
= \min_{\mathbf{\hat{x}}} (\mathbf{b}^\top\mathbf{b} - \mathbf{\hat{x}}^\top A^\top\mathbf{b} - \mathbf{b}^\top A\mathbf{\hat{x}} + \mathbf{\hat{x}}^\top A^\top A\mathbf{\hat{x}})
Take the derivative w.r.t. \mathbf{\hat{x}} and set it to 0:
2A^\top\mathbf{b} - 2A^\top A\mathbf{\hat{x}} = \mathbf{0}
\implies A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
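As a sanity check (my addition), numpy's built-in least-squares solver minimises exactly this objective and agrees with the normal-equation solution on the toy data:

```python
import numpy as np

A = np.array([[1, 2], [3, 2], [2, 0], [3, 1], [2, 1], [3, 3]], dtype=float)
b = np.array([2, 2.25, 1.25, 2, 1.75, 2.75])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # from the derivation above
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # numpy's least-squares solver

print(np.allclose(x_normal, x_lstsq))            # True: both give [0.5 0.5]
```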

Learning Objectives (achieved)
How do you project a vector onto another vector?
How do you project a vector onto a subspace?
What is the "linear regression" or "least squares" method?