CS6015: Linear Algebra and Random Processes
Lecture 13: Orthonormal vectors, orthonormal basis, Gram-Schmidt orthogonalization, QR factorisation
Learning Objectives (for today's lecture)
What are orthonormal vectors?
What is an orthonormal basis?
How do you create an orthonormal basis (Gram-Schmidt process)?
What is QR factorisation?
Orthonormal vectors
Vectors \(q_1, q_2, \dots, q_n\) are said to be orthonormal if
q_i^\top q_j =
\begin{cases}
0 & \text{if } i \neq j\\
1 & \text{if } i = j
\end{cases}
If \(Q\) is a matrix whose columns are orthonormal then
Q = \begin{bmatrix}
\uparrow&\uparrow&&\uparrow\\
q_1&q_2&\dots&q_n\\
\downarrow&\downarrow&&\downarrow\\
\end{bmatrix}
Q^\top Q = \begin{bmatrix}
\leftarrow&q_1^\top&\rightarrow\\
\leftarrow&q_2^\top&\rightarrow\\
&\vdots&\\
\leftarrow&q_n^\top&\rightarrow\\
\end{bmatrix}
\begin{bmatrix}
\uparrow&\uparrow&&\uparrow\\
q_1&q_2&\dots&q_n\\
\downarrow&\downarrow&&\downarrow\\
\end{bmatrix}
= I
Why?
because the \(i,j\)-th entry of \(Q^\top Q \) will be \(q_i^\top q_j \)
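A quick numeric sanity check (a sketch, not from the slides; the two orthonormal vectors are hand-picked for illustration):

```python
# Verify Q^T Q = I for a Q whose columns are orthonormal.
import numpy as np

q1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)   # unit length
q2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)  # unit length, q1^T q2 = 0

Q = np.column_stack([q1, q2])                 # 3x2 matrix [q1 q2]

# (i, j) entry of Q^T Q is q_i^T q_j: 1 on the diagonal, 0 elsewhere
print(np.allclose(Q.T @ Q, np.eye(2)))        # True
```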
Orthogonal matrix
If \(Q\) is a square matrix whose columns are orthonormal, then it is called an orthogonal matrix
Q^\top Q = QQ^\top = I
(for a square matrix the left inverse is equal to the right inverse)
Examples:
Permutation:
\begin{bmatrix}
0&1&0\\
1&0&0\\
0&0&1\\
\end{bmatrix}
Rotation:
\begin{bmatrix}
\cos \theta&-\sin \theta\\
\sin \theta& \cos \theta
\end{bmatrix}
\frac{1}{\sqrt{2}}
\begin{bmatrix}
1 & 1\\
1 & -1
\end{bmatrix}
\frac{1}{3}
\begin{bmatrix}
1 & 2 & -2\\
2 & 1 & 2\\
2 & -2 & -1
\end{bmatrix}
Rectangular Matrix (orthonormal columns):
Q = \begin{bmatrix}
0&1&0\\
1&0&0\\
0&0&1\\
0&0&0\\
\end{bmatrix}
Q^\top = \begin{bmatrix}
0&1&0&0\\
1&0&0&0\\
0&0&1&0\\
\end{bmatrix}
Q^\top Q = I
but
QQ^\top = \begin{bmatrix}
1&0&0&0\\
0&1&0&0\\
0&0&1&0\\
0&0&0&0\\
\end{bmatrix}
\neq I
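The rectangular case is easy to confirm numerically; a minimal sketch using the same 4×3 matrix as above:

```python
# Q^T Q = I holds for orthonormal columns, but Q Q^T != I when Q is rectangular.
import numpy as np

Q = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])                  # the 4x3 example above

print(np.allclose(Q.T @ Q, np.eye(3)))        # True
print(Q @ Q.T)                                # diag(1, 1, 1, 0), not I_4
```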
Why do we care? (about orthogonal matrices)
Recap:
A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}
If \(A = Q \) (a matrix with orthonormal columns)
Q^\top Q\mathbf{\hat{x}} = Q^\top\mathbf{b}
\mathbf{\hat{x}} = Q^\top\mathbf{b}
\begin{bmatrix}
\hat{x}_1\\
\hat{x}_2\\
\vdots\\
\hat{x}_n\\
\end{bmatrix} =
\begin{bmatrix}
\leftarrow&\mathbf{q}_1^\top&\rightarrow\\
\leftarrow&\mathbf{q}_2^\top&\rightarrow\\
&\vdots&\\
\leftarrow&\mathbf{q}_n^\top&\rightarrow\\
\end{bmatrix}
\mathbf{b}
\hat{x}_i = \mathbf{q}_i^\top\mathbf{b}
(Figure: \(\mathbf{b}\in \mathbb{R}^6\) projected onto the column space of \(A\), spanned by \(\mathbf{q}_1, \mathbf{q}_2 \in \mathbb{R}^6\); \(\mathbf{p}\) is the projection)
\mathbf{p} = \hat{x}_1\mathbf{q}_1 + \hat{x}_2\mathbf{q}_2 + \dots + \hat{x}_n\mathbf{q}_n
The co-ordinate of the projection of \(\mathbf{b} \) along each basis vector is simply the dot product of that basis vector with \(\mathbf{b} \) (as opposed to the complicated formula \(\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}\))
\(\rightarrow\) An orthonormal basis is the best basis you can hope for!
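To see the computational pay-off, a hedged sketch (hypothetical random data, echoing the figure with \(\mathbf{q}_1, \mathbf{q}_2, \mathbf{b} \in \mathbb{R}^6\)): with orthonormal columns, the least-squares solution needs no matrix inversion.

```python
# Least squares with orthonormal columns: x_hat = Q^T b, no inverse needed.
import numpy as np

rng = np.random.default_rng(0)                # hypothetical data
q1 = rng.standard_normal(6)
q1 /= np.linalg.norm(q1)                      # normalise
v = rng.standard_normal(6)
v -= (q1 @ v) * q1                            # remove component along q1
q2 = v / np.linalg.norm(v)
Q = np.column_stack([q1, q2])                 # q1, q2 in R^6, orthonormal
b = rng.standard_normal(6)

x_hat = Q.T @ b                               # x_hat_i = q_i^T b
x_hat_normal_eq = np.linalg.solve(Q.T @ Q, Q.T @ b)  # the general formula
print(np.allclose(x_hat, x_hat_normal_eq))    # True
p = Q @ x_hat                                 # projection of b onto C(Q)
```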
What if the basis is not orthonormal?
Issue: The columns of \(A\) may not be orthonormal
Consequence: The basis vectors for the column space that we get from the pivot columns may not be orthonormal
Observation: We know that multiple bases exist for the same subspace
Wishlist: We want an orthonormal basis!
Question: Can we start from some non-orthonormal basis and derive an orthonormal one?
Answer: Yes, by using the Gram-Schmidt process
Gram-Schmidt Process
Given: non-orthonormal vectors \(\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n\)
Step 1: get orthogonal vectors \(\hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2, \dots \hat{\mathbf{a}}_n\)
Step 2: get orthonormal vectors \(\mathbf{q}_1, \mathbf{q}_2, \dots \mathbf{q}_n\)
Step 2 is easy (we will not focus too much on it)
\mathbf{q}_i = \frac{\hat{\mathbf{a}}_i}{||\hat{\mathbf{a}}_i||_2}
(Figure: \(\mathbf{a}_1, \mathbf{a}_2\) are first made orthogonal, giving \(\hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2\), and then normalised, giving \(\mathbf{q}_1, \mathbf{q}_2\))
Gram-Schmidt Process
(Figure: \(\mathbf{a}_2\) decomposed into \(\mathbf{p}\), its projection onto \(\mathbf{a}_1\), and \(\mathbf{e} = \mathbf{a}_2 - \mathbf{p}\))
\(\mathbf{p} \) is the component of \(\mathbf{a}_2\) along \(\mathbf{a}_1\) (this is what we want to get rid of)
\(\mathbf{e} \) is the component of \(\mathbf{a}_2\) orthogonal to \(\mathbf{a}_1\) (this is what we want to retain)
We will retain \(\mathbf{a}_1 \) as the first basis vector
\hat{\mathbf{a}}_1= \mathbf{a}_1
\hat{\mathbf{a}}_2= \mathbf{e} = \mathbf{a}_2 - {\mathbf{p}}
Recall that \(\mathbf{p} = \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1\), so
\therefore \hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
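This single step is easy to check numerically; a minimal sketch, borrowing \(\mathbf{a}_1, \mathbf{a}_2\) from the worked example that appears later:

```python
# One Gram-Schmidt step: split a2 into p (along a1) and e (orthogonal to a1).
import numpy as np

a1 = np.array([1.0, -1.0, 0.0])
a2 = np.array([2.0, 0.0, -2.0])

p = (a1 @ a2) / (a1 @ a1) * a1                # component of a2 along a1
e = a2 - p                                    # component orthogonal to a1
print(e)                                      # [ 1.  1. -2.]
print(a1 @ e)                                 # ~0, so e is orthogonal to a1
```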
Gram-Schmidt Process
(first basis vector)
\hat{\mathbf{a}}_1= \mathbf{a}_1
(second basis vector)
\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
We want to get rid of the component of \(\mathbf{a}_3\) along \(\hat{\mathbf{a}}_1\) (just as we did for \(\mathbf{a}_2\)), and also the component of \(\mathbf{a}_3\) along \(\hat{\mathbf{a}}_2\) (because we want \(\hat{\mathbf{a}}_3\) to be orthogonal to \(\hat{\mathbf{a}}_2\) also)
\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2
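The general pattern (subtract the component along every previously built vector, then normalise) translates directly into code. A sketch of classical Gram-Schmidt, assuming linearly independent columns; normalising as we go is equivalent to the formulas above, since \((\mathbf{q}_i^\top\mathbf{a}_j)\mathbf{q}_i = \frac{\hat{\mathbf{a}}_i^\top\mathbf{a}_j}{\hat{\mathbf{a}}_i^\top\hat{\mathbf{a}}_i}\hat{\mathbf{a}}_i\):

```python
# Classical Gram-Schmidt: columns of A -> orthonormal columns of Q.
import numpy as np

def gram_schmidt(A):
    A = np.asarray(A, dtype=float)
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        v = A[:, j].copy()
        for i in range(j):
            # Step 1: subtract the component of a_j along each earlier q_i
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]
        # Step 2: normalise (assumes v != 0, i.e. independent columns)
        Q[:, j] = v / np.linalg.norm(v)
    return Q
```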
Gram-Schmidt Process (Example)
\mathbf{a}_1=\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}
\quad
\mathbf{a}_2=\begin{bmatrix} 2\\ 0\\ -2 \end{bmatrix}
\quad
\mathbf{a}_3=\begin{bmatrix} 3\\ -3\\ 3 \end{bmatrix}
\hat{\mathbf{a}}_1 = \mathbf{a}_1=\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}
\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
=\begin{bmatrix} 2\\ 0\\ -2 \end{bmatrix}
-\frac{2}{2}\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}
=\begin{bmatrix} 1\\ 1\\ -2 \end{bmatrix}
\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2
=\begin{bmatrix} 3\\ -3\\ 3 \end{bmatrix}
-\frac{6}{2}\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}
-\frac{-6}{6}\begin{bmatrix} 1\\ 1\\ -2 \end{bmatrix}
=\begin{bmatrix} 1\\1\\1 \end{bmatrix}
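A quick numeric check of the example (a sketch mirroring the arithmetic above):

```python
# Reproduce the worked example and confirm pairwise orthogonality.
import numpy as np

a1 = np.array([1., -1., 0.])
a2 = np.array([2., 0., -2.])
a3 = np.array([3., -3., 3.])

a1h = a1
a2h = a2 - (a1h @ a2) / (a1h @ a1h) * a1h     # -> [1, 1, -2]
a3h = (a3 - (a1h @ a3) / (a1h @ a1h) * a1h
          - (a2h @ a3) / (a2h @ a2h) * a2h)   # -> [1, 1, 1]

print(a2h, a3h)
print(a1h @ a2h, a1h @ a3h, a2h @ a3h)        # 0.0 0.0 0.0
```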
Are we sure they are orthogonal?
\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
Multiply by \(\hat{\mathbf{a}}_1 ^\top\) on both sides:
\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_2 = \hat{\mathbf{a}}_1^\top\mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1 = 0
\therefore \hat{\mathbf{a}}_1 \perp \hat{\mathbf{a}}_2
\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2
Multiply by \(\hat{\mathbf{a}}_1 ^\top\) on both sides:
\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_3 = \hat{\mathbf{a}}_1^\top\mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\underbrace{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_2}_{=\,0} = 0
\therefore \hat{\mathbf{a}}_1 \perp \hat{\mathbf{a}}_3
Multiply by \(\hat{\mathbf{a}}_2 ^\top\) on both sides:
\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_3 = \hat{\mathbf{a}}_2^\top\mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\underbrace{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_1}_{=\,0} - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2 = 0
\therefore \hat{\mathbf{a}}_2 \perp \hat{\mathbf{a}}_3
QR factorisation
(Figure: \(\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\) and the orthonormal basis \(\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\) obtained from them by Gram-Schmidt)
Recap: The co-ordinate of the projection of \(\mathbf{a}_1 \) along each orthonormal basis vector is simply the dot product of that basis vector with \(\mathbf{a}_1 \)
\mathbf{a}_1 = z_{11}\mathbf{q}_1 + z_{12}\mathbf{q}_2 + z_{13}\mathbf{q}_3
z_{11} = \mathbf{q}_1^\top\mathbf{a}_1 \quad z_{12} = \mathbf{q}_2^\top\mathbf{a}_1 = 0 \quad z_{13} = \mathbf{q}_3^\top\mathbf{a}_1 = 0
\mathbf{a}_2 = z_{21}\mathbf{q}_1 + z_{22}\mathbf{q}_2 + z_{23}\mathbf{q}_3
z_{21} = \mathbf{q}_1^\top\mathbf{a}_2 \quad z_{22} = \mathbf{q}_2^\top\mathbf{a}_2 \quad z_{23} = \mathbf{q}_3^\top\mathbf{a}_2 = 0
\mathbf{a}_3 = z_{31}\mathbf{q}_1 + z_{32}\mathbf{q}_2 + z_{33}\mathbf{q}_3
z_{31} = \mathbf{q}_1^\top\mathbf{a}_3 \quad z_{32} = \mathbf{q}_2^\top\mathbf{a}_3 \quad z_{33} = \mathbf{q}_3^\top\mathbf{a}_3
The zeros follow from the Gram-Schmidt construction: \(\mathbf{a}_1\) is parallel to \(\mathbf{q}_1\), and \(\mathbf{a}_2\) lies in the span of \(\mathbf{q}_1, \mathbf{q}_2\), as can be seen from
\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1- \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2
In matrix form:
\begin{bmatrix}
\uparrow&\uparrow&\uparrow\\
\mathbf{a}_1&\mathbf{a}_2&\mathbf{a}_3\\
\downarrow&\downarrow&\downarrow\\
\end{bmatrix} =
\begin{bmatrix}
\uparrow&\uparrow&\uparrow\\
\mathbf{q}_1&\mathbf{q}_2&\mathbf{q}_3\\
\downarrow&\downarrow&\downarrow\\
\end{bmatrix}
\begin{bmatrix}
\mathbf{q}_1^\top\mathbf{a}_1&\mathbf{q}_1^\top\mathbf{a}_2&\mathbf{q}_1^\top\mathbf{a}_3\\
0&\mathbf{q}_2^\top\mathbf{a}_2&\mathbf{q}_2^\top\mathbf{a}_3\\
0&0&\mathbf{q}_3^\top\mathbf{a}_3
\end{bmatrix}
A = QR
(\(Q\) has orthonormal columns and \(R\) is upper triangular)
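In practice one rarely hand-rolls this; numpy's built-in QR (np.linalg.qr, which uses Householder reflections and is numerically more stable than Gram-Schmidt) gives the same factorisation, possibly with some column signs flipped. A sketch using the example vectors from earlier:

```python
# QR factorisation of the earlier example with numpy.
import numpy as np

A = np.column_stack([[1., -1., 0.],
                     [2., 0., -2.],
                     [3., -3., 3.]])

Q, R = np.linalg.qr(A)                        # Q orthonormal, R upper triangular
print(np.allclose(Q @ R, A))                  # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))        # True: orthonormal columns
print(R)                                      # upper triangular
```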
Learning Objectives (achieved)
What are orthonormal vectors?
What is an orthonormal basis?
How do you create an orthonormal basis (Gram-Schmidt process)?
What is QR factorisation?