CS6015: Linear Algebra and Random Processes
Lecture 13: Orthonormal vectors, orthonormal basis, Gram-Schmidt orthogonalization, QR factorisation
Learning Objectives (for today's lecture)
What are orthonormal vectors?
What is an orthonormal basis?
How do you create an orthonormal basis (Gram-Schmidt process)?
What is QR factorisation?
Orthonormal vectors
Vectors \mathbf{q}_1, \mathbf{q}_2, \dots, \mathbf{q}_n are said to be orthonormal if

\mathbf{q}_i^\top \mathbf{q}_j =
\begin{cases}
0 & \text{if } i \neq j\\
1 & \text{if } i = j
\end{cases}

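As a quick numerical sanity check (the two vectors here are a toy example of my own, not from the lecture), both conditions can be verified with NumPy:

```python
import numpy as np

# Two candidate vectors in R^2 (assumed toy example)
q1 = np.array([1, 1]) / np.sqrt(2)
q2 = np.array([1, -1]) / np.sqrt(2)

print(q1 @ q1)  # 1.0 -> unit length
print(q2 @ q2)  # 1.0 -> unit length
print(q1 @ q2)  # 0.0 -> orthogonal
```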
If Q is a matrix whose columns are orthonormal then
Q = \begin{bmatrix}
\uparrow & \uparrow & & \uparrow\\
\mathbf{q}_1 & \mathbf{q}_2 & \dots & \mathbf{q}_n\\
\downarrow & \downarrow & & \downarrow\\
\end{bmatrix}

Q^\top Q = \begin{bmatrix}
\leftarrow & \mathbf{q}_1^\top & \rightarrow\\
\leftarrow & \mathbf{q}_2^\top & \rightarrow\\
 & \dots & \\
\leftarrow & \mathbf{q}_n^\top & \rightarrow\\
\end{bmatrix}
\begin{bmatrix}
\uparrow & \uparrow & & \uparrow\\
\mathbf{q}_1 & \mathbf{q}_2 & \dots & \mathbf{q}_n\\
\downarrow & \downarrow & & \downarrow\\
\end{bmatrix}
= I
Why? Because the (i, j)-th entry of Q^\top Q is \mathbf{q}_i^\top\mathbf{q}_j, which is 1 when i = j and 0 otherwise.
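A small NumPy sketch to confirm this numerically; np.linalg.qr is used here only as a convenient way to manufacture a matrix with orthonormal columns from a random (assumed) starting matrix:

```python
import numpy as np

# Build a matrix with orthonormal columns from a random 5x3 matrix
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))

# The (i, j)-th entry of Q^T Q is q_i^T q_j, so the product is the identity
print(np.allclose(Q.T @ Q, np.eye(3)))  # True
```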
Orthogonal matrix
If Q is a square matrix whose columns are orthonormal, then it is called an orthogonal matrix, and

Q^\top Q = QQ^\top = I
(for a square matrix the left inverse is equal to the right inverse)
Examples:

Permutation:

\begin{bmatrix}
0 & 1 & 0\\
1 & 0 & 0\\
0 & 0 & 1
\end{bmatrix}

Rotation:

\begin{bmatrix}
\cos\theta & -\sin\theta\\
\sin\theta & \cos\theta
\end{bmatrix}

\frac{1}{\sqrt{2}}
\begin{bmatrix}
1 & 1\\
1 & -1
\end{bmatrix}

\frac{1}{3}
\begin{bmatrix}
1 & 2 & -2\\
2 & 1 & 2\\
2 & -2 & -1
\end{bmatrix}

Rectangular matrix (orthonormal columns):

Q = \begin{bmatrix}
0 & 1 & 0\\
1 & 0 & 0\\
0 & 0 & 1\\
0 & 0 & 0
\end{bmatrix}
\qquad
Q^\top = \begin{bmatrix}
0 & 1 & 0 & 0\\
1 & 0 & 0 & 0\\
0 & 0 & 1 & 0
\end{bmatrix}

Here Q^\top Q = I, but

QQ^\top = \begin{bmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 0
\end{bmatrix} \neq I
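Each of the square examples above can be checked numerically; a minimal sketch (the angle θ = 0.7 is an arbitrary choice of mine):

```python
import numpy as np

theta = 0.7  # any angle works
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
M = np.array([[1, 2, -2], [2, 1, 2], [2, -2, -1]]) / 3

# For square orthogonal matrices, both products give the identity
for Q in (R, H, M):
    print(np.allclose(Q.T @ Q, np.eye(Q.shape[1])),
          np.allclose(Q @ Q.T, np.eye(Q.shape[0])))
```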
Why do we care? (about orthogonal matrices)

Recap: the least-squares normal equations give

A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}

\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}

If A = Q (a matrix with orthonormal columns), then

Q^\top Q\mathbf{\hat{x}} = Q^\top\mathbf{b}

\mathbf{\hat{x}} = Q^\top\mathbf{b}

\begin{bmatrix}
\hat{x}_1\\
\hat{x}_2\\
\dots\\
\hat{x}_n
\end{bmatrix} =
\begin{bmatrix}
\leftarrow&\mathbf{q}_1^\top&\rightarrow\\
\leftarrow&\mathbf{q}_2^\top&\rightarrow\\
&\dots&\\
\leftarrow&\mathbf{q}_n^\top&\rightarrow
\end{bmatrix}
\mathbf{b}

\mathbf{p} = \hat{x}_1\mathbf{q}_1 + \hat{x}_2\mathbf{q}_2 + \dots + \hat{x}_n\mathbf{q}_n
\qquad
\hat{x}_i = \mathbf{q}_i^\top\mathbf{b}

[Figure: \mathbf{b}\in\mathbb{R}^6 projected onto the column space of A, spanned by \mathbf{q}_1, \mathbf{q}_2 \in \mathbb{R}^6; \mathbf{p} is the projection.]
The co-ordinate of the projection of \mathbf{b} along each basis vector is simply the dot product of that basis vector with \mathbf{b} (as opposed to the complicated formula \mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}).

An orthonormal basis is the best basis you can hope for!
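A minimal sketch of this shortcut (the 6×2 problem and the random numbers are my own assumptions): for a matrix with orthonormal columns, the solution of the normal equations coincides with plain dot products:

```python
import numpy as np

# Toy least-squares setup: b in R^6, two orthonormal basis vectors
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2))
Q, _ = np.linalg.qr(A)   # orthonormal basis for the column space of A
b = rng.standard_normal(6)

x_normal = np.linalg.solve(Q.T @ Q, Q.T @ b)  # (Q^T Q)^{-1} Q^T b
x_direct = Q.T @ b                            # just dot products!
print(np.allclose(x_normal, x_direct))        # True

p = Q @ x_direct   # projection of b onto the column space
```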
What if the basis is not orthonormal?
Issue: The columns of A may not be orthonormal
Consequence: The basis vectors for the column space that we get from the pivot columns may not be orthonormal
Observation: We know that multiple bases exist for the same subspace
Wishlist: We want an orthonormal basis!
Question: Can we start from some non-orthonormal basis and derive an orthonormal one?
Answer: Yes, by using the Gram-Schmidt process
Gram-Schmidt Process
Given: non-orthonormal (but linearly independent) vectors \mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n
Step 1: get orthogonal vectors \hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2, \dots, \hat{\mathbf{a}}_n
Step 2: get orthonormal vectors \mathbf{q}_1, \mathbf{q}_2, \dots, \mathbf{q}_n

Step 2 is easy (we will not focus too much on it):

\mathbf{q}_i = \frac{\hat{\mathbf{a}}_i}{||\hat{\mathbf{a}}_i||_2}

[Figure: \mathbf{a}_1, \mathbf{a}_2 are orthogonalized to \hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2 and then normalized to \mathbf{q}_1, \mathbf{q}_2.]
Gram-Schmidt Process
[Figure: \mathbf{a}_2 split into \mathbf{p} (along \mathbf{a}_1) and \mathbf{e} (orthogonal to \mathbf{a}_1).]

\mathbf{p} is the component of \mathbf{a}_2 along \mathbf{a}_1 (this is what we want to get rid of)

\mathbf{e} is the component of \mathbf{a}_2 orthogonal to \mathbf{a}_1 (this is what we want to retain)

We will retain \mathbf{a}_1 as the first basis vector:

\hat{\mathbf{a}}_1 = \mathbf{a}_1

\hat{\mathbf{a}}_2 = \mathbf{e} = \mathbf{a}_2 - \mathbf{p}

Recall that the projection of \mathbf{a}_2 onto \mathbf{a}_1 = \hat{\mathbf{a}}_1 is \mathbf{p} = \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1, so

\therefore \hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1


Gram-Schmidt Process
We want to get rid of the component of \mathbf{a}_3 along \mathbf{a}_1 (just as we did for \mathbf{a}_2), and also the component of \mathbf{a}_3 along \mathbf{a}_2 (because we want \mathbf{a}_3 to be orthogonal to \mathbf{a}_2 also).

\hat{\mathbf{a}}_1 = \mathbf{a}_1 \quad \text{(first basis vector)}

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 \quad \text{(second basis vector)}

\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2 \quad \text{(third basis vector)}
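The same pattern continues for \mathbf{a}_4, \dots, \mathbf{a}_n: subtract from each new vector its component along every previously constructed \hat{\mathbf{a}}_j. A minimal NumPy sketch of both steps (the function name gram_schmidt is my own):

```python
import numpy as np

def gram_schmidt(vectors):
    """Step 1: orthogonalize (classical Gram-Schmidt); Step 2: normalize."""
    ortho = []
    for a in vectors:
        a = np.asarray(a, dtype=float)
        a_hat = a.copy()
        for prev in ortho:
            # subtract the component of a along each previous \hat{a}_j
            a_hat -= (prev @ a) / (prev @ prev) * prev
        ortho.append(a_hat)
    return [v / np.linalg.norm(v) for v in ortho]   # Step 2: normalize

# Usage on the example that follows:
q1, q2, q3 = gram_schmidt([[1, -1, 0], [2, 0, -2], [3, -3, 3]])
Q = np.column_stack([q1, q2, q3])
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: orthonormal
```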

Gram-Schmidt Process
(Example)

Given:

\mathbf{a}_1 = \begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}
\quad
\mathbf{a}_2 = \begin{bmatrix} 2\\ 0\\ -2 \end{bmatrix}
\quad
\mathbf{a}_3 = \begin{bmatrix} 3\\ -3\\ 3 \end{bmatrix}

\hat{\mathbf{a}}_1 = \mathbf{a}_1 = \begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
= \begin{bmatrix} 2\\ 0\\ -2 \end{bmatrix} - \frac{2}{2}\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}
= \begin{bmatrix} 1\\ 1\\ -2 \end{bmatrix}

\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2
= \begin{bmatrix} 3\\ -3\\ 3 \end{bmatrix} - \frac{6}{2}\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix} - \frac{-6}{6}\begin{bmatrix} 1\\ 1\\ -2 \end{bmatrix}
= \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}
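These numbers can be confirmed directly (a plain NumPy transcription of the computation above):

```python
import numpy as np

a1 = np.array([1., -1., 0.])
a2 = np.array([2., 0., -2.])
a3 = np.array([3., -3., 3.])

a1h = a1
a2h = a2 - (a1h @ a2) / (a1h @ a1h) * a1h     # -> [1, 1, -2]
a3h = (a3 - (a1h @ a3) / (a1h @ a1h) * a1h
          - (a2h @ a3) / (a2h @ a2h) * a2h)   # -> [1, 1, 1]

print(a2h, a3h)
print(a1h @ a2h, a1h @ a3h, a2h @ a3h)        # 0.0 0.0 0.0 -> pairwise orthogonal
```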
Are we sure they are orthogonal?
Start from \hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 and multiply by \hat{\mathbf{a}}_1^\top on both sides:

\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_2 = \hat{\mathbf{a}}_1^\top\mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1 = 0
\qquad \therefore \hat{\mathbf{a}}_1 \perp \hat{\mathbf{a}}_2

Now take \hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2 and multiply by \hat{\mathbf{a}}_1^\top on both sides:

\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_3 = \hat{\mathbf{a}}_1^\top\mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\underbrace{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_2}_{0} = 0
\qquad \therefore \hat{\mathbf{a}}_1 \perp \hat{\mathbf{a}}_3

Multiply by \hat{\mathbf{a}}_2^\top on both sides:

\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_3 = \hat{\mathbf{a}}_2^\top\mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\underbrace{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_1}_{0} - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2 = 0
\qquad \therefore \hat{\mathbf{a}}_2 \perp \hat{\mathbf{a}}_3
QR factorisation
Express each \mathbf{a}_i in the orthonormal basis \mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3 obtained from the Gram-Schmidt process.

Recap: The co-ordinate of the projection of \mathbf{a}_1 along each orthonormal basis vector is simply the dot product of that basis vector with \mathbf{a}_1.

\mathbf{a}_1 = z_{11}\mathbf{q}_1 + z_{12}\mathbf{q}_2 + z_{13}\mathbf{q}_3
\qquad z_{11} = \mathbf{q}_1^\top\mathbf{a}_1 \quad z_{12} = \mathbf{q}_2^\top\mathbf{a}_1 = 0 \quad z_{13} = \mathbf{q}_3^\top\mathbf{a}_1 = 0

\mathbf{a}_2 = z_{21}\mathbf{q}_1 + z_{22}\mathbf{q}_2 + z_{23}\mathbf{q}_3
\qquad z_{21} = \mathbf{q}_1^\top\mathbf{a}_2 \quad z_{22} = \mathbf{q}_2^\top\mathbf{a}_2 \quad z_{23} = \mathbf{q}_3^\top\mathbf{a}_2 = 0

\mathbf{a}_3 = z_{31}\mathbf{q}_1 + z_{32}\mathbf{q}_2 + z_{33}\mathbf{q}_3
\qquad z_{31} = \mathbf{q}_1^\top\mathbf{a}_3 \quad z_{32} = \mathbf{q}_2^\top\mathbf{a}_3 \quad z_{33} = \mathbf{q}_3^\top\mathbf{a}_3

Why are some co-ordinates zero? From Gram-Schmidt,

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
\qquad
\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2

so \mathbf{a}_1 lies along \mathbf{q}_1 (hence \mathbf{q}_2^\top\mathbf{a}_1 = \mathbf{q}_3^\top\mathbf{a}_1 = 0), and \mathbf{a}_2 lies in the span of \mathbf{q}_1, \mathbf{q}_2 (hence \mathbf{q}_3^\top\mathbf{a}_2 = 0).
In matrix form:

\begin{bmatrix}
\uparrow&\uparrow&\uparrow\\
\mathbf{a}_1&\mathbf{a}_2&\mathbf{a}_3\\
\downarrow&\downarrow&\downarrow
\end{bmatrix} =
\begin{bmatrix}
\uparrow&\uparrow&\uparrow\\
\mathbf{q}_1&\mathbf{q}_2&\mathbf{q}_3\\
\downarrow&\downarrow&\downarrow
\end{bmatrix}
\begin{bmatrix}
\mathbf{q}_1^\top\mathbf{a}_1&\mathbf{q}_1^\top\mathbf{a}_2&\mathbf{q}_1^\top\mathbf{a}_3\\
0&\mathbf{q}_2^\top\mathbf{a}_2&\mathbf{q}_2^\top\mathbf{a}_3\\
0&0&\mathbf{q}_3^\top\mathbf{a}_3
\end{bmatrix}

A = QR

Q has orthonormal columns and R is upper triangular. With A = QR, the least-squares formula \mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b} reduces to solving R\mathbf{\hat{x}} = Q^\top\mathbf{b}.
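In practice one rarely runs Gram-Schmidt by hand; a library routine such as np.linalg.qr produces the same factorisation (possibly with some columns of Q and rows of R negated). A short check on the example matrix from earlier:

```python
import numpy as np

# Columns are a1, a2, a3 from the worked example
A = np.array([[1., 2., 3.],
              [-1., 0., -3.],
              [0., -2., 3.]])

Q, R = np.linalg.qr(A)
print(np.allclose(A, Q @ R))  # True: A = QR
print(R)                      # upper triangular
# Least squares via QR: solve R x = Q.T @ b by back substitution
```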
Learning Objectives
(achieved)
What are orthonormal vectors?
What is an orthonormal basis?
How do you create an orthonormal basis (Gram-Schmidt process)?
What is QR factorisation?
By Mitesh Khapra