CS6015: Linear Algebra and Random Processes

Lecture 13:  Orthonormal vectors, orthonormal basis, Gram-Schmidt orthogonalization, QR factorisation

Learning Objectives (for today's lecture)

What are orthonormal vectors?

What is an orthonormal basis?

How do you create an orthonormal basis (Gram-Schmidt process)?

What is QR factorisation?

Orthonormal vectors

Vectors \(q_1, q_2, \dots, q_n\) are said to be orthonormal if

q_i^\top q_j = \begin{cases} 0 & \text{if } i \neq j\\ 1 & \text{if } i = j \end{cases}

If \(Q\) is a matrix whose columns are orthonormal then

Q = \begin{bmatrix} \uparrow&\uparrow&\uparrow&\uparrow\\ q_1&q_2&\dots&q_n\\ \downarrow&\downarrow&\downarrow&\downarrow\\ \end{bmatrix}

Q^\top Q = \begin{bmatrix} \leftarrow&q_1^\top&\rightarrow\\ \leftarrow&q_2^\top&\rightarrow\\ \leftarrow&\dots&\rightarrow\\ \leftarrow&q_n^\top&\rightarrow\\ \end{bmatrix} \begin{bmatrix} \uparrow&\uparrow&\uparrow&\uparrow\\ q_1&q_2&\dots&q_n\\ \downarrow&\downarrow&\downarrow&\downarrow\\ \end{bmatrix} = I

Why?

Because the \((i,j)\)-th entry of \(Q^\top Q\) is \(q_i^\top q_j\), which is 1 when \(i = j\) and 0 otherwise
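As a quick numerical check of this entry-by-entry argument, the sketch below builds \(Q^\top Q\) directly from pairwise dot products. The two vectors and the helper names `dot` and `gram` are illustrative choices, not part of the lecture.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def gram(columns):
    """Return Q^T Q, whose (i, j) entry is q_i^T q_j."""
    return [[dot(qi, qj) for qj in columns] for qi in columns]

# Two orthonormal vectors in R^3 (an assumed example)
s = 1 / math.sqrt(2)
q1 = [s, s, 0.0]
q2 = [s, -s, 0.0]

G = gram([q1, q2])  # should be the 2x2 identity up to rounding
```

Up to floating-point rounding, the diagonal entries of `G` are 1 and the off-diagonal entries are 0.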

Orthogonal matrix

If \(Q\) is a square matrix whose columns are orthonormal then it is called an orthogonal matrix 

Q^\top Q = QQ^\top = I

(for a square matrix the left inverse is equal to the right inverse)

Permutation

\begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1\\ \end{bmatrix}

Rotation

\begin{bmatrix} \cos\theta&-\sin\theta\\ \sin\theta&\cos\theta \end{bmatrix}

\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}

\frac{1}{3}\begin{bmatrix} 1 & 2 & -2\\ 2 & 1 & 2\\ 2 & -2 & -1 \end{bmatrix}

Rectangular Matrix (Orthonormal columns)

Q = \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1\\ 0&0&0\\ \end{bmatrix}

Q^\top = \begin{bmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&1&0\\ \end{bmatrix}

Q^\top Q = I

QQ^\top = \begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&0\\ \end{bmatrix} \neq I
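The rectangular case can be checked by hand, but a small sketch makes the asymmetry explicit. It uses the \(4 \times 3\) matrix from this slide; the helpers `transpose` and `matmul` are assumed names written here for illustration.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

# 4x3 matrix with orthonormal columns (the rectangular example)
Q = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1],
     [0, 0, 0]]
Qt = transpose(Q)

QtQ = matmul(Qt, Q)  # 3x3 identity
QQt = matmul(Q, Qt)  # 4x4 matrix whose last diagonal entry is 0
```

Here `QtQ` is the \(3 \times 3\) identity, while `QQt` has a zero in its last diagonal entry, so it cannot be the identity.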

Why do we care? (about orthogonal matrices)

Recap

A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}
\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}

If \(A = Q\) (a matrix with orthonormal columns) then \(Q^\top Q = I\), so

Q^\top Q\mathbf{\hat{x}} = Q^\top\mathbf{b}
\mathbf{\hat{x}} = Q^\top\mathbf{b}
\begin{bmatrix} \hat{x}_1\\ \hat{x}_2\\ \dots\\ \hat{x}_n\\ \end{bmatrix} = \begin{bmatrix} \leftarrow&\mathbf{q}_1^\top&\rightarrow\\ \leftarrow&\mathbf{q}_2^\top&\rightarrow\\ &\dots&\\ \leftarrow&\mathbf{q}_n^\top&\rightarrow\\ \end{bmatrix} \mathbf{b}
\hat{x}_i = \mathbf{q}_i^\top\mathbf{b}
\mathbf{p} = \hat{x}_1\mathbf{q}_1 + \hat{x}_2\mathbf{q}_2 + \dots + \hat{x}_n\mathbf{q}_n

(Figure: \(\mathbf{b}\in \mathbb{R}^6\) and its projection \(\mathbf{p}\) onto the column space of \(A\), with basis vectors \(\mathbf{q}_1\in \mathbb{R}^6\) and \(\mathbf{q}_2\in \mathbb{R}^6\))

The coordinate of the projection of \(\mathbf{b}\) along each basis vector is simply the dot product of that basis vector with \(\mathbf{b}\) (as opposed to the complicated formula \((A^\top A)^{-1}A^\top\mathbf{b}\) in the recap)

An orthonormal basis is the best basis you can hope for!
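To make the dot-product shortcut concrete, here is a minimal sketch that projects a vector onto the span of two orthonormal vectors using only \(\hat{x}_i = \mathbf{q}_i^\top\mathbf{b}\). The vectors are in \(\mathbb{R}^3\) rather than \(\mathbb{R}^6\) to keep the numbers small; they and the helper `project` are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def project(basis, b):
    """Project b onto the span of an orthonormal basis.

    Returns the coordinates x_i = q_i^T b and the projection p = sum_i x_i q_i.
    Only valid when the basis vectors are orthonormal.
    """
    coords = [dot(q, b) for q in basis]
    p = [sum(c * q[k] for c, q in zip(coords, basis)) for k in range(len(b))]
    return coords, p

s = 1 / math.sqrt(2)
q1 = [s, s, 0.0]       # assumed orthonormal basis vectors
q2 = [0.0, 0.0, 1.0]
b = [3.0, 1.0, 2.0]

coords, p = project([q1, q2], b)
error = [bk - pk for bk, pk in zip(b, p)]  # e = b - p
```

No matrix inverse is needed: each coordinate is a single dot product, and the error `b - p` comes out orthogonal to both basis vectors.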

What if the basis is not orthonormal?

Issue: The columns of \(A\) may not be orthonormal

Consequence: The basis vectors for the column space that we get from the pivot columns may not be orthonormal

Observation: We know that multiple bases exist for the same subspace

Wishlist: We want an orthonormal basis!

Question: Can we start from some non-orthonormal basis and derive an orthonormal one?

Answer: Yes, by using the Gram-Schmidt process

Gram-Schmidt Process

Given: linearly independent (but not necessarily orthonormal) vectors \(\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n\)

Step 1: get orthogonal vectors \(\hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2, \dots, \hat{\mathbf{a}}_n\)

Step 2: get orthonormal vectors \(\mathbf{q}_1, \mathbf{q}_2, \dots, \mathbf{q}_n\)

Step 2 is easy (we will not focus too much on it)

\mathbf{q}_i = \frac{\hat{\mathbf{a}}_i}{\|\hat{\mathbf{a}}_i\|_2}

(Figure: \(\mathbf{a}_1, \mathbf{a}_2\) are first made orthogonal, giving \(\hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2\), and then scaled to unit length, giving \(\mathbf{q}_1, \mathbf{q}_2\))
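Step 2 amounts to dividing each vector by its Euclidean length. A minimal sketch, with `normalize` as an assumed helper name and an arbitrary example vector:

```python
import math

def normalize(v):
    """Scale v to unit Euclidean (2-norm) length. Assumes v is not the zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

q = normalize([3.0, 4.0])  # the length of (3, 4) is 5
```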

Gram-Schmidt Process

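(Figure: \(\mathbf{a}_1\) and \(\mathbf{a}_2\), with \(\mathbf{a}_2\) split into \(\mathbf{p}\) along \(\mathbf{a}_1\) and \(\mathbf{e}\) orthogonal to \(\mathbf{a}_1\))

\(\mathbf{p}\) is the component of \(\mathbf{a}_2\) along \(\mathbf{a}_1\) (this is what we want to get rid of)

\(\mathbf{e}\) is the component of \(\mathbf{a}_2\) orthogonal to \(\mathbf{a}_1\) (this is what we want to retain)

We will retain \(\mathbf{a}_1\) as the first basis vector

\hat{\mathbf{a}}_1 = \mathbf{a}_1
\hat{\mathbf{a}}_2 = \mathbf{e} = \mathbf{a}_2 - \mathbf{p}

Since \(\mathbf{p}\) is the projection of \(\mathbf{a}_2\) onto \(\hat{\mathbf{a}}_1\),

\mathbf{p} = \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
\therefore \hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1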

Gram-Schmidt Process

\hat{\mathbf{a}}_1 = \mathbf{a}_1

(first basis vector)

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1

(second basis vector)

We want to get rid of the component of \(\mathbf{a}_3\) along \(\hat{\mathbf{a}}_1\) (just as we did for \(\mathbf{a}_2\))

We also want to get rid of the component of \(\mathbf{a}_3\) along \(\hat{\mathbf{a}}_2\) (because we want \(\hat{\mathbf{a}}_3\) to be orthogonal to \(\hat{\mathbf{a}}_2\) also)

\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2
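The same pattern extends to any number of vectors. As a sketch in the notation used here (the general index \(k\) and the summation are a generalisation that the slides do not write out), the \(k\)-th orthogonal vector is obtained by subtracting from \(\mathbf{a}_k\) its components along all previously computed vectors:

```latex
\hat{\mathbf{a}}_k = \mathbf{a}_k - \sum_{j=1}^{k-1} \frac{\hat{\mathbf{a}}_j^\top\mathbf{a}_k}{\hat{\mathbf{a}}_j^\top\hat{\mathbf{a}}_j}\hat{\mathbf{a}}_j, \qquad k = 1, 2, \dots, n
```

For \(k = 1\) the sum is empty, which recovers \(\hat{\mathbf{a}}_1 = \mathbf{a}_1\).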

Gram-Schmidt Process (Example)

\mathbf{a}_1=\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}, \quad \mathbf{a}_2=\begin{bmatrix} 2\\ 0\\ -2 \end{bmatrix}, \quad \mathbf{a}_3=\begin{bmatrix} 3\\ -3\\ 3 \end{bmatrix}

\hat{\mathbf{a}}_1 = \mathbf{a}_1=\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 = \begin{bmatrix} 2\\ 0\\ -2 \end{bmatrix} - \frac{2}{2}\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ -2 \end{bmatrix}

\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2 = \begin{bmatrix} 3\\ -3\\ 3 \end{bmatrix} - \frac{6}{2}\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix} - \frac{-6}{6}\begin{bmatrix} 1\\ 1\\ -2 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}
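The worked example can be reproduced with a short routine. This is a minimal sketch of Step 1 only (orthogonalisation, no normalisation), assuming the input vectors are linearly independent; the function name `gram_schmidt` is an illustrative choice, and exact fractions are used so the results can be compared with the hand computation.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Return orthogonal (not yet normalised) vectors spanning the same space.

    Assumes the input vectors are linearly independent.
    """
    orthogonal = []
    for a in vectors:
        a = [Fraction(x) for x in a]
        a_hat = list(a)
        for prev in orthogonal:
            # Remove the component of a along each earlier orthogonal vector
            coeff = dot(prev, a) / dot(prev, prev)
            a_hat = [x - coeff * p for x, p in zip(a_hat, prev)]
        orthogonal.append(a_hat)
    return orthogonal

a1 = [1, -1, 0]
a2 = [2, 0, -2]
a3 = [3, -3, 3]

a_hats = gram_schmidt([a1, a2, a3])
```

The result matches the vectors \((1,-1,0)\), \((1,1,-2)\) and \((1,1,1)\) obtained by hand.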

Are we sure they are orthogonal?

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2

Multiply the equation for \(\hat{\mathbf{a}}_2\) by \(\hat{\mathbf{a}}_1^\top\) on both sides

\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_2 = \hat{\mathbf{a}}_1^\top\mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1 = 0
\therefore \hat{\mathbf{a}}_1 \perp \hat{\mathbf{a}}_2

Multiply the equation for \(\hat{\mathbf{a}}_3\) by \(\hat{\mathbf{a}}_1^\top\) on both sides

\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_3 = \hat{\mathbf{a}}_1^\top\mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\underbrace{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_2}_{0} = 0
\therefore \hat{\mathbf{a}}_1 \perp \hat{\mathbf{a}}_3

Multiply the equation for \(\hat{\mathbf{a}}_3\) by \(\hat{\mathbf{a}}_2^\top\) on both sides

\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_3 = \hat{\mathbf{a}}_2^\top\mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\underbrace{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_1}_{0} - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2 = 0
\therefore \hat{\mathbf{a}}_2 \perp \hat{\mathbf{a}}_3
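The algebraic argument can also be spot-checked on numbers. The sketch below takes the three orthogonal vectors from the worked example, \((1,-1,0)\), \((1,1,-2)\) and \((1,1,1)\), and computes every pairwise dot product; the variable names are illustrative.

```python
from itertools import combinations

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

# Orthogonal vectors produced by Gram-Schmidt in the worked example
a_hats = [[1, -1, 0], [1, 1, -2], [1, 1, 1]]

# Dot product of every distinct pair: (1,2), (1,3), (2,3)
pairwise = [dot(u, v) for u, v in combinations(a_hats, 2)]  # [0, 0, 0]
```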

QR factorisation

(Figure: \(\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\) and the orthonormal vectors \(\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\) obtained from them)

Recap: The coordinate of the projection of \(\mathbf{a}_1\) along each orthonormal basis vector is simply the dot product of that basis vector with \(\mathbf{a}_1\), i.e. \(\mathbf{\hat{x}} = Q^\top\mathbf{b}\) instead of \(\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}\)

\mathbf{a}_1 = z_{11}\mathbf{q}_1 + z_{12}\mathbf{q}_2 + z_{13}\mathbf{q}_3
z_{11} = \mathbf{q}_1^\top\mathbf{a}_1
z_{12} = \mathbf{q}_2^\top\mathbf{a}_1 = 0
z_{13} = \mathbf{q}_3^\top\mathbf{a}_1 = 0

\mathbf{a}_2 = z_{21}\mathbf{q}_1 + z_{22}\mathbf{q}_2 + z_{23}\mathbf{q}_3
z_{21} = \mathbf{q}_1^\top\mathbf{a}_2
z_{22} = \mathbf{q}_2^\top\mathbf{a}_2
z_{23} = \mathbf{q}_3^\top\mathbf{a}_2 = 0

\mathbf{a}_3 = z_{31}\mathbf{q}_1 + z_{32}\mathbf{q}_2 + z_{33}\mathbf{q}_3
z_{31} = \mathbf{q}_1^\top\mathbf{a}_3
z_{32} = \mathbf{q}_2^\top\mathbf{a}_3
z_{33} = \mathbf{q}_3^\top\mathbf{a}_3

The zeros follow from the Gram-Schmidt equations: \(\mathbf{a}_1 = \hat{\mathbf{a}}_1\) is a multiple of \(\mathbf{q}_1\), and \(\mathbf{a}_2\) is a combination of \(\mathbf{q}_1\) and \(\mathbf{q}_2\) only

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_2}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1
\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}}_1^\top\mathbf{a}_3}{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_1}\hat{\mathbf{a}}_1 - \frac{\hat{\mathbf{a}}_2^\top\mathbf{a}_3}{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_2}\hat{\mathbf{a}}_2

\begin{bmatrix} \uparrow&\uparrow&\uparrow\\ \mathbf{a}_1&\mathbf{a}_2&\mathbf{a}_3\\ \downarrow&\downarrow&\downarrow\\ \end{bmatrix} = \begin{bmatrix} \uparrow&\uparrow&\uparrow\\ \mathbf{q}_1&\mathbf{q}_2&\mathbf{q}_3\\ \downarrow&\downarrow&\downarrow\\ \end{bmatrix} \begin{bmatrix} \mathbf{q}_1^\top\mathbf{a}_1&\mathbf{q}_1^\top\mathbf{a}_2&\mathbf{q}_1^\top\mathbf{a}_3\\ 0&\mathbf{q}_2^\top\mathbf{a}_2&\mathbf{q}_2^\top\mathbf{a}_3\\ 0&0&\mathbf{q}_3^\top\mathbf{a}_3 \end{bmatrix}

A = QR
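To see the factorisation on actual numbers, the sketch below runs Gram-Schmidt with normalisation on the vectors \((1,-1,0)\), \((2,0,-2)\), \((3,-3,3)\) from the earlier example, fills \(R\) with the dot products \(\mathbf{q}_i^\top\mathbf{a}_j\), and rebuilds each column of \(A\). The function name `qr_gram_schmidt` and the list-of-columns layout are assumptions made for illustration; the input columns are assumed linearly independent.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def qr_gram_schmidt(columns):
    """Return (qs, R): orthonormal vectors q_i and R with R[i][j] = q_i^T a_j.

    Assumes the columns are linearly independent.
    """
    qs = []
    for a in columns:
        a_hat = [float(x) for x in a]
        for q in qs:
            coeff = dot(q, a)  # q has unit length, so no division is needed
            a_hat = [x - coeff * qk for x, qk in zip(a_hat, q)]
        norm = math.sqrt(dot(a_hat, a_hat))
        qs.append([x / norm for x in a_hat])
    R = [[dot(q, a) for a in columns] for q in qs]
    return qs, R

columns = [[1, -1, 0], [2, 0, -2], [3, -3, 3]]  # a_1, a_2, a_3
qs, R = qr_gram_schmidt(columns)

n = len(columns)
# Column j of QR is sum_i R[i][j] * q_i, which should reproduce a_j
rebuilt = [[sum(R[i][j] * qs[i][k] for i in range(n)) for k in range(n)]
           for j in range(n)]
```

Up to rounding, the entries of `R` below the diagonal are zero and `rebuilt` equals the original columns, which is the statement \(A = QR\).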

Learning Objectives (achieved)

What are orthonormal vectors?

What is an orthonormal basis?

How do you create an orthonormal basis (Gram-Schmidt process)?

What is QR factorisation?