# Orthonormal vectors

q_i^\top q_j = \begin{cases} 0 & \text{if } i\neq j\\ 1 & \text{if } i = j \end{cases}

### If $$Q$$ is a matrix whose columns are orthonormal then

Q = \begin{bmatrix} \uparrow&\uparrow&&\uparrow\\ q_1&q_2&\dots&q_n\\ \downarrow&\downarrow&&\downarrow \end{bmatrix}
Q^\top Q = \begin{bmatrix} \leftarrow&q_1^\top&\rightarrow\\ \leftarrow&q_2^\top&\rightarrow\\ &\vdots&\\ \leftarrow&q_n^\top&\rightarrow \end{bmatrix} \begin{bmatrix} \uparrow&\uparrow&&\uparrow\\ q_1&q_2&\dots&q_n\\ \downarrow&\downarrow&&\downarrow \end{bmatrix} = I
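As a quick numerical sanity check (a plain-Python sketch; the rotation matrix used here also appears as an example below), we can verify $$Q^\top Q = I$$ by computing all pairwise dot products of the columns:

```python
import math

# Columns of a 2x2 rotation matrix -- orthonormal for any angle theta.
theta = 0.7  # arbitrary angle, chosen only for this sketch
q1 = [math.cos(theta), math.sin(theta)]
q2 = [-math.sin(theta), math.cos(theta)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# q_i^T q_j should be 1 when i == j and 0 otherwise.
gram = [[dot(u, v) for v in (q1, q2)] for u in (q1, q2)]
```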

# Orthogonal matrix

### If $$Q$$ is a square matrix whose columns are orthonormal then it is called an orthogonal matrix

Q^\top Q = QQ^\top = I

### Examples

A permutation matrix:

\begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}

A rotation matrix:

\begin{bmatrix} \cos\theta&-\sin\theta\\ \sin\theta&\cos\theta \end{bmatrix}

Scaled matrices (the scalar in front makes each column unit length):

\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix} \qquad \frac{1}{3}\begin{bmatrix} 1 & 2 & -2\\ 2 & 1 & 2\\ 2 & -2 & -1 \end{bmatrix}

### Rectangular matrix (orthonormal columns)

Q = \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1\\ 0&0&0 \end{bmatrix} \qquad Q^\top = \begin{bmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&1&0 \end{bmatrix}

Q^\top Q = I \qquad \text{but} \qquad QQ^\top = \begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&0 \end{bmatrix} \neq I
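The scaling factors matter. The plain-Python sketch below confirms that $$\frac{1}{3}$$ of the $$3\times 3$$ matrix above is orthogonal, and that the rectangular matrix satisfies $$Q^\top Q = I$$ while $$QQ^\top \neq I$$:

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]

# Square example: (1/3) * [[1,2,-2],[2,1,2],[2,-2,-1]] is orthogonal.
Q = [[v / 3 for v in row] for row in [[1, 2, -2], [2, 1, 2], [2, -2, -1]]]
QtQ = matmul(transpose(Q), Q)   # identity
QQt = matmul(Q, transpose(Q))   # also identity, since Q is square

# Rectangular example: orthonormal columns, but Q is 4x3.
Q_rect = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 1],
          [0, 0, 0]]
QtQ_rect = matmul(transpose(Q_rect), Q_rect)  # 3x3 identity
QQt_rect = matmul(Q_rect, transpose(Q_rect))  # diag(1, 1, 1, 0) -- not identity
```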

# Why do we care? (about orthogonal matrices)

### Recap

A^\top A\mathbf{\hat{x}} = A^\top\mathbf{b}

\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}

### If $$A = Q$$ (a matrix with orthonormal columns)

Q^\top Q\mathbf{\hat{x}} = Q^\top\mathbf{b}

\mathbf{\hat{x}} = Q^\top\mathbf{b}

\begin{bmatrix} \hat{x_1}\\ \hat{x_2}\\ \vdots\\ \hat{x_n} \end{bmatrix} = \begin{bmatrix} \leftarrow&\mathbf{q}_1^\top&\rightarrow\\ \leftarrow&\mathbf{q}_2^\top&\rightarrow\\ &\vdots&\\ \leftarrow&\mathbf{q}_n^\top&\rightarrow \end{bmatrix} \mathbf{b}

\hat{x_i} = \mathbf{q}_i^\top\mathbf{b}

The projection of $$\mathbf{b}$$ onto the column space of $$A$$ is

\mathbf{p} = \hat{x_1}\mathbf{q}_1 + \hat{x_2}\mathbf{q}_2 + \dots + \hat{x_n}\mathbf{q}_n

(figure: $$\mathbf{q}_1, \mathbf{q}_2, \mathbf{b}\in \mathbb{R}^6$$, with $$\mathbf{p}$$ the projection of $$\mathbf{b}$$ onto the column space of $$A$$)


### The co-ordinate of the projection of $$\mathbf{b}$$ along each basis vector is simply the dot product of that basis vector with $$\mathbf{b}$$

(as opposed to the complicated formula $$\mathbf{\hat{x}} = (A^\top A)^{-1}A^\top\mathbf{b}$$)
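A small numerical illustration (the vectors here are made up for this sketch): with orthonormal columns, each coordinate of the projection is a single dot product, and no inverse of $$A^\top A$$ is needed:

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Two orthonormal vectors in R^3 (toy data, assumed for this example).
q1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
q2 = [0.0, 0.0, 1.0]
b  = [3.0, 4.0, 5.0]

# x_hat_i = q_i^T b -- one dot product per coordinate.
x1, x2 = dot(q1, b), dot(q2, b)

# Projection of b onto span(q1, q2).
p = [x1 * u + x2 * v for u, v in zip(q1, q2)]
# p is approximately [3.5, 3.5, 5.0]; the residual b - p is orthogonal to both q's.
```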

# Gram-Schmidt Process

### Step 2 is easy (we will not focus too much on it)

\mathbf{q}_i = \frac{\hat{\mathbf{a}}_i}{\|\hat{\mathbf{a}}_i\|_2}

(figure: $$\mathbf{a}_1, \mathbf{a}_2$$ are first orthogonalised to $$\hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2$$, then normalised to $$\mathbf{q}_1, \mathbf{q}_2$$)

# Gram-Schmidt Process

(figure: $$\mathbf{a}_2$$ is split into its projection $$\mathbf{p}$$ on $$\mathbf{a}_1$$ and the error $$\mathbf{e} = \mathbf{a}_2 - \mathbf{p}$$)

### $$\mathbf{e}$$ is the component of $$\mathbf{a}_2$$ orthogonal to $$\mathbf{a}_1$$

$$\mathbf{p}$$ is what we want to get rid of; $$\mathbf{e}$$ is what we want to retain.

### We will retain $$\mathbf{a}_1$$ as the first basis vector

\hat{\mathbf{a}}_1= \mathbf{a}_1
\hat{\mathbf{a}}_2= \mathbf{e} = \mathbf{a}_2 - {\mathbf{p}}
\therefore \hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_2}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}_1}
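In code, one orthogonalisation step is a projection followed by a subtraction (a plain-Python sketch; the two vectors are made up for the illustration):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def subtract_projection(a2, a1_hat):
    """Return a2 minus its projection onto a1_hat (the error vector e)."""
    c = dot(a1_hat, a2) / dot(a1_hat, a1_hat)
    return [x - c * y for x, y in zip(a2, a1_hat)]

a1 = [3.0, 4.0]   # assumed example vectors
a2 = [1.0, 0.0]
e = subtract_projection(a2, a1)
# e is orthogonal to a1: dot(a1, e) == 0 (up to floating point)
```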

# Gram-Schmidt Process


### We want to get rid of the component of $$\mathbf{a}_3$$ along $$\mathbf{a}_1$$

(just as we did for $$\mathbf{a}_2$$)

\hat{\mathbf{a}}_1= \mathbf{a}_1 \quad \text{(first basis vector)}

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_2}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}_1} \quad \text{(second basis vector)}

### We also want to get rid of the component of $$\mathbf{a}_3$$ along $$\mathbf{a}_2$$

(because we want $$\hat{\mathbf{a}}_3$$ to be orthogonal to $$\hat{\mathbf{a}}_2$$ also)

\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_3}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}_1} - \frac{\hat{\mathbf{a}_2}^\top\mathbf{a}_3}{\hat{\mathbf{a}_2}^\top\hat{\mathbf{a}_2}}\hat{\mathbf{a}_2}

# Gram-Schmidt Process

### (Example)

\mathbf{a}_1=\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix} \qquad \mathbf{a}_2=\begin{bmatrix} 2\\ 0\\ -2 \end{bmatrix} \qquad \mathbf{a}_3=\begin{bmatrix} 3\\ -3\\ 3 \end{bmatrix}

\hat{\mathbf{a}}_1 = \mathbf{a}_1=\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix}

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_2}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}_1} =\begin{bmatrix} 2\\ 0\\ -2 \end{bmatrix} -\frac{2}{2}\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix} =\begin{bmatrix} 1\\ 1\\ -2 \end{bmatrix}

\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_3}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}_1} - \frac{\hat{\mathbf{a}_2}^\top\mathbf{a}_3}{\hat{\mathbf{a}_2}^\top\hat{\mathbf{a}_2}}\hat{\mathbf{a}_2} =\begin{bmatrix} 3\\ -3\\ 3 \end{bmatrix} -\frac{6}{2}\begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix} -\frac{-6}{6}\begin{bmatrix} 1\\ 1\\ -2 \end{bmatrix} =\begin{bmatrix} 1\\1\\1 \end{bmatrix}
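The arithmetic above is easy to verify in code (same three vectors as the worked example):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def subtract_projection(v, u):
    """Subtract from v its projection onto u."""
    c = dot(u, v) / dot(u, u)
    return [x - c * y for x, y in zip(v, u)]

a1 = [1.0, -1.0, 0.0]
a2 = [2.0, 0.0, -2.0]
a3 = [3.0, -3.0, 3.0]

a1_hat = a1
a2_hat = subtract_projection(a2, a1_hat)
a3_hat = subtract_projection(subtract_projection(a3, a1_hat), a2_hat)
# a2_hat == [1.0, 1.0, -2.0] and a3_hat == [1.0, 1.0, 1.0], matching the slides
```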

# Are we sure they are orthogonal?

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_2}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}_1}

\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_3}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}_1} - \frac{\hat{\mathbf{a}_2}^\top\mathbf{a}_3}{\hat{\mathbf{a}_2}^\top\hat{\mathbf{a}_2}}\hat{\mathbf{a}_2}

### Multiply by $$\hat{\mathbf{a}}_1 ^\top$$ on both sides

\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_2 = \hat{\mathbf{a}}_1^\top\mathbf{a}_2 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_2}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}_1} = 0

### Multiply by $$\hat{\mathbf{a}}_1 ^\top$$ on both sides

\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}}_3 = \hat{\mathbf{a}}_1^\top\mathbf{a}_3 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_3}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}_1} - \frac{\hat{\mathbf{a}_2}^\top\mathbf{a}_3}{\hat{\mathbf{a}_2}^\top\hat{\mathbf{a}_2}}\underbrace{\hat{\mathbf{a}}_1^\top\hat{\mathbf{a}_2}}_{0} = 0

### Multiply by $$\hat{\mathbf{a}}_2 ^\top$$ on both sides

\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}}_3 = \hat{\mathbf{a}}_2^\top\mathbf{a}_3 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_3}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\underbrace{\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}_1}}_{0} - \frac{\hat{\mathbf{a}_2}^\top\mathbf{a}_3}{\hat{\mathbf{a}_2}^\top\hat{\mathbf{a}_2}}\hat{\mathbf{a}}_2^\top\hat{\mathbf{a}_2} = 0

\therefore \hat{\mathbf{a}}_1 \perp \hat{\mathbf{a}}_2,\quad \hat{\mathbf{a}}_1 \perp \hat{\mathbf{a}}_3,\quad \hat{\mathbf{a}}_2 \perp \hat{\mathbf{a}}_3

# QR factorisation

### Express each column of $$A$$ in the orthonormal basis $$\mathbf{q_1}, \mathbf{q_2}, \mathbf{q_3}$$ obtained by Gram-Schmidt

\mathbf{a_1} = z_{11}\mathbf{q_1} + z_{12}\mathbf{q_2} + z_{13}\mathbf{q_3}

\mathbf{a_2} = z_{21}\mathbf{q_1} + z_{22}\mathbf{q_2} + z_{23}\mathbf{q_3}

\mathbf{a_3} = z_{31}\mathbf{q_1} + z_{32}\mathbf{q_2} + z_{33}\mathbf{q_3}

### Recap: The co-ordinate of the projection of $$\mathbf{a_1}$$ along each orthonormal basis vector is simply the dot product of that basis vector with $$\mathbf{a_1}$$

z_{11} = \mathbf{q_1}^\top\mathbf{a_1} \qquad z_{12} = \mathbf{q_2}^\top\mathbf{a_1} = 0 \qquad z_{13} = \mathbf{q_3}^\top\mathbf{a_1} = 0

z_{21} = \mathbf{q_1}^\top\mathbf{a_2} \qquad z_{22} = \mathbf{q_2}^\top\mathbf{a_2} \qquad z_{23} = \mathbf{q_3}^\top\mathbf{a_2} = 0

z_{31} = \mathbf{q_1}^\top\mathbf{a_3} \qquad z_{32} = \mathbf{q_2}^\top\mathbf{a_3} \qquad z_{33} = \mathbf{q_3}^\top\mathbf{a_3}

### Why the zeros? Recall the Gram-Schmidt construction

\hat{\mathbf{a}}_2 = \mathbf{a}_2 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_2}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}_1}

\hat{\mathbf{a}}_3 = \mathbf{a}_3 - \frac{\hat{\mathbf{a}_1}^\top\mathbf{a}_3}{\hat{\mathbf{a}_1}^\top\hat{\mathbf{a}_1}}\hat{\mathbf{a}_1}- \frac{\hat{\mathbf{a}_2}^\top\mathbf{a}_3}{\hat{\mathbf{a}_2}^\top\hat{\mathbf{a}_2}}\hat{\mathbf{a}_2}

By construction $$\mathbf{a_1}$$ lies in the span of $$\mathbf{q_1}$$, and $$\mathbf{a_2}$$ lies in the span of $$\mathbf{q_1}, \mathbf{q_2}$$; since the $$\mathbf{q}$$'s are mutually orthogonal, $$z_{12} = z_{13} = z_{23} = 0$$.

# QR factorisation

\begin{bmatrix} \uparrow&\uparrow&\uparrow\\ \mathbf{a_1}&\mathbf{a_2}&\mathbf{a_3}\\ \downarrow&\downarrow&\downarrow\\ \end{bmatrix} = \begin{bmatrix} \uparrow&\uparrow&\uparrow\\ \mathbf{q_1}&\mathbf{q_2}&\mathbf{q_3}\\ \downarrow&\downarrow&\downarrow\\ \end{bmatrix} \begin{bmatrix} \mathbf{q_1}^\top\mathbf{a_1}&\mathbf{q_1}^\top\mathbf{a_2}&\mathbf{q_1}^\top\mathbf{a_3}\\ 0&\mathbf{q_2}^\top\mathbf{a_2}&\mathbf{q_2}^\top\mathbf{a_3}\\ 0&0&\mathbf{q_3}^\top\mathbf{a_3} \end{bmatrix}

A = QR

with $$Q$$ having orthonormal columns and $$R$$ upper triangular.
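Putting it all together (a plain-Python sketch using the same three columns as the worked Gram-Schmidt example): Gram-Schmidt gives the $$\mathbf{q}$$'s, the entries of $$R$$ are the dot products $$\mathbf{q_i}^\top\mathbf{a_j}$$, and multiplying back recovers $$A$$:

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Columns of A (same vectors as the worked Gram-Schmidt example).
a = [[1.0, -1.0, 0.0], [2.0, 0.0, -2.0], [3.0, -3.0, 3.0]]

# Gram-Schmidt: orthogonalise each column against the previous q's, then normalise.
q = []
for col in a:
    v = col[:]
    for qi in q:
        c = dot(qi, col)                       # coordinate of col along qi
        v = [x - c * y for x, y in zip(v, qi)]
    norm = math.sqrt(dot(v, v))
    q.append([x / norm for x in v])

# R[i][j] = q_i^T a_j; zero below the diagonal, so R is upper triangular.
R = [[dot(q[i], a[j]) if i <= j else 0.0 for j in range(3)] for i in range(3)]

# Reconstruct column j of A as sum_i R[i][j] * q_i, i.e. A = QR.
recon = [[sum(R[i][j] * q[i][k] for i in range(3)) for k in range(3)]
         for j in range(3)]
```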