More on diagonalizable matrices
The Power Method
Theorem. If \(A\) is an \(n\times n\) matrix and \(\{v_{1},\ldots,v_{n}\}\) is a basis of eigenvectors of \(A\) with corresponding eigenvalues \(\{\lambda_{1},\ldots,\lambda_{n}\}\), then
\[A = X\Lambda X^{-1}\]
where
\[X = \begin{bmatrix} | & | & | & & |\\ v_{1} & v_{2} & v_{3} & \cdots & v_{n}\\ | & | & | & & |\end{bmatrix}\]
and
\[\Lambda = \text{diag}(\lambda_{1},\lambda_{2},\ldots,\lambda_{n}) = \begin{bmatrix} \lambda_{1} & 0 & \cdots & 0\\ 0 & \lambda_{2} & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \lambda_{n}\end{bmatrix}\]
Proof. We will show that \(Ax=X\Lambda X^{-1} x\) for an arbitrary \(x\in\R^{n}\). Since \(\{v_{1},\ldots, v_{n}\}\) is a basis, there are scalars \(\alpha_{1},\ldots,\alpha_{n}\) such that
\[x = \alpha_{1}v_{1} + \alpha_{2}v_{2} + \cdots + \alpha_{n}v_{n}\]
Multiplying by \(A\) on the left we have
\[Ax = \alpha_{1}Av_{1} + \alpha_{2}Av_{2} + \cdots + \alpha_{n}Av_{n}\]
\[ = \alpha_{1}\lambda_{1}v_{1} + \alpha_{2}\lambda_{2}v_{2} + \cdots + \alpha_{n}\lambda_{n}v_{n}\]
By the definition of \(X\) we see that \(Xe_{i} = v_{i}\), and hence \(X^{-1}v_{i} = e_{i}\) for \(i=1,\ldots,n\). Hence,
\[X\Lambda X^{-1}x = X\Lambda X^{-1}\big(\alpha_{1}v_{1} + \alpha_{2}v_{2} + \cdots + \alpha_{n}v_{n}\big)\]
\[ = X\Lambda \big(\alpha_{1}X^{-1}v_{1} + \alpha_{2}X^{-1}v_{2} + \cdots + \alpha_{n}X^{-1}v_{n}\big)\]
\[ = X\Lambda \big(\alpha_{1}e_{1} + \alpha_{2}e_{2} + \cdots + \alpha_{n}e_{n}\big)\]
\[ = X\big(\alpha_{1}\lambda_{1} e_{1} + \alpha_{2}\lambda_{2} e_{2} + \cdots + \alpha_{n}\lambda_{n}e_{n}\big)\]
\[ = \alpha_{1}\lambda_{1} v_{1} + \alpha_{2}\lambda_{2} v_{2} + \cdots + \alpha_{n}\lambda_{n}v_{n} = Ax.\]
\(\Box\)
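A quick numerical illustration of the theorem: in the minimal MATLAB sketch below, the particular \(X\) and \(\Lambda\) are just illustrative choices; we build \(A = X\Lambda X^{-1}\) and check that each column of \(X\) is an eigenvector of \(A\) whose eigenvalue is the corresponding diagonal entry of \(\Lambda\).
\begin{verbatim}
% Build A = X*Lambda*inv(X) from a chosen basis and chosen eigenvalues,
% then check that each column of X is an eigenvector of A.
X      = [1 1 0; 0 1 1; 1 0 1];   % columns play the role of v_1, v_2, v_3
Lambda = diag([2, 3, -1]);        % corresponding eigenvalues
A      = X*Lambda/X;              % X*Lambda/X computes X*Lambda*inv(X)
for i = 1:3
    norm(A*X(:,i) - Lambda(i,i)*X(:,i))   % each value should be (close to) 0
end
\end{verbatim}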
Example. Consider the matrix
\[A=\left[\begin{array}{rrr} 2 & -1 & 2\\ 0 & 3 & -2\\ 0 & 0 & -1\end{array}\right]\]
We want to find \(\lambda\in\R\) such that
\[A-\lambda I=\left[\begin{array}{rrr} 2-\lambda & -1 & 2\\ 0 & 3-\lambda & -2\\ 0 & 0 & -1-\lambda\end{array}\right]\]
is not invertible. We know that an upper triangular matrix is invertible if and only if it has no zeros on its diagonal. Hence, the eigenvalues of \(A\) are \(2\), \(3\), and \(-1\).
Next, we find the eigenspaces \(N(A-\lambda I)\) for \(\lambda=2,3,\) and \(-1\). Starting with \(\lambda=-1\):
\[A-(-1) I=\left[\begin{array}{rrr} 3 & -1 & 2\\ 0 & 4 & -2\\ 0 & 0 & 0\end{array}\right]\]
Row reducing we get
\[\text{rref}(A-(-1)I) = \left[\begin{array}{rrr} 1 &0 & \frac{1}{2}\\ 0 & 1 & -\frac{1}{2}\\ 0 & 0 & 0\end{array}\right]\]
Hence, we see that
\[N(A-(-1)I) = \text{span}\left\{\begin{bmatrix} -\frac{1}{2}\\ \frac{1}{2}\\ 1\end{bmatrix}\right\}\]
Similarly
\[N(A-3I) = \text{span}\left\{\begin{bmatrix} -1\\ 1\\ 0\end{bmatrix}\right\}\quad\text{and }\quad N(A-2I) = \text{span}\left\{\begin{bmatrix} 1\\ 0\\ 0\end{bmatrix}\right\}\]
Selecting a nonzero vector from each eigenspace and using these vectors as columns, we form the matrix
\[X=\begin{bmatrix} 1 & -1 & -\frac{1}{2}\\ 0 & 1 & \frac{1}{2}\\ 0 & 0 & 1\end{bmatrix}\]
Since the columns of \(X\) form a basis for \(\R^{3}\), we see that \(X\) is invertible. Indeed,
\[X^{-1} = \begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & -\frac{1}{2}\\ 0 & 0 & 1\end{bmatrix}.\]
Finally,
\[\begin{bmatrix} 1 & -1 & -\frac{1}{2}\\ 0 & 1 & \frac{1}{2}\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix} 2 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & -1\end{bmatrix}\begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & -\frac{1}{2}\\ 0 & 0 & 1\end{bmatrix} = \left[\begin{array}{rrr} 2 & -1 & 2\\ 0 & 3 & -2\\ 0 & 0 & -1\end{array}\right]\]
Hence, \(A\) is diagonalizable!
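We can also confirm this factorization numerically; here is a minimal MATLAB check using the matrices computed above.
\begin{verbatim}
% Check the factorization A = X*Lambda*inv(X) from the example above.
A      = [2 -1 2; 0 3 -2; 0 0 -1];
X      = [1 -1 -1/2; 0 1 1/2; 0 0 1];
Lambda = diag([2, 3, -1]);
norm(A - X*Lambda/X)               % should be (close to) 0
\end{verbatim}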
Another Example. Consider the matrix
\[A = \left[\begin{array}{rrr} -1 &\ \ 0 & 0\\ -8 & 3 & 8\\ 0 & 0 & -1\end{array}\right].\]
The eigenvalues of \(A\) are \(3\) and \(-1\). Diagonalize \(A\); that is, find an invertible matrix \(X\) and a diagonal matrix \(\Lambda\) such that \(A=X\Lambda X^{-1}\).
\[\text{rref}(A-3I) = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix}\]
Therefore
\[\left\{\begin{bmatrix} 0\\ 1\\ 0\end{bmatrix}\right\}\]
is a basis for the eigenspace of \(A\) associated with the eigenvalue \(3\).
Next we note that
\[\text{rref}(A-(-1)I) = \begin{bmatrix} 1 & -\frac{1}{2} & -1\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}.\]
From this we see that
\[\left\{\begin{bmatrix} 1/2\\ 1\\ 0\end{bmatrix},\begin{bmatrix} 1\\ 0\\ 1\end{bmatrix}\right\}\]
is a basis for the eigenspace of \(A\) associated with the eigenvalue \(-1\).
It is simple to verify that
\[\left\{\begin{bmatrix} 0\\ 1\\ 0\end{bmatrix},\begin{bmatrix} 1/2\\ 1\\ 0\end{bmatrix},\begin{bmatrix} 1\\ 0\\ 1\end{bmatrix}\right\}\]
is a basis of eigenvectors of \(A\).
If we set
\[X = \left[\begin{array}{rrr} 0 & \frac{1}{2} & 1\\ 1 & 1 & 0\\ 0 & 0 & 1\end{array}\right]\]
and
\[\Lambda = \left[\begin{array}{rrr} 3 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1\end{array}\right]\]
then
\[X^{-1} = \left[\begin{array}{rrr} -2 &\ \ 1 & 2\\ 2 & 0 & -2\\ 0 & 0 & 1\end{array}\right]\]
and
\[X\Lambda X^{-1} = \left[\begin{array}{rrr} 0 & \frac{1}{2} & 1\\ 1 & 1 & 0\\ 0 & 0 & 1\end{array}\right]\left[\begin{array}{rrr} 3 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1\end{array}\right]\left[\begin{array}{rrr} -2 &\ \ 1 & 2\\ 2 & 0 & -2\\ 0 & 0 & 1\end{array}\right]\]
\[= \left[\begin{array}{rrr} -1 &\ \ 0 & 0\\ -8 & 3 & 8\\ 0 & 0 & -1\end{array}\right] = A\]
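In practice we would let the computer find such a factorization. A minimal MATLAB sketch using the built-in \(\texttt{eig}\) command: it returns some basis of eigenvectors in the columns of \(X\), possibly scaled and ordered differently from our hand computation, since eigenvectors are only determined up to nonzero scalar multiples.
\begin{verbatim}
% Diagonalize A with eig: columns of X are eigenvectors, Lambda is diagonal.
A = [-1 0 0; -8 3 8; 0 0 -1];
[X, Lambda] = eig(A);
diag(Lambda)'           % some ordering of the eigenvalues 3, -1, -1
norm(A*X - X*Lambda)    % should be (close to) 0, i.e. A*X = X*Lambda
\end{verbatim}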
Theorem. Let \(A\) be an \(n\times n\) matrix, and let \(v\) be a nonzero vector. Define the sequence \((v_{k})\) by \(v_{1} = \frac{1}{\|v\|}v\), and
\[v_{k+1} = \frac{Av_{k}}{\|Av_{k}\|}\quad\text{for all }k\in\N.\]
If \(A\) is diagonalizable with eigenvalues \(\lambda_{1},\ldots,\lambda_{n}\), and \(|\lambda_{1}|>|\lambda_{2}|\geq|\lambda_{3}|\geq\cdots\geq |\lambda_{n}|\), then the sequence
\[v_{1}^{\top}Av_{1}, v_{2}^{\top}Av_{2}, v_{3}^{\top}Av_{3},\ldots,v_{k}^{\top}Av_{k},\ldots\]
converges to the dominant eigenvalue \(\lambda_{1}\) of \(A\) for almost all initial vectors \(v\).
Example. Let
\[A = \left[\begin{array}{rrrr} -1 & -3 & 7 & 5\\ -3 & -1 & 5 & 7\\ 7 & 5 & -1 & -3\\ 5 & 7 & -3 & -1\end{array}\right].\]
Set \(v_{1} = \begin{bmatrix} 1 & 0 & 0 & 0\end{bmatrix}^{\top}\). Then,
\[v_{1} = \begin{bmatrix}1\\ 0\\ 0\\ 0\end{bmatrix},\ v_{2} = \begin{bmatrix}-0.1091\\ -0.3273\\ 0.7638\\ 0.5455\end{bmatrix}, v_{3} = \begin{bmatrix} 0.6355\\ 0.5750\\ -0.3329\\ -0.3934 \end{bmatrix}, v_{4} = \begin{bmatrix} -0.4263 \\ -0.4418\\ 0.5658\\ 0.5503 \end{bmatrix}, \]
\[v_{5} = \begin{bmatrix} 0.5322 \\ 0.5283 \\ -0.4659\\ -0.4698 \end{bmatrix},\ldots,v_{12} = \begin{bmatrix} -0.4998\\ -0.4998 \\ 0.5002\\ 0.5002 \end{bmatrix}, v_{13} = \begin{bmatrix} 0.5001\\ 0.5001\\ -0.4999\\ -0.4999 \end{bmatrix},\ldots\]
Note that this sequence of vectors is not converging; the entries flip sign from one step to the next.
Example Continued. However,
\[v_{1}^{\top}Av_{1} = -1,\quad v_{2}^{\top}Av_{2} = -10.4762,\quad v_{3}^{\top}Av_{3} = -14.5201,\]
\[v_{4}^{\top}Av_{4}= -15.6261,\quad v_{5}^{\top}Av_{5} = -15.9063, \quad \ldots\]
\[ v_{12}^{\top}Av_{12} = -15.999994277954329,\quad\ldots\quad,v_{40}^{\top}Av_{40} = -16\ \text{(to machine precision)}\]
Hence, it looks like \(-16\) is an eigenvalue of \[A = \begin{bmatrix} -1 & -3 & 7 & 5\\ -3 & -1 & 5 & 7\\ 7 & 5 & -1 & -3\\ 5 & 7 & -3 & -1 \end{bmatrix}.\]
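These iterates and the numbers \(v_{k}^{\top}Av_{k}\) are easy to reproduce; here is a minimal MATLAB sketch (running 40 steps, to match the values quoted above).
\begin{verbatim}
% Power method iterates for the 4x4 matrix A above, starting from e_1.
A = [-1 -3 7 5; -3 -1 5 7; 7 5 -1 -3; 5 7 -3 -1];
v = [1; 0; 0; 0];                 % v_1 (already a unit vector)
for k = 1:40
    fprintf('k = %2d,  v''*A*v = %.15g\n', k, v'*A*v);
    v = A*v / norm(A*v);          % v_{k+1} = A v_k / ||A v_k||
end
\end{verbatim}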
If \(A\) is diagonalizable, then there is an invertible matrix \(X\) and a diagonal matrix \(\Lambda\) such that \(A = X\Lambda X^{-1}.\)
\[A^2 = (X\Lambda X^{-1})(X\Lambda X^{-1}) = X\Lambda\Lambda X^{-1} = X\Lambda^{2} X^{-1}\]
\[A^3 = AA^2 = (X\Lambda X^{-1})(X\Lambda^2 X^{-1}) = X\Lambda\Lambda^2 X^{-1} = X\Lambda^{3} X^{-1}\]
\[A^4 = AA^3 = (X\Lambda X^{-1})(X\Lambda^3 X^{-1}) = X\Lambda\Lambda^3 X^{-1} = X\Lambda^{4} X^{-1}\]
\[\vdots\]
In general,
\[A^{k} = X\Lambda^{k} X^{-1}.\]
Since \(\Lambda\) is a diagonal matrix, we have
\[\Lambda = \begin{bmatrix} \lambda_{1} & 0 & \cdots & 0\\ 0 & \lambda_{2} & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & \cdots & \lambda_{n}\end{bmatrix}\quad\text{and}\quad \Lambda^{k} = \begin{bmatrix} \lambda_{1}^{k} & 0 & \cdots & 0\\ 0 & \lambda_{2}^{k} & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & \cdots & \lambda_{n}^{k}\end{bmatrix}\]
If we assume \(|\lambda_{1}|> |\lambda_{2}|\geq\cdots\geq |\lambda_{n}|\), then we have
\[\frac{A^{k}}{\lambda_{1}^{k}} = X\begin{bmatrix} 1 & 0 & \cdots & 0\\ 0 & (\lambda_{2}/\lambda_{1})^{k} & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & \cdots & (\lambda_{n}/\lambda_{1})^{k}\end{bmatrix} X^{-1} \approx X\begin{bmatrix} 1 & 0 & \cdots & 0\\ 0 & 0 & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & \cdots & 0\end{bmatrix} X^{-1}\]
where the approximation holds when \(k\) is very large, since \(|\lambda_{j}/\lambda_{1}|<1\) and hence \((\lambda_{j}/\lambda_{1})^{k}\to 0\) for \(j=2,\ldots,n\).
Hence, if \(X^{-1}v\) has a nonzero first coordinate, then \(A^{k}v/\lambda_{1}^{k}\) is approximately a scalar multiple of the first column \(x_{1}\) of \(X\), which is an eigenvector for \(\lambda_{1}\). That is, \(A^{k}v/\lambda_{1}^{k}\approx \alpha x_{1}\) and hence \(A^{k}v\approx \beta x_{1}.\)
Note that
\[v_{k+1} = \frac{A^{k}v}{\|A^{k}v\|}\approx \pm \frac{x_{1}}{\|x_{1}\|}\]
Hence,
\[v_{k+1}^{\top}Av_{k+1} \approx \left( \pm \frac{x_{1}}{\|x_{1}\|}\right)^{\top}A\left( \pm \frac{x_{1}}{\|x_{1}\|}\right) = \frac{1}{\|x_{1}\|^{2}} x_{1}^{\top}\lambda_{1}x_{1} = \lambda_{1}\]
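Both approximations are easy to see numerically. In the MATLAB sketch below (again with the \(4\times 4\) matrix from the example), we use \(\texttt{eig}\) only to get a reference dominant eigenvector \(x_{1}\) to compare against.
\begin{verbatim}
% Check that A^k = X*Lambda^k*inv(X) and that A^k*v, normalized, is near +/- x_1.
A = [-1 -3 7 5; -3 -1 5 7; 7 5 -1 -3; 5 7 -3 -1];
[X, Lambda] = eig(A);
k = 20;
norm(A^k - X*Lambda^k/X) / norm(A^k)     % tiny: relative error ~ machine precision

[~, idx] = max(abs(diag(Lambda)));       % position of the dominant eigenvalue
x1 = X(:, idx);                          % eig returns unit-length eigenvectors
v  = [1; 0; 0; 0];
w  = A^k*v / norm(A^k*v);                % normalized A^k v
min(norm(w - x1), norm(w + x1))          % small: w is approximately +/- x_1
\end{verbatim}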
A problem can arise when \(\lambda\) and \(-\lambda\) are both eigenvalues of \(A\): if these are the eigenvalues of largest magnitude, the assumption \(|\lambda_{1}|>|\lambda_{2}|\) fails.
Example. Consider the matrix \(A = \begin{bmatrix} 1 & 0\\ 0 & -1\end{bmatrix}.\)
If we start with a vector \(v_{1}=\begin{bmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{bmatrix}\), then
\[v_{2} = \frac{Av_{1}}{\|Av_{1}\|} = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}}\end{bmatrix},\ v_{3} = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{bmatrix},\ v_{4} = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}}\end{bmatrix},\ v_{5} = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{bmatrix},\ldots\]
Worse yet:
\[v_{2}^{\top}Av_{2} = 0,\ v_{3}^{\top}Av_{3} = 0,\ v_{4}^{\top}Av_{4} = 0,\ldots\]
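A few lines of MATLAB make the failure visible (a sketch, using the same starting vector as above).
\begin{verbatim}
% Power iteration on A = [1 0; 0 -1]: the second coordinate flips sign at
% every step and every value v'*A*v equals 0, so neither sequence converges
% to anything useful.
A = [1 0; 0 -1];
v = [1; 1] / sqrt(2);
for k = 1:6
    v = A*v / norm(A*v);
    fprintf('v = (%+.4f, %+.4f),  v''*A*v = %g\n', v(1), v(2), v'*A*v);
end
\end{verbatim}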
Example. Shift the matrix a bit: \(A+I = \begin{bmatrix} 2 & 0\\ 0 & 0\end{bmatrix}\), and run the power method on \(A+I\) instead.
If we start with a vector \(v_{1}=\begin{bmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{bmatrix}\), then
\[v_{2} = \frac{(A+I)v_{1}}{\|(A+I)v_{1}\|} = \begin{bmatrix} 1\\ 0\end{bmatrix},\ v_{3} = \begin{bmatrix} 1\\ 0\end{bmatrix},\ v_{4} = \begin{bmatrix} 1\\ 0\end{bmatrix},\ v_{5} = \begin{bmatrix} 1\\ 0\end{bmatrix},\ldots\]
And
\[v_{2}^{\top}(A+I)v_{2} = 2,\ v_{3}^{\top}(A+I)v_{3} = 2,\ v_{4}^{\top}(A+I)v_{4} = 2,\ldots\]
From this we can conclude that \(2-1=1\) is an eigenvalue of \(A\).
Be careful: what if the eigenvalues of \(A\) were \(-2\) and \(0\)? We wouldn't know that in advance, and adding \(I\) would then give eigenvalues \(-1\) and \(1\), creating exactly the problem we saw above.
Proposition. If \(A\) is a matrix with eigenvalues \(\lambda_{1},\ldots,\lambda_{n}\) and associated eigenvectors \(v_{1},\ldots,v_{n}\), then \(A+cI\) has eigenvalues \(\lambda_{1}+c,\ldots,\lambda_{n}+c\) with the same associated eigenvectors.
Proof.
\[(A+cI)v_{j} = Av_{j} + cIv_{j} = \lambda_{j}v_{j}+cv_{j} = (\lambda_{j}+c)v_{j}.\quad\Box\]
To avoid the possibility that \(\lambda\) and \(-\lambda\) are both eigenvalues of \(A\), we first pick a random number \(c\) and use the power method to find an eigenvalue of \(A+cI\). Subtracting \(c\) from that eigenvalue of \(A+cI\), we obtain an eigenvalue of \(A\).
Algorithm (power method with a random shift). Input: an \(n\times n\) matrix \(A\).
Once this algorithm terminates with \(\texttt{done}\), then \(\texttt{la}\) is (approximately) an eigenvalue and \(\texttt{v}\) is a normalized eigenvector.
Notes: If the algorithm takes too long, you should start over with a new random vector.
If \(A\) has two opposite eigenvalues, then the algorithm may not work. So, we start by adding \(\texttt{c*eye(n)}\) to our matrix \(A\) where \(c\) is a random number, then subtract \(c\) from the eigenvalue we get at the end.
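Putting the pieces together, a minimal MATLAB sketch of this procedure might look as follows; the names \(\texttt{la}\), \(\texttt{v}\), and \(\texttt{done}\) follow the description above, while the stopping tolerance, the iteration cap, and the use of \(\texttt{randn}\) for the random choices are assumptions rather than part of that description.
\begin{verbatim}
% Power method with a random shift (a sketch, not a definitive implementation).
function [la, v] = power_method(A)
    n = size(A, 1);
    c = randn;                      % random shift to avoid the +/- lambda problem
    B = A + c*eye(n);               % work with A + c*I
    v = randn(n, 1);
    v = v / norm(v);                % normalized random starting vector
    la = v'*B*v;
    done = false;
    iters = 0;
    while ~done && iters < 10000    % if this cap is hit, restart with a new random v
        v = B*v / norm(B*v);        % v_{k+1} = B v_k / ||B v_k||
        la_new = v'*B*v;            % current eigenvalue estimate for B
        done = abs(la_new - la) < 1e-12;
        la = la_new;
        iters = iters + 1;
    end
    la = la - c;                    % subtract c to recover an eigenvalue of A
end
\end{verbatim}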