More on diagonalizable matrices
The Power Method
Theorem. If \(A\) is an \(n\times n\) matrix and \(\{v_{1},\ldots,v_{n}\}\) is a basis of eigenvectors of \(A\) with corresponding eigenvalues \(\{\lambda_{1},\ldots,\lambda_{n}\}\), then
\[A = X\Lambda X^{-1}\]
where
\[X = \begin{bmatrix} | & | & | & & |\\ v_{1} & v_{2} & v_{3} & \cdots & v_{n}\\ | & | & | & & |\end{bmatrix}\]
and
\[\Lambda = \text{diag}(\lambda_{1},\lambda_{2},\ldots,\lambda_{n}) = \begin{bmatrix} \lambda_{1} & 0 & \cdots & 0\\ 0 & \lambda_{2} & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \lambda_{n}\end{bmatrix}\]
Proof. We will show that \(Ax=X\Lambda X^{-1} x\) for an arbitrary \(x\in\R^{n}\). Since \(\{v_{1},\ldots, v_{n}\}\) is a basis, there are scalars \(\alpha_{1},\ldots,\alpha_{n}\) such that
\[x = \alpha_{1}v_{1} + \alpha_{2}v_{2} + \cdots + \alpha_{n}v_{n}\]
Multiplying by \(A\) on the left we have
\[Ax = \alpha_{1}Av_{1} + \alpha_{2}Av_{2} + \cdots + \alpha_{n}Av_{n}\]
\[ = \alpha_{1}\lambda_{1}v_{1} + \alpha_{2}\lambda_{2}v_{2} + \cdots + \alpha_{n}\lambda_{n}v_{n}\]
By the definition of \(X\) we see that \(Xe_{i} = v_{i}\), and hence \(X^{-1}v_{i} = e_{i}\) for \(i=1,\ldots,n\). Hence,
\[X\Lambda X^{-1}x = X\Lambda X^{-1}\big(\alpha_{1}v_{1} + \alpha_{2}v_{2} + \cdots + \alpha_{n}v_{n}\big)\]
\[ = X\Lambda \big(\alpha_{1}X^{-1}v_{1} + \alpha_{2}X^{-1}v_{2} + \cdots + \alpha_{n}X^{-1}v_{n}\big)\]
\[ = X\Lambda \big(\alpha_{1}e_{1} + \alpha_{2}e_{2} + \cdots + \alpha_{n}e_{n}\big)\]
\[ = X\big(\alpha_{1}\lambda_{1} e_{1} + \alpha_{2}\lambda_{2} e_{2} + \cdots + \alpha_{n}\lambda_{n}e_{n}\big)\]
\[ = \alpha_{1}\lambda_{1} v_{1} + \alpha_{2}\lambda_{2} v_{2} + \cdots + \alpha_{n}\lambda_{n}v_{n} = Ax.\]
\(\Box\)
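A quick numerical illustration of the theorem: in the minimal MATLAB sketch below, the particular \(X\) and \(\Lambda\) are just illustrative choices; we build \(A = X\Lambda X^{-1}\) and check that each column of \(X\) is an eigenvector of \(A\) whose eigenvalue is the corresponding diagonal entry of \(\Lambda\).
\begin{verbatim}
% Build A = X*Lambda*inv(X) from a chosen basis and chosen eigenvalues,
% then check that each column of X is an eigenvector of A.
X      = [1 1 0; 0 1 1; 1 0 1];   % columns play the role of v_1, v_2, v_3
Lambda = diag([2, 3, -1]);        % corresponding eigenvalues
A      = X*Lambda/X;              % X*Lambda/X computes X*Lambda*inv(X)
for i = 1:3
    norm(A*X(:,i) - Lambda(i,i)*X(:,i))   % each value should be (close to) 0
end
\end{verbatim}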
Example. Consider the matrix
\[A=\left[\begin{array}{rrr} 2 & -1 & 2\\ 0 & 3 & -2\\ 0 & 0 & -1\end{array}\right]\]
We want to find \(\lambda\in\R\) such that
\[A-\lambda I=\left[\begin{array}{rrr} 2-\lambda & -1 & 2\\ 0 & 3-\lambda & -2\\ 0 & 0 & -1-\lambda\end{array}\right]\]
is not invertible. We know that an upper triangular matrix is invertible if and only if it has no zeros on its diagonal. Hence, the eigenvalues of \(A\) are \(2\), \(3\), and \(-1\).
Next, we find the eigenspaces \(N(A-\lambda I)\) for \(\lambda=2,3,\) and \(-1\). Starting with \(\lambda=-1\):
\[A-(-1) I=\left[\begin{array}{rrr} 3 & -1 & 2\\ 0 & 4 & -2\\ 0 & 0 & 0\end{array}\right]\]
Row reducing we get
\[\text{rref}(A-(-1)I) = \left[\begin{array}{rrr} 1 &0 & \frac{1}{2}\\ 0 & 1 & -\frac{1}{2}\\ 0 & 0 & 0\end{array}\right]\]
Hence, we see that
\[N(A-(-1)I) = \text{span}\left\{\begin{bmatrix} -\frac{1}{2}\\ \frac{1}{2}\\ 1\end{bmatrix}\right\}\]
Similarly
\[N(A-3I) = \text{span}\left\{\begin{bmatrix} -1\\ 1\\ 0\end{bmatrix}\right\}\quad\text{and }\quad N(A-2I) = \text{span}\left\{\begin{bmatrix} 1\\ 0\\ 0\end{bmatrix}\right\}\]
Selecting a nonzero vector from each eigenspace and using these vectors as columns, we form the matrix
\[X=\begin{bmatrix} 1 & -1 & -\frac{1}{2}\\ 0 & 1 & \frac{1}{2}\\ 0 & 0 & 1\end{bmatrix}\]
Since the columns of \(X\) form a basis for \(\R^{3}\), we see that \(X\) is invertible. Indeed,
\[X^{-1} = \begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & -\frac{1}{2}\\ 0 & 0 & 1\end{bmatrix}.\]
Finally,
\[\begin{bmatrix} 1 & -1 & -\frac{1}{2}\\ 0 & 1 & \frac{1}{2}\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix} 2 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & -1\end{bmatrix}\begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & -\frac{1}{2}\\ 0 & 0 & 1\end{bmatrix} = \left[\begin{array}{rrr} 2 & -1 & 2\\ 0 & 3 & -2\\ 0 & 0 & -1\end{array}\right]\]
Hence, \(A\) is diagonalizable!
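We can also confirm this factorization numerically; here is a minimal MATLAB check using the matrices computed above.
\begin{verbatim}
% Check the factorization A = X*Lambda*inv(X) from the example above.
A      = [2 -1 2; 0 3 -2; 0 0 -1];
X      = [1 -1 -1/2; 0 1 1/2; 0 0 1];
Lambda = diag([2, 3, -1]);
norm(A - X*Lambda/X)               % should be (close to) 0
\end{verbatim}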
Another Example. Consider the matrix
\[A = \left[\begin{array}{rrr} -1 &\ \ 0 & 0\\ -8 & 3 & 8\\ 0 & 0 & -1\end{array}\right].\]
The eigenvalues of \(A\) are \(3\) and \(-1\). Diagonalize \(A\); that is, find an invertible matrix \(X\) and a diagonal matrix \(\Lambda\) such that \(A=X\Lambda X^{-1}\).
\[\text{rref}(A-3I) = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix}\]
Therefore
\[\left\{\begin{bmatrix} 0\\ 1\\ 0\end{bmatrix}\right\}\]
is a basis for the eigenspace of \(A\) associated with the eigenvalue \(3\).
Next we note that
\[\text{rref}(A-(-1)I) = \begin{bmatrix} 1 & -\frac{1}{2} & -1\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}.\]
From this we see that
\[\left\{\begin{bmatrix} 1/2\\ 1\\ 0\end{bmatrix},\begin{bmatrix} 1\\ 0\\ 1\end{bmatrix}\right\}\]
is a basis for the eigenspace of \(A\) associated with the eigenvalue \(-1\).
It is simple to verify that
\[\left\{\begin{bmatrix} 0\\ 1\\ 0\end{bmatrix},\begin{bmatrix} 1/2\\ 1\\ 0\end{bmatrix},\begin{bmatrix} 1\\ 0\\ 1\end{bmatrix}\right\}\]
is a basis of eigenvectors of \(A\).
If we set
\[X = \left[\begin{array}{rrr} 0 & \frac{1}{2} & 1\\ 1 & 1 & 0\\ 0 & 0 & 1\end{array}\right]\]
and
\[\Lambda = \left[\begin{array}{rrr} 3 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1\end{array}\right]\]
then
\[X^{-1} = \left[\begin{array}{rrr} -2 &\ \ 1 & 2\\ 2 & 0 & -2\\ 0 & 0 & 1\end{array}\right]\]
and
\[X\Lambda X^{-1} = \left[\begin{array}{rrr} 0 & \frac{1}{2} & 1\\ 1 & 1 & 0\\ 0 & 0 & 1\end{array}\right]\left[\begin{array}{rrr} 3 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1\end{array}\right]\left[\begin{array}{rrr} -2 &\ \ 1 & 2\\ 2 & 0 & -2\\ 0 & 0 & 1\end{array}\right]\]
\[= \left[\begin{array}{rrr} -1 &\ \ 0 & 0\\ -8 & 3 & 8\\ 0 & 0 & -1\end{array}\right] = A\]
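In practice we would let the computer find such a factorization. A minimal MATLAB sketch using the built-in \(\texttt{eig}\) command: it returns some basis of eigenvectors in the columns of \(X\), possibly scaled and ordered differently from our hand computation, since eigenvectors are only determined up to nonzero scalar multiples.
\begin{verbatim}
% Diagonalize A with eig: columns of X are eigenvectors, Lambda is diagonal.
A = [-1 0 0; -8 3 8; 0 0 -1];
[X, Lambda] = eig(A);
diag(Lambda)'           % some ordering of the eigenvalues 3, -1, -1
norm(A*X - X*Lambda)    % should be (close to) 0, i.e. A*X = X*Lambda
\end{verbatim}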
Theorem. Let \(A\) be an \(n\times n\) matrix, and let \(v\) be a nonzero vector. Define the sequence \((v_{k})\) by \(v_{1} = \frac{1}{\|v\|}v\), and
\[v_{k+1} = \frac{Av_{k}}{\|Av_{k}\|}\quad\text{for all }k\in\N.\]
If \(A\) is diagonalizable with eigenvalues \(\lambda_{1},\ldots,\lambda_{n}\), and \(|\lambda_{1}|>|\lambda_{2}|\geq|\lambda_{3}|\geq\cdots\geq |\lambda_{n}|\), then the sequence
\[v_{1}^{\top}Av_{1}, v_{2}^{\top}Av_{2}, v_{3}^{\top}Av_{3},\ldots,v_{k}^{\top}Av_{k},\ldots\]
converges to the dominant eigenvalue \(\lambda_{1}\) of \(A\) for almost all initial vectors \(v\).
Example. Let
\[A = \left[\begin{array}{rrrr} -1 & -3 & 7 & 5\\ -3 & -1 & 5 & 7\\ 7 & 5 & -1 & -3\\ 5 & 7 & -3 & -1\end{array}\right].\]
Set \(v_{1} = \begin{bmatrix} 1 & 0 & 0 & 0\end{bmatrix}^{\top}\). Then,
\[v_{1} = \begin{bmatrix}1\\ 0\\ 0\\ 0\end{bmatrix},\ v_{2} = \begin{bmatrix}-0.1091\\ -0.3273\\ 0.7638\\ 0.5455\end{bmatrix}, v_{3} = \begin{bmatrix} 0.6355\\ 0.5750\\ -0.3329\\ -0.3934 \end{bmatrix}, v_{4} = \begin{bmatrix} -0.4263 \\ -0.4418\\ 0.5658\\ 0.5503 \end{bmatrix}, \]
\[v_{5} = \begin{bmatrix} 0.5322 \\ 0.5283 \\ -0.4659\\ -0.4698 \end{bmatrix},\ldots,v_{12} = \begin{bmatrix} -0.4998\\ -0.4998 \\ 0.5002\\ 0.5002 \end{bmatrix}, v_{13} = \begin{bmatrix} 0.5001\\ 0.5001\\ -0.4999\\ -0.4999 \end{bmatrix},\ldots\]
Note that this sequence of vectors is not converging; the entries flip sign from one step to the next.
Example Continued. However,
\[v_{1}^{\top}Av_{1} = -1,\quad v_{2}^{\top}Av_{2} = -10.4762,\quad v_{3}^{\top}Av_{3} = -14.5201,\]
\[v_{4}^{\top}Av_{4}= -15.6261,\quad v_{5}^{\top}Av_{5} = -15.9063, \quad \ldots\]
\[ v_{12}^{\top}Av_{12} = -15.999994277954329,\quad\ldots\quad,v_{40}^{\top}Av_{40} = -16\ \text{(to machine precision)}\]
Hence, it looks like \(-16\) is an eigenvalue of \[A = \begin{bmatrix} -1 & -3 & 7 & 5\\ -3 & -1 & 5 & 7\\ 7 & 5 & -1 & -3\\ 5 & 7 & -3 & -1 \end{bmatrix}.\]
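These iterates and the numbers \(v_{k}^{\top}Av_{k}\) are easy to reproduce; here is a minimal MATLAB sketch (running 40 steps, to match the values quoted above).
\begin{verbatim}
% Power method iterates for the 4x4 matrix A above, starting from e_1.
A = [-1 -3 7 5; -3 -1 5 7; 7 5 -1 -3; 5 7 -3 -1];
v = [1; 0; 0; 0];                 % v_1 (already a unit vector)
for k = 1:40
    fprintf('k = %2d,  v''*A*v = %.15g\n', k, v'*A*v);
    v = A*v / norm(A*v);          % v_{k+1} = A v_k / ||A v_k||
end
\end{verbatim}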
If \(A\) is diagonalizable, then there is an invertible matrix \(X\) and a diagonal matrix \(\Lambda\) such that \(A = X\Lambda X^{-1}.\)
\[A^2 = (X\Lambda X^{-1})(X\Lambda X^{-1}) = X\Lambda\Lambda X^{-1} = X\Lambda^{2} X^{-1}\]
\[A^3 = AA^2 = (X\Lambda X^{-1})(X\Lambda^2 X^{-1}) = X\Lambda\Lambda^2 X^{-1} = X\Lambda^{3} X^{-1}\]
\[A^4 = AA^3 = (X\Lambda X^{-1})(X\Lambda^3 X^{-1}) = X\Lambda\Lambda^3 X^{-1} = X\Lambda^{4} X^{-1}\]
\[\vdots\]
In general,
\[A^{k} = X\Lambda^{k} X^{-1}.\]
Since \(\Lambda\) is a diagonal matrix, we have
\[\Lambda = \begin{bmatrix} \lambda_{1} & 0 & \cdots & 0\\ 0 & \lambda_{2} & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & \cdots & \lambda_{n}\end{bmatrix}\quad\text{and}\quad \Lambda^{k} = \begin{bmatrix} \lambda_{1}^{k} & 0 & \cdots & 0\\ 0 & \lambda_{2}^{k} & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & \cdots & \lambda_{n}^{k}\end{bmatrix}\]
If we assume \(|\lambda_{1}|> |\lambda_{2}|\geq\cdots\geq |\lambda_{n}|\), then we have
\[\frac{A^{k}}{\lambda_{1}^{k}} = X\begin{bmatrix} 1 & 0 & \cdots & 0\\ 0 & (\lambda_{2}/\lambda_{1})^{k} & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & \cdots & (\lambda_{n}/\lambda_{1})^{k}\end{bmatrix} X^{-1} \approx X\begin{bmatrix} 1 & 0 & \cdots & 0\\ 0 & 0 & & \vdots\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & \cdots & 0\end{bmatrix} X^{-1}\]
where the approximation holds when \(k\) is very large, since \(|\lambda_{j}/\lambda_{1}|<1\) and hence \((\lambda_{j}/\lambda_{1})^{k}\to 0\) for \(j=2,\ldots,n\).
Hence, if \(X^{-1}v\) has a nonzero first coordinate, then \(A^{k}v/\lambda_{1}^{k}\) is approximately a scalar multiple of the first column \(x_{1}\) of \(X\), which is an eigenvector for \(\lambda_{1}\). That is, \(A^{k}v/\lambda_{1}^{k}\approx \alpha x_{1}\) and hence \(A^{k}v\approx \beta x_{1}.\)
Note that
\[v_{k+1} = \frac{A^{k}v}{\|A^{k}v\|}\approx \pm \frac{x_{1}}{\|x_{1}\|}\]
Hence,
\[v_{k+1}^{\top}Av_{k+1} \approx \left( \pm \frac{x_{1}}{\|x_{1}\|}\right)^{\top}A\left( \pm \frac{x_{1}}{\|x_{1}\|}\right) = \frac{1}{\|x_{1}\|^{2}} x_{1}^{\top}\lambda_{1}x_{1} = \lambda_{1}\]
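Both approximations are easy to see numerically. In the MATLAB sketch below (again with the \(4\times 4\) matrix from the example), we use \(\texttt{eig}\) only to get a reference dominant eigenvector \(x_{1}\) to compare against.
\begin{verbatim}
% Check that A^k = X*Lambda^k*inv(X) and that A^k*v, normalized, is near +/- x_1.
A = [-1 -3 7 5; -3 -1 5 7; 7 5 -1 -3; 5 7 -3 -1];
[X, Lambda] = eig(A);
k = 20;
norm(A^k - X*Lambda^k/X) / norm(A^k)     % tiny: relative error ~ machine precision

[~, idx] = max(abs(diag(Lambda)));       % position of the dominant eigenvalue
x1 = X(:, idx);                          % eig returns unit-length eigenvectors
v  = [1; 0; 0; 0];
w  = A^k*v / norm(A^k*v);                % normalized A^k v
min(norm(w - x1), norm(w + x1))          % small: w is approximately +/- x_1
\end{verbatim}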
A problem can arise when \(\lambda\) and \(-\lambda\) are both eigenvalues of \(A\): if these are the eigenvalues of largest magnitude, the assumption \(|\lambda_{1}|>|\lambda_{2}|\) fails.
Example. Consider the matrix \(A = \begin{bmatrix} 1 & 0\\ 0 & -1\end{bmatrix}.\)
If we start with a vector \(v_{1}=\begin{bmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{bmatrix}\), then
\[v_{2} = \frac{Av_{1}}{\|Av_{1}\|} = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}}\end{bmatrix},\ v_{3} = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{bmatrix},\ v_{4} = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}}\end{bmatrix},\ v_{5} = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{bmatrix},\ldots\]
Worse yet:
\[v_{2}^{\top}Av_{2} = 0,\ v_{3}^{\top}Av_{3} = 0,\ v_{4}^{\top}Av_{4} = 0,\ldots\]
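A few lines of MATLAB make the failure visible (a sketch, using the same starting vector as above).
\begin{verbatim}
% Power iteration on A = [1 0; 0 -1]: the second coordinate flips sign at
% every step and every value v'*A*v equals 0, so neither sequence converges
% to anything useful.
A = [1 0; 0 -1];
v = [1; 1] / sqrt(2);
for k = 1:6
    v = A*v / norm(A*v);
    fprintf('v = (%+.4f, %+.4f),  v''*A*v = %g\n', v(1), v(2), v'*A*v);
end
\end{verbatim}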
Example. Shift the matrix a bit: \(A+I = \begin{bmatrix} 2 & 0\\ 0 & 0\end{bmatrix}\), and run the power method on \(A+I\) instead.
If we start with a vector \(v_{1}=\begin{bmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{bmatrix}\), then
\[v_{2} = \frac{(A+I)v_{1}}{\|(A+I)v_{1}\|} = \begin{bmatrix} 1\\ 0\end{bmatrix},\ v_{3} = \begin{bmatrix} 1\\ 0\end{bmatrix},\ v_{4} = \begin{bmatrix} 1\\ 0\end{bmatrix},\ v_{5} = \begin{bmatrix} 1\\ 0\end{bmatrix},\ldots\]
And
\[v_{2}^{\top}(A+I)v_{2} = 2,\ v_{3}^{\top}(A+I)v_{3} = 2,\ v_{4}^{\top}(A+I)v_{4} = 2,\ldots\]
From this we can conclude that \(2-1=1\) is an eigenvalue of \(A\).
Be careful: what if the eigenvalues of \(A\) were \(-2\) and \(0\)? We wouldn't know that in advance, and adding \(I\) would then give eigenvalues \(-1\) and \(1\), creating exactly the problem we saw above.
Proposition. If \(A\) is a matrix with eigenvalues \(\lambda_{1},\ldots,\lambda_{n}\) and associated eigenvectors \(v_{1},\ldots,v_{n}\), then \(A+cI\) has eigenvalues \(\lambda_{1}+c,\ldots,\lambda_{n}+c\) with the same associated eigenvectors.
Proof.
\[(A+cI)v_{j} = Av_{j} + cIv_{j} = \lambda_{j}v_{j}+cv_{j} = (\lambda_{j}+c)v_{j}.\quad\Box\]
To avoid the possibility that \(\lambda\) and \(-\lambda\) are both eigenvalues of \(A\), we first pick a random number \(c\) and use the power method to find an eigenvalue of \(A+cI\). Subtracting \(c\) from that eigenvalue of \(A+cI\), we obtain an eigenvalue of \(A\).
Algorithm (power method with a random shift). Input: an \(n\times n\) matrix \(A\).
Once this algorithm terminates with \(\texttt{done}\), then \(\texttt{la}\) is (approximately) an eigenvalue and \(\texttt{v}\) is a normalized eigenvector.
Notes: If the algorithm takes too long, you should start over with a new random vector.
If \(A\) has two opposite eigenvalues, then the algorithm may not work. So, we start by adding \(\texttt{c*eye(n)}\) to our matrix \(A\) where \(c\) is a random number, then subtract \(c\) from the eigenvalue we get at the end.
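Putting the pieces together, a minimal MATLAB sketch of this procedure might look as follows; the names \(\texttt{la}\), \(\texttt{v}\), and \(\texttt{done}\) follow the description above, while the stopping tolerance, the iteration cap, and the use of \(\texttt{randn}\) for the random choices are assumptions rather than part of that description.
\begin{verbatim}
% Power method with a random shift (a sketch, not a definitive implementation).
function [la, v] = power_method(A)
    n = size(A, 1);
    c = randn;                      % random shift to avoid the +/- lambda problem
    B = A + c*eye(n);               % work with A + c*I
    v = randn(n, 1);
    v = v / norm(v);                % normalized random starting vector
    la = v'*B*v;
    done = false;
    iters = 0;
    while ~done && iters < 10000    % if this cap is hit, restart with a new random v
        v = B*v / norm(B*v);        % v_{k+1} = B v_k / ||B v_k||
        la_new = v'*B*v;            % current eigenvalue estimate for B
        done = abs(la_new - la) < 1e-12;
        la = la_new;
        iters = iters + 1;
    end
    la = la - c;                    % subtract c to recover an eigenvalue of A
end
\end{verbatim}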