Linear Algebra

Text

Part 1

Fields and vector spaces

Definition 1.1. A field is a set \(\mathbb{F}\) together with two binary operations,

\(+:\mathbb{F}\times\mathbb{F}\to\mathbb{F}\) and \(\cdot:\mathbb{F}\times\mathbb{F}\to\mathbb{F}\), called addition and multiplication, respectively, and usually denoted \(c_{1}+c_{2}:=+(c_{1},c_{2})\) and \(c_{1}c_{2}:=\cdot(c_{1},c_{2})\) for \(c_{1},c_{2}\in\mathbb{F}\), with the following nine properties:

  1. Addition is commutative, that is, \[c_{1}+c_{2} = c_{2}+c_{1}\] for all \(c_{1},c_{2}\in\mathbb{F}\).
  2. Addition is associative, that is, \[c_{1}+(c_{2}+c_{3}) = (c_{1}+c_{2})+c_{3}\] for all \(c_{1},c_{2},c_{3}\in\mathbb{F}\).
  3. There is an additive identity \(0\in\mathbb{F}\), that is, \(0+c=c\) for all \(c\in\mathbb{F}\).
  4. Any \(c\in\mathbb{F}\) has an additive inverse \(-c\in\mathbb{F}\) such that \(c+(-c)=0\).
  5. Multiplication is commutative, that is, \[c_{1}c_{2}=c_{2}c_{1}\] for all \(c_{1},c_{2}\in\mathbb{F}\).
  6. Multiplication is associative, that is, \[c_{1}(c_{2}c_{3}) = (c_{1}c_{2})c_{3}\] for all \(c_{1},c_{2},c_{3}\in\mathbb{F}\).
  7. There exists a multiplicative identity \(1\in\mathbb{F}\) such that \(1c=c\) for all \(c\in\mathbb{F}\).
  8. Any \(c\in\mathbb{F}\) with \(c\neq 0\) has a multiplicative inverse \(c^{-1}\) such that \(cc^{-1}=1\).
  9. Multiplication distributes over addition, that is, \[c_{1}(c_{2}+c_{3}) = c_{1}c_{2}+c_{1}c_{3}\] for all \(c_{1},c_{2},c_{3}\in\mathbb{F}\).

Notation: For \(a,b\in\mathbb{F}\) we will adopt the following conventions:

  • If \(b\neq 0\), then \(\dfrac{a}{b}:=ab^{-1}\)
  • \(a-b = a+(-b)\)

Example 1. The set of natural numbers \(\mathbb{N}:=\{1,2,3,\ldots\}\) with usual addition and multiplication is not a field: it contains no additive identity, and no element has an additive inverse.

Example 2. The set of integers \(\mathbb{Z}:=\{\ldots,-3,-2,-1,0,1,2,3,\ldots\}\) with usual addition and multiplication is not a field: apart from \(1\) and \(-1\), no integer has a multiplicative inverse in \(\mathbb{Z}\).

Example 3. The set of rational numbers \[\mathbb{Q}:=\left\{\frac{a}{b} : a,b\in\mathbb{Z},b\neq 0\right\}\] with usual addition and multiplication is a field.

Example 4. The real numbers, denoted \(\mathbb{R}\), and complex numbers, denoted \(\mathbb{C}\), are both fields under usual addition and multiplication.

Example 6. If \(n\in\mathbb{N}\), then let \(\mathbb{Z}_{n} = \{0,1,2,\ldots,n-1\}\) with addition and multiplication modulo \(n\). (See Example 4.4)

  • If \(p\in\mathbb{N}\) is prime, then \(\mathbb{Z}_{p}\) is a field.
  • If \(n\in\mathbb{N}\) is not prime, then \(\mathbb{Z}_{n}\) is not a field. For example, \(\mathbb{Z}_{4}\) is not a field since \[0\cdot 2=0,\quad 1\cdot 2=2,\quad 2\cdot 2=0,\quad 3\cdot 2=2,\] so \(2\) does not have a multiplicative inverse; a brute-force check of this criterion is sketched below.
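
As a quick sanity check of the bullets above, here is a short Python sketch (my own illustration, not part of the text) that tests, by brute force, whether every nonzero element of \(\mathbb{Z}_{n}\) has a multiplicative inverse modulo \(n\):

```python
def is_field_Zn(n: int) -> bool:
    """Return True if Z_n = {0, 1, ..., n-1} is a field under arithmetic mod n."""
    # Z_n always satisfies the other eight field axioms; the only question is
    # whether every nonzero c has a multiplicative inverse d with c*d = 1 mod n.
    return all(any((c * d) % n == 1 for d in range(n)) for c in range(1, n))

print([n for n in range(2, 13) if is_field_Zn(n)])  # [2, 3, 5, 7, 11] -- exactly the primes
```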

Example 7. The set \(\{0,1,\alpha,\beta\}\) with addition and multiplication tables

\[\begin{array}{c|cccc} +&0&1&\alpha&\beta\\\hline 0&0&1&\alpha&\beta\\1&1&0&\beta&\alpha\\\alpha&\alpha&\beta&0&1\\\beta&\beta&\alpha&1&0\end{array}\quad\quad \begin{array}{c|cccc} \cdot&0&1&\alpha&\beta\\\hline 0&0&0&0&0\\1&0&1&\alpha&\beta\\\alpha&0&\alpha&\beta&1\\\beta&0&\beta&1&\alpha\end{array}\]

is a field. We will denote this field as \(\mathbb{F}_{4}\).

Definition 1.2. A vector space is a set \(\mathcal{V}\) (whose elements are called vectors) together with a field \(\mathbb{F}\) (whose elements are called scalars) along with two binary operations, \(+:\mathcal{V}\times\mathcal{V}\to\mathcal{V}\) and \(\cdot:\mathbb{F}\times\mathcal{V}\to\mathcal{V}\), called (vector) addition and scalar multiplication, respectively, and usually denoted

\(\mathbf{v}_{1}+\mathbf{v}_{2}:=+(\mathbf{v}_{1},\mathbf{v}_{2})\) and \(c\mathbf{v}_{1}:=\cdot(c,\mathbf{v}_{1})\) for  \(\mathbf{v}_{1}, \mathbf{v}_{2}\in\mathcal{V}\) and \(c\in\mathbb{F},\) with the following eight properties:

  1. Addition is commutative, that is, \[\mathbf{v}_{1}+\mathbf{v}_{2} = \mathbf{v}_{2}+\mathbf{v}_{1}\] for all \(\mathbf{v}_{1},\mathbf{v}_{2}\in\mathcal{V}\).
  2. Addition is associative, that is, \[\mathbf{v}_{1}+(\mathbf{v}_{2}+\mathbf{v}_{3}) = (\mathbf{v}_{1}+\mathbf{v}_{2})+\mathbf{v}_{3}\] for all \(\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}\in\mathcal{V}\).
  3. There is an additive identity \(\mathbf{0}\in\mathcal{V}\), that is, \(\mathbf{0}+\mathbf{v}=\mathbf{v}\) for all \(\mathbf{v}\in\mathcal{V}\).
  4. Any \(\mathbf{v}\in\mathcal{V}\) has an additive inverse \(-\mathbf{v}\in\mathcal{V}\) such that \(\mathbf{v}+(-\mathbf{v})=\mathbf{0}\).
  5. Scalar multiplication is associative, \[c_{1}(c_{2}\mathbf{v}) = (c_{1}c_{2})\mathbf{v}\] for \(c_{1},c_{2}\in\mathbb{F}\) and \(\mathbf{v}\in\mathcal{V}\).
  6. If \(1\in\mathbb{F}\) is the multiplicative identity, then \(1\mathbf{v}=\mathbf{v}\) for all \(\mathbf{v}\in\mathcal{V}\).
  7. Scalar multiplication distributes over vector addition, that is, \[ c(\mathbf{v}_{1}+\mathbf{v}_{2}) = c\mathbf{v}_{1}+c\mathbf{v}_{2}\] for \(c\in\mathbb{F}\) and \(\mathbf{v}_{1},\mathbf{v}_{2}\in\mathcal{V}\).
  8. Scalar multiplication distributes over scalar addition, that is, \[(c_{1}+c_{2})\mathbf{v} = c_{1}\mathbf{v}+c_{2}\mathbf{v}\] for \(c_{1},c_{2}\in\mathbb{F}\) and \(\mathbf{v}\in\mathcal{V}\).

Example 8. If \(\mathbb{F}\) is a field, then \(\mathbb{F}\) is a vector space over itself. Vector addition and scalar multiplication are addition and multiplication in the field, respectively. 

Example 9. The complex numbers \(\mathbb{C}\) are a vector space over the real numbers \(\mathbb{R}\).

Example 10. Set

\[\mathbb{R}^{3}:=\left\{\begin{bmatrix}\mathbf{v}(1)\\ \mathbf{v}(2)\\ \mathbf{v}(3)\end{bmatrix} : \mathbf{v}(1),\mathbf{v}(2),\mathbf{v}(3)\in\mathbb{R}\right\}.\]

For \(\mathbf{v},\mathbf{v}_{1},\mathbf{v}_{2}\in\mathbb{R}^{3}\) and \(c\in\mathbb{R}\) define addition by

\[\mathbf{v}_1+\mathbf{v}_2=\begin{bmatrix}\mathbf{v}_1(1)\\\mathbf{v}_1(2)\\\mathbf{v}_1(3)\end{bmatrix}+\begin{bmatrix}\mathbf{v}_2(1)\\\mathbf{v}_2(2)\\\mathbf{v}_2(3)\end{bmatrix}:=\begin{bmatrix}\mathbf{v}_1(1)+\mathbf{v}_2(1)\\\mathbf{v}_1(2)+\mathbf{v}_2(2)\\\mathbf{v}_1(3)+\mathbf{v}_2(3)\end{bmatrix}\]

and scalar multiplication by

\[c\mathbf{v}=c\begin{bmatrix}\mathbf{v}(1)\\\mathbf{v}(2)\\\mathbf{v}(3)\end{bmatrix} :=\begin{bmatrix}c\mathbf{v}(1)\\c\mathbf{v}(2)\\c\mathbf{v}(3)\end{bmatrix}.\]

This is a vector space over \(\mathbb{R}\). 

Example 11. More generally, let \(\mathcal{N}\) be a nonempty set and let \(\mathbb{F}\) be a field. The set of functions from \(\mathcal{N}\) to \(\mathbb{F}\) is denoted

\[\mathbb{F}^{\mathcal{N}}:=\{\mathbf{x}:\mathcal{N}\to \mathbb{F}\}.\]

For \(\mathbf{x}_{1},\mathbf{x}_{2}\in\mathbb{F}^{\mathcal{N}},\) define the sum \(\mathbf{x}_{1}+\mathbf{x}_{2}\in\mathbb{F}^{\mathcal{N}}\) by

\[(\mathbf{x}_{1}+\mathbf{x}_{2})(n) = \mathbf{x}_{1}(n)+\mathbf{x}_{2}(n),\quad\text{for all }n\in\mathcal{N}\]

For \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) and \(c\in\mathbb{F}\), define the product \(c\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) by

\[(c\mathbf{x})(n) = c\mathbf{x}(n)\quad\text{for all }n\in\mathcal{N}.\]

With these two operations \(\mathbb{F}^{\mathcal{N}}\) is a vector space over \(\mathbb{F}\). (But we need to prove this!)

When \(\mathcal{N} = [N]:=\{1,2,\ldots,N\}\), we usually write \(\mathbb{F}^{N}\) instead of \(\mathbb{F}^{[N]}\). Hence, if \(\mathbf{v}\in\mathbb{R}^{3}\), then \(\mathbf{v}\) is a function from \([3]=\{1,2,3\}\) to \(\mathbb{R}\). This function is completely determined by the three numbers \(\mathbf{v}(1),\mathbf{v}(2),\mathbf{v}(3)\), so we often express it as a \(3\times 1\) column vector:

\[\mathbf{v}=\begin{bmatrix}\mathbf{v}(1)\\ \mathbf{v}(2)\\ \mathbf{v}(3)\end{bmatrix}\]
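
Since a vector in \(\mathbb{F}^{\mathcal{N}}\) is literally a function \(\mathcal{N}\to\mathbb{F}\), the pointwise operations above are easy to model computationally. Here is a minimal Python sketch (my own, with \(\mathbb{F}=\mathbb{R}\) approximated by floats and vectors stored as dicts keyed by \(\mathcal{N}\)):

```python
N = {"a", "b", "c"}  # any finite nonempty index set -- not necessarily {1, ..., N}

def add(x1, x2):
    """Pointwise sum: (x1 + x2)(n) = x1(n) + x2(n)."""
    return {n: x1[n] + x2[n] for n in N}

def scale(c, x):
    """Pointwise scalar multiple: (c x)(n) = c * x(n)."""
    return {n: c * x[n] for n in N}

x1 = {"a": 1.0, "b": 2.0, "c": -1.0}
x2 = {"a": 0.5, "b": 0.0, "c": 4.0}
print(add(x1, x2))     # {'a': 1.5, 'b': 2.0, 'c': 3.0} (key order may vary)
print(scale(2.0, x1))  # {'a': 2.0, 'b': 4.0, 'c': -2.0}
```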

Theorem 1.5. For any nonempty set \(\mathcal{N}\) and a field \(\mathbb{F}\), the set

\(\mathbb{F}^{\mathcal{N}}:=\{\mathbf{x}:\mathcal{N}\to\mathbb{F}\}\) of all functions from \(\mathcal{N}\) to \(\mathbb{F}\) is a vector space under pointwise addition and scalar multiplication.

 

Proof. (On the board and in the notes.)

Theorem 1.6. If \(\mathcal{V}\) is a vector space over a field \(\mathbb{F}\), then:

  1. The additive identity (zero vector) and additive inverses are unique.
  2. The scalar multiplicative identity (unit scalar) and scalar multiplicative inverses are unique.
  3. For any \(c\in\mathbb{F}\) and \(\mathbf{v}\in\mathcal{V}\) we have \(c\mathbf{v}=\mathbf{0}\) if and only if \(c=0\) or \(\mathbf{v} = \mathbf{0}\).
  4. \((-1)\mathbf{v}=-\mathbf{v}\) for all \(\mathbf{v}\in\mathcal{V}\). 

 

Proof. (On the board and in the notes.)

Part 2

Subspaces

Definition 1.7. Let \(\mathcal{V}\) be a vector space over a field \(\mathbb{F}\). A subset \(\mathcal{U}\) is called a subspace if \(\mathbf{0}\in\mathcal{U}\), and \(c_{1}\mathbf{u}_{1}+c_{2}\mathbf{u}_{2}\in\mathcal{U}\) for all \(c_{1},c_{2}\in\mathbb{F}\) and \(\mathbf{u}_{1},\mathbf{u}_{2}\in\mathcal{U}\).

 

Note: We require that \(\mathbf{0}\in\mathcal{U}\) to rule out the case \(\mathcal{U}=\varnothing\). Indeed, it is enough to assume that \(\mathcal{U}\) is nonempty, that is, that there is some \(\mathbf{u}\in\mathcal{U}\), since in this case we have

\[\mathbf{0} = 0\mathbf{u} + 0\mathbf{u} \in\mathcal{U}.\]

Example 1.  The set \(\mathcal{U}:=\{(x,x) : x\in\mathbb{R}\}\) is a subspace of \(\mathbb{R}^{2}\). Indeed, \((0,0)\in\mathcal{U}\), and for any \(c_{1},c_{2},x,y\in\mathbb{R}\) we have \[c_{1}(x,x)+c_{2}(y,y) = (c_{1}x+c_{2}y,c_{1}x+c_{2}y)\in\mathcal{U}.\]

Example 2.  The set \(\mathcal{U}:=\{(x,y)\in\mathbb{R}^{2} : xy=0\}\) is not a subspace of \(\mathbb{R}^{2}\). Note that \((0,0)\in\mathcal{U}\), however, \((1,0)\in\mathcal{U}\) and \((0,1)\in\mathcal{U}\), but \[1(1,0)+1(0,1)=(1,1)\notin\mathcal{U}.\]

Theorem 1.8. Let \(\mathcal{U}\) be a subset of a vector space \(\mathcal{V}\) over a field \(\mathbb{F}\). The following are equivalent:

  1. \(\mathcal{U}\) is a subspace of \(\mathcal{V}\).
  2. \(\mathcal{U}\) has the following properties:                                                                               

(a) \(\mathbf{0}\in\mathcal{U}\)

(b) \(\mathbf{u}_{1}+\mathbf{u}_{2}\in\mathcal{U}\) for all \(\mathbf{u}_{1},\mathbf{u}_{2}\in\mathcal{U}\)

(c) \(c\mathbf{u}\in\mathcal{U}\) for all \(c\in\mathbb{F}\) and \(\mathbf{u}\in\mathcal{U}\)

  3. \(\mathcal{U}\) is itself a vector space over \(\mathbb{F}\) under the definitions of addition, scalar multiplication, additive identity, and additive inverses that it inherits from \(\mathcal{V}\).

 

Proof. (On the board and in the notes)

Example 3. The set \(\mathcal{U}:=\{f:[0,1]\to\mathbb{R} : f(0)=1\}\) is not a subspace of \(\mathbb{R}^{[0,1]}\). Indeed, if \(f,g\in\mathcal{U}\), then \((f+g)(0)=2\), and hence \(f+g\notin\mathcal{U}.\)

Definition 1.10. If \(\mathcal{N}\) is a finite nonempty set, and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a sequence of vectors in a vector space \(\mathcal{V}\) over a field \(\mathbb{F}\), then a linear combination of \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is any vector of the form\[\sum_{n\in\mathcal{N}}c_{n}\mathbf{v}_{n}\]where \(\{c_{n}\}_{n\in\mathcal{N}}\) is a sequence of scalars in \(\mathbb{F}\).

Notes:

  • If \(\mathcal{N}=[N]\) for some \(N\in\mathbb{N}\), then we usually write \(\{\mathbf{v}_{n}\}_{n=1}^{N}\) instead of \(\{\mathbf{v}_{n}\}_{n\in[N]}\).
  • Since the \(c_{n}\)'s can equal zero, not all terms \(\mathbf{v}_{n}\) must appear in the sum for it to be a linear combination. For example, each of 

\[\begin{bmatrix} 1\\ 0\\ 0\end{bmatrix} = 1\begin{bmatrix} 1\\ 0\\ 0\end{bmatrix},\quad\begin{bmatrix} 1\\ 2\\ 2\end{bmatrix} = 1\begin{bmatrix} 1\\ 0\\ 0\end{bmatrix}+2\begin{bmatrix} 0\\ 1\\ 0\end{bmatrix} +2\begin{bmatrix} 0\\ 0\\ 1\end{bmatrix}\quad\text{and}\quad \begin{bmatrix} 0\\ 1\\ -1\end{bmatrix} = \begin{bmatrix} 0\\ 1\\ 0\end{bmatrix} - \begin{bmatrix} 0\\ 0\\ 1\end{bmatrix} \]

is a linear combination of \(\left\{\begin{bmatrix}1\\0\\0\end{bmatrix},\begin{bmatrix}0\\1\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\end{bmatrix}\right\}\).

Part 3

Bases and dimension

Definition 5.4. Let \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) be a finite sequence of vectors in a vector space \(\mathcal{V}\) over a field \(\mathbb{F}\). 

  1. The span of \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is the set of all linear combinations of \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\), that is, \[\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}:=\left\{\sum_{n\in\mathcal{N}}\mathbf{x}(n)\mathbf{v}_{n} : \mathbf{x}\in\mathbb{F}^{\mathcal{N}}\right\}\]
  2. \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is linearly independent if \(\mathbf{x}=\mathbf{0}\) is the only \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) such that \[\mathbf{0} = \sum_{n\in\mathcal{N}}\mathbf{x}(n)\mathbf{v}_{n}.\] If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is not linearly independent, then it is linearly dependent.
  3. \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a basis for \(\mathcal{V}\) if \(\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}=\mathcal{V}\) and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is linearly independent.
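
For vectors in \(\mathbb{R}^{d}\), both conditions in Definition 5.4 can be checked numerically by stacking the vectors as the columns of a matrix and computing its rank. A small sketch (my own, assuming numpy; the rank criteria are standard facts, not from this section):

```python
import numpy as np

# Columns are v_1, v_2, v_3 in R^2.
V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, -2.0]])

rank = np.linalg.matrix_rank(V)
print(rank == V.shape[1])  # False: the 3 columns are linearly dependent
print(rank == V.shape[0])  # True: the columns span R^2
```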

Definition. [Slightly different from the notes] A vector space \(\mathcal{V}\) is called finite-dimensional if \(\mathcal{V}\) has a basis which is a finite sequence. If a vector space is not finite-dimensional, then it is infinite-dimensional.

Do the vectors span the space?

Suppose \(\mathcal{N}\) is a finite nonempty set, and \(\mathcal{V}\) is a vector space over a field \(\mathbb{F}.\)

 

If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a sequence in \(\mathcal{V}\) and \(\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}=\mathcal{V}\), then we say that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) spans \(\mathcal{V}\).

Do the vectors \(\begin{bmatrix} 1\\ 0\\ 0\end{bmatrix}, \begin{bmatrix} 0\\ 1\\ 1\end{bmatrix}\) in \(\mathbb{R}^{3}\) span the subspace \(\mathcal{W} = \left\{\begin{bmatrix} c\\ c\\ c\end{bmatrix}:c\in\mathbb{R}\right\}\)?

No! Recall that \(\mathcal{W}\) is a vector space in its own right. To say that a sequence of vectors spans \(\mathcal{W}\), the vectors must be in \(\mathcal{W}\)!

Example 1. Consider the sequence \(\{\mathbf{v}_{n}\}_{n=1}^{3}\) in \(\mathbb{R}^{3}\) where

\[\mathbf{v}_{1} = \begin{bmatrix} 1\\ 0\\ 0\end{bmatrix},\quad \mathbf{v}_{2} = \begin{bmatrix} 0\\ 1\\ 0\end{bmatrix},\quad \mathbf{v}_{3} = \begin{bmatrix} 0\\ 0\\ 1\end{bmatrix}\]

We claim that these vectors form a basis. If \(\mathbf{x}\in\mathbb{R}^{3}\) is a vector such that

\[\begin{bmatrix} 0\\ 0\\ 0\end{bmatrix}=\mathbf{x}(1)\mathbf{v}_{1} + \mathbf{x}(2)\mathbf{v}_{2}+\mathbf{x}(3)\mathbf{v}_{3}= \begin{bmatrix}\mathbf{x}(1)\\ \mathbf{x}(2)\\ \mathbf{x}(3)\end{bmatrix}, \]

then clearly \(\mathbf{x}(1)=\mathbf{x}(2)=\mathbf{x}(3)=0,\) that is, \(\mathbf{x}=\mathbf{0}\). This shows that \(\{\mathbf{v}_{n}\}_{n=1}^{3}\) is independent.

Next, let \(\mathbf{v}\in\mathbb{R}^{3}\) be arbitrary. Then we have

\[\mathbf{v} = \begin{bmatrix}\mathbf{v}(1)\\ \mathbf{v}(2)\\ \mathbf{v}(3)\end{bmatrix} = \mathbf{v}(1)\begin{bmatrix} 1\\ 0\\ 0\end{bmatrix}+\mathbf{v}(2)\begin{bmatrix} 0\\ 1\\ 0\end{bmatrix}+\mathbf{v}(3)\begin{bmatrix} 0\\ 0\\ 1\end{bmatrix} =\mathbf{v}(1)\mathbf{v}_{1}+\mathbf{v}(2)\mathbf{v}_{2}+\mathbf{v}(3)\mathbf{v}_{3}.\]

This shows that \(\operatorname{span}\{\mathbf{v}_{n}\}_{n=1}^{3} = \mathbb{R}^{3}\). Thus, \(\{\mathbf{v}_{n}\}_{n=1}^{3} \) is a basis for \(\mathbb{R}^{3}\).

Example 2. Consider the sequence \(\{\mathbf{v}_{n}\}_{n=1}^{3}\) in \(\mathbb{R}^{2}\) where

\[\mathbf{v}_{1} = \begin{bmatrix} 1\\ 0\end{bmatrix},\quad \mathbf{v}_{2} = \begin{bmatrix} 0\\ 1\end{bmatrix},\quad \mathbf{v}_{3} = \begin{bmatrix} 1\\ -2\end{bmatrix}.\]

Given arbitrary \(\mathbf{v}\in\mathbb{R}^{2}\) we note that

\[\mathbf{v} = \begin{bmatrix}\mathbf{v}(1)\\ \mathbf{v}(2)\end{bmatrix} = \mathbf{v}(1)\mathbf{v}_{1} + \mathbf{v}(2)\mathbf{v}_{2}.\]

This shows that \(\operatorname{span}\{\mathbf{v}_{n}\}_{n=1}^{3} = \mathbb{R}^{2}\). However, 

\[\mathbf{v}_{1} - 2\mathbf{v}_{2} - \mathbf{v}_{3} = \mathbf{0},\]

hence \(\{\mathbf{v}_{n}\}_{n=1}^{3}\) is not linearly independent, and hence it is not a basis for \(\mathbb{R}^{2}\).

Example 3.  Consider the sequence \(\{\mathbf{v}_{n}\}_{n=1}^{2}\) in \(\mathbb{R}^{3}\) where

\[\mathbf{v}_{1} = \begin{bmatrix} 1\\ 0\\ 0\end{bmatrix},\quad \mathbf{v}_{2} = \begin{bmatrix} 0\\ 1\\ 0\end{bmatrix}.\]

This sequence is independent, but \(\operatorname{span}\{\mathbf{v}_{n}\}_{n=1}^{2} \neq \mathbb{R}^{3}\), so \(\{\mathbf{v}_{n}\}_{n=1}^{2}\) is not a basis for \(\mathbb{R}^{3}\). Can you prove this?

Theorem. Suppose \(\mathcal{V}\) is a vector space over a field \(\mathbb{F}\) and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a finite sequence in \(\mathcal{V}\). Define the function \(\mathbf{V}:\mathbb{F}^{\mathcal{N}}\to\mathcal{V}\) by \[\mathbf{V}(\mathbf{x}) = \sum_{n\in\mathcal{N}}\mathbf{x}(n)\mathbf{v}_{n}.\] This function is called the synthesis operator of \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\).

(a) The synthesis operator is injective if and only if \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is independent.

(b) The synthesis operator is surjective if and only if \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) spans \(\mathcal{V}\).

(c) The synthesis operator is bijective if and only if \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a basis for \(\mathcal{V}\).
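
Before the proof, here is a concrete sketch of a synthesis operator (my own illustration, assuming numpy; not from the notes): three vectors in \(\mathbb{R}^{2}\), with \(\mathbf{x}\) stored as a dict keyed by \(\mathcal{N}=\{1,2,3\}\).

```python
import numpy as np

# The sequence {v_n}_{n in N}, with N = {1, 2, 3}.
vecs = {1: np.array([1.0, 0.0]),
        2: np.array([0.0, 1.0]),
        3: np.array([1.0, -2.0])}

def synthesis(x):
    """V(x) = sum over n of x(n) * v_n."""
    return sum(x[n] * vecs[n] for n in vecs)

print(synthesis({1: 2.0, 2: 3.0, 3: 0.0}))  # [2. 3.]
```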

Proof. Suppose \(\mathbf{V}\) is injective. Since \(\mathbf{V}(\mathbf{0})=\mathbf{0}\), injectivity implies that \(\mathbf{x}=\mathbf{0}\) is the only \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) with \(\mathbf{V}(\mathbf{x}) = \mathbf{0}\). This is exactly the statement that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is independent.

Suppose \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is independent. To show that \(\mathbf{V}\) is injective we assume there exist \(\mathbf{x},\mathbf{y}\in\mathbb{F}^{\mathcal{N}}\) such that \(\mathbf{V}(\mathbf{x})=\mathbf{V}(\mathbf{y})\). Then,

\[\mathbf{0} = \mathbf{V}(\mathbf{x})-\mathbf{V}(\mathbf{y})=\sum_{n\in\mathcal{N}}\mathbf{x}(n)\mathbf{v}_{n} - \sum_{n\in\mathcal{N}}\mathbf{y}(n)\mathbf{v}_{n} = \sum_{n\in\mathcal{N}}(\mathbf{x}(n)-\mathbf{y}(n))\mathbf{v}_{n}.\]

From the assumption that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is independent we deduce that \(\mathbf{x}(n)-\mathbf{y}(n)=0\) for each \(n\in\mathcal{N}\), that is, \(\mathbf{x}=\mathbf{y}\). This shows that \(\mathbf{V}\) is injective.

Note that \(\mathbf{V}\) is surjective if and only if for each \(\mathbf{v}\in\mathcal{V}\) there exists \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) such that \(\mathbf{V}(\mathbf{x}) = \mathbf{v}\). This is exactly the statement that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) spans \(\mathcal{V}\).

From parts (a) and (b) we see that \(\mathbf{V}\) is bijective if and only if \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) both spans \(\mathcal{V}\) and is linearly independent, that is, \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a basis. \(\Box\)

Note: Suppose \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\)  is linearly independent. The preceding proof shows that for any \(\mathbf{v}\in\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) there is a unique \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) such that \(\mathbf{V}(\mathbf{x})=\mathbf{v}.\) That is, there exists a unique sequence of scalars \(\{\mathbf{x}(n)\}_{n\in\mathcal{N}}\) such that \[ \mathbf{v} = \sum_{n\in\mathcal{N}}\mathbf{x}(n)\mathbf{v}_{n}.\] If, in addition, \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) spans \(\mathcal{V}\), then this holds for any vector \(\mathbf{v}\in\mathcal{V}\).

Every vector in a vector space is a unique linear combination of the vectors in a basis

Example. Consider the sequence \(\{\mathbf{v}_{n}\}_{n=1}^{3}\) in \(\mathbb{R}^{2}\), where

\[\mathbf{v}_{1} = \begin{bmatrix} 1\\ 1\end{bmatrix},\quad\mathbf{v}_{2} = \begin{bmatrix} 1\\ 0\end{bmatrix},\quad\mathbf{v}_{3} = \begin{bmatrix} 0\\ -2\end{bmatrix}\]

The synthesis operator of \(\{\mathbf{v}_{n}\}_{n=1}^{3}\) is the function \(\mathbf{V}:\mathbb{R}^{3}\to\mathbb{R}^{2}\) given by

\[\mathbf{V}\left(\begin{bmatrix} \mathbf{x}(1)\\ \mathbf{x}(2)\\ \mathbf{x}(3)\end{bmatrix}\right) = \mathbf{x}(1)\begin{bmatrix} 1\\ 1\end{bmatrix}+ \mathbf{x}(2)\begin{bmatrix} 1\\ 0\end{bmatrix}+ \mathbf{x}(3)\begin{bmatrix} 0\\ -2\end{bmatrix} \]

Since \(\{\mathbf{v}_{n}\}_{n=1}^{3}\) spans \(\mathbb{R}^{2}\) (Prove it!), we conclude that \(\mathbf{V}\) is surjective.

 

On the other hand \(\{\mathbf{v}_{n}\}_{n=1}^{3}\) is dependent (Prove it!), and hence \(\mathbf{V}\) is not injective. Can you find two vectors in \(\mathbb{R}^{3}\) that \(\mathbf{V}\) maps to \(\mathbf{0}\)?
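
One possible answer to the question above (my own computation, not from the notes): solving \(\mathbf{x}(1)+\mathbf{x}(2)=0\) and \(\mathbf{x}(1)-2\mathbf{x}(3)=0\) gives kernel vectors proportional to \((2,-2,1)\), which the following sketch (assuming numpy) verifies.

```python
import numpy as np

# Matrix whose columns are v_1 = (1,1), v_2 = (1,0), v_3 = (0,-2),
# so that the synthesis operator is x |-> A @ x.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, -2.0]])

for x in (np.array([2.0, -2.0, 1.0]), np.array([-4.0, 4.0, -2.0])):
    print(A @ x)  # [0. 0.] both times: two distinct vectors mapped to 0
```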

We know that the vector space \(\mathbb{R}^{2}\) is \(2\)-dimensional, but why?

What is dimension?

Intuitively, there are two directions that one can move in \(\mathbb{R}^{2}\)...

...and, to reach all points in \(\mathbb{R}^{2}\) one must move in two directions.

More formally, there are two linearly independent vectors in \(\mathbb{R}^{2}\)...

...and, those two vectors span \(\mathbb{R}^{2}\).

There is a basis for \(\mathbb{R}^{2}\) containing two vectors, so we say that the dimension of \(\mathbb{R}^{2}\) is 2.

Is there a basis for \(\mathbb{R}^{2}\) with three vectors???

Lemma (HW2, Problem 2) Assume \(\mathcal{V}\) is a vector space over a field \(\mathbb{F}\), \(\mathcal{N}\) is a finite nonempty set, and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a sequence of vectors in \(\mathcal{V}\). Fix \(n_{0}\in\mathcal{N}\) and define the set
\[\mathcal{N}_{0} := \mathcal{N}\setminus\{n_{0}\} = \{n\in\mathcal{N} : n\neq n_{0}\}.\] If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}_{0}}\) is linearly independent and \(\mathbf{v}_{n_{0}}\in\mathcal{V}\setminus\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}_{0}}\), then \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is linearly independent.

Lemma (HW2, Problem 3) Assume \(\mathcal{V}\) is a vector space over a field \(\mathbb{F}\), \(\mathcal{N}\) is a finite nonempty set, and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a sequence of vectors in \(\mathcal{V}\). If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a dependent set, then there exists some \(n_{0}\in\mathcal{N}\) such that
\[\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\setminus\{n_{0}\}} = \operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}.\]

Corollary.  With the same assumptions as the above lemma, if \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}_{0}}\) is linearly independent and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is linearly dependent, then \[\mathbf{v}_{n_{0}}\in\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}_{0}}.\]

Lemma 1. Let \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) and \(\{\mathbf{v}_{m}\}_{m\in\mathcal{M}}\) be finite sequences in \(\mathcal{V}\) with \(\mathcal{M}\cap\mathcal{N}=\varnothing\). If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is linearly independent, and \(\{\mathbf{v}_{m}\}_{m\in\mathcal{M}}\) spans \(\mathcal{V}\), then there exists a subset \(\mathcal{M}_{0}\subset \mathcal{M}\) such that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\) is a basis for \(\mathcal{V}.\)

Proof.  Consider the set
\[\mathcal{I}:=\{\#(\mathcal{L}) : \mathcal{L}\subset\mathcal{M} \text{ and } \{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{L}}\text{ is independent}\}.\] Since \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is independent, we see that \(0\in\mathcal{I}\), and hence \(\mathcal{I}\) is nonempty. We also see that \(\mathcal{I}\) is bounded above by \(\#(\mathcal{M})\). Let \(M\) be the largest integer in \(\mathcal{I}\). Let \(\mathcal{M}_{0}\subset\mathcal{M}\) be such that \(\#(\mathcal{M}_{0})=M\) and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\) is independent.

We will show that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\) spans \(\mathcal{V}\), but we will begin by showing that \(\mathbf{v}_{m}\in\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\) for each \(m\in\mathcal{M}\).

Proof continued. Let \(m_{0}\in\mathcal{M}\) be arbitrary. If \(m_{0}\in\mathcal{M}_{0}\) then clearly \(\mathbf{v}_{m_{0}}\in\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}.\) Suppose \(m_{0}\notin\mathcal{M}_{0}\), and thus \(\#(\mathcal{M}_{0}\cup\{m_{0}\})=M+1\). The maximality of \(M\) implies that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}\cup\{m_{0}\}}\) is dependent. By HW 2 Problem 2, we conclude that \(\mathbf{v}_{m_{0}}\in\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\).

Therefore, \(\mathbf{v}_{m_{0}}\in\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\) for each \(m_{0}\in\mathcal{M}\). From this we deduce that
\[\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\supset\operatorname{span}\{\mathbf{v}_{m}\}_{m\in\mathcal{M}} = \mathcal{V},\] and therefore \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\) is a basis for \(\mathcal{V}\). \(\Box\)

Corollary.  If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) spans \(\mathcal{V}\), then there exists \(\mathcal{N}_{0}\subset \mathcal{N}\) such that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}_{0}}\) is a basis for \(\mathcal{V}\).

Every (finite) spanning sequence contains a basis.

Lemma 2.  Let \(\mathcal{V}\) be a finite-dimensional vector space over a field \(\mathbb{F}\). If \(\{\mathbf{v}_{n}\}_{n=1}^{N}\) is a basis for \(\mathcal{V}\) and \(\mathbf{y},\mathbf{z}\in\mathcal{V}\), then the sequence \(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{N-1},\mathbf{y},\mathbf{z}\) is linearly dependent.

Proof. Since \(\{\mathbf{v}_{n}\}_{n=1}^{N}\) is a basis, there are sequences of scalars \(\{\alpha_{n}\}_{n=1}^{N}\) and \(\{\beta_{n}\}_{n=1}^{N}\) such that \[\mathbf{y} = \sum_{n=1}^{N}\alpha_{n}\mathbf{v}_{n} \quad\text{and}\quad\mathbf{z} = \sum_{n=1}^{N}\beta_{n}\mathbf{v}_{n}.\] If either \(\alpha_{N}=0\) or \(\beta_{N} = 0\), then we see that either \(\mathbf{y}\) or \(\mathbf{z}\) is in \(\operatorname{span}\{\mathbf{v}_{n}\}_{n=1}^{N-1},\) and thus the desired sequence is dependent. Thus, we may assume \(\alpha_{N}\neq 0\) and \(\beta_{N}\neq 0\). Observe that

\[\alpha_{N}\mathbf{z} - \beta_{N}\mathbf{y} = \sum_{n=1}^{N-1}(\alpha_{N}\beta_{n} - \beta_{N}\alpha_{n})\mathbf{v}_{n} \in\operatorname{span}\{\mathbf{v}_{n}\}_{n=1}^{N-1},\] and thus \[ \sum_{n=1}^{N-1}(\alpha_{N}\beta_{n} - \beta_{N}\alpha_{n})\mathbf{v}_{n} - \alpha_{N}\mathbf{z} + \beta_{N}\mathbf{y} =\mathbf{0}.\] Since \(\alpha_{N}\) and \(\beta_{N}\) are nonzero, this shows that the sequence \(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{N-1},\mathbf{y},\mathbf{z}\) is linearly dependent. \(\Box\)

Theorem (the basis theorem). If \(\{\mathbf{v}_{n}\}_{n=1}^{N}\) and \(\{\mathbf{w}_{m}\}_{m=1}^{M}\) are both bases for \(\mathcal{V}\), then \(N=M\).

Proof. Suppose that \(N>M\). The sequence \(\{\mathbf{v}_{n}\}_{n=1}^{N-1}\) is linearly independent, and \(\{\mathbf{w}_{m}\}_{m=1}^{M}\) spans \(\mathcal{V}\). Combining Lemmas 1 and 2 we see that there is a single element \(m_{N}\in[M]\) such that the sequence

\[\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{N-1},\mathbf{w}_{m_{N}}\] is a basis for \(\mathcal{V}\).

Next, we see that \(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{N-2},\mathbf{w}_{m_{N}}\) is independent, hence we can find \(m_{N-1}\in[M]\) such that \[\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{N-2},\mathbf{w}_{m_{N-1}},\mathbf{w}_{m_{N}}\] is a basis for \(\mathcal{V}\). Carry out this procedure \(M\) times, and we obtain a basis of the form \[\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{N-M},\mathbf{w}_{m_{N-M+1}},\mathbf{w}_{m_{N-M+2}},\ldots,\mathbf{w}_{m_{N}}.\]

Clearly none of the \(M\) terms in the sequence \(\mathbf{w}_{m_{N-M+1}},\mathbf{w}_{m_{N-M+2}},\ldots,\mathbf{w}_{m_{N}}\) is repeated, thus this is just a reordered version of the basis \(\{\mathbf{w}_{m}\}_{m=1}^{M}\). Since \(N>M\), the basis above also contains \(\mathbf{v}_{1}\), and \(\mathbf{v}_{1}\in\mathcal{V}=\operatorname{span}\{\mathbf{w}_{m}\}_{m=1}^{M}\), so the sequence \(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{N-M},\mathbf{w}_{m_{N-M+1}},\mathbf{w}_{m_{N-M+2}},\ldots,\mathbf{w}_{m_{N}}\) is dependent, and hence not a basis. This contradiction shows that our initial assumption that \(N>M\) is false. \(\Box\)

Theorem. If \(\{\mathbf{v}_{n}\}_{n=1}^{N}\) and \(\{\mathbf{w}_{m}\}_{m=1}^{M}\) are both bases for \(\mathcal{V}\), then \(N=M\).

Proof.  For each \(n\in [N]\) there are scalars \(a_{n1},a_{n2},\ldots,a_{nM}\) such that \[\mathbf{v}_{n} = a_{n1}\mathbf{w}_{1} + a_{n2}\mathbf{w}_{2} + \cdots + a_{nM}\mathbf{w}_{M}.\]

Similarly, for each \(m\in[M]\) there are scalars \(b_{m1},b_{m2},\ldots,b_{mN}\) such that \[\mathbf{w}_{m} =b_{m1}\mathbf{v}_{1} + b_{m2}\mathbf{v}_{2} +\cdots +b_{mN}\mathbf{v}_{N}.\]

Combining these, for \(n_{0}\in[N]\) we have

\[\mathbf{v}_{n_{0}} = \sum_{m=1}^{M}a_{n_{0}m}\mathbf{w}_{m} = \sum_{m=1}^{M}a_{n_{0}m}\left(\sum_{n=1}^{N}b_{mn}\mathbf{v}_{n}\right) = \sum_{n=1}^{N}\left(\sum_{m=1}^{M}a_{n_{0}m}b_{mn}\right)\mathbf{v}_{n}.\]

Since \(\{\mathbf{v}_{n}\}_{n=1}^{N}\) is a basis, the representation of \(\mathbf{v}_{n_{0}}\) as a linear combination is unique, and thus \[\sum_{m=1}^{M}a_{n_{0}m}b_{mn} = \begin{cases} 1 & n=n_{0},\\ 0 & n\neq n_{0}.\end{cases}\]

In more detail, the computation above expands as follows:

\[\mathbf{v}_{n_{0}} = a_{n_{0}1}\mathbf{w}_{1} +  a_{n_{0}2}\mathbf{w}_{2} + \cdots + a_{n_{0}M}\mathbf{w}_{M}\]

\[ = a_{n_{0}1}\left(b_{11}\mathbf{v}_{1} + b_{12}\mathbf{v}_{2} + \cdots + b_{1N}\mathbf{v}_{N}\right) + a_{n_{0}2}\left(b_{21}\mathbf{v}_{1} + b_{22}\mathbf{v}_{2} + \cdots + b_{2N}\mathbf{v}_{N}\right) \]

\[+ \cdots + a_{n_{0}M}\left(b_{M 1}\mathbf{v}_{1} + b_{M 2}\mathbf{v}_{2} + \cdots + b_{M N}\mathbf{v}_{N}\right)\]

\[= \left(a_{n_{0}1}b_{11} + a_{n_{0}2}b_{21} + \cdots a_{n_{0}M}b_{M1}\right) \mathbf{v}_{1} + \left(a_{n_{0}1}b_{12} + a_{n_{0}2}b_{22} + \cdots a_{n_{0}M}b_{M2}\right) \mathbf{v}_{2}\]

\[+ \cdots + \left(a_{n_{0}1}b_{1N}+a_{n_{0}2}b_{2N} + \cdots a_{n_{0}M}b_{MN}\right) \mathbf{v}_{N} \]

\[=\sum_{n=1}^{N}\left(\sum_{m=1}^{M}a_{n_{0}m}b_{mn}\right)\mathbf{v}_{n}\]

If we decompose \(\mathbf{w}_{m_{0}}\) similarly, then we find \[\sum_{n=1}^{N}b_{m_{0}n} a_{nm}= \begin{cases} 1 & m=m_{0},\\ 0 & m\neq m_{0}.\end{cases}\]

Hence, \[M =\sum_{m=1}^{M}1 = \sum_{m=1}^{M}\left(\sum_{n=1}^{N}b_{mn}a_{nm}\right) = \sum_{m=1}^{M}\sum_{n=1}^{N}a_{nm}b_{mn} = \sum_{n=1}^{N}\sum_{m=1}^{M}a_{nm}b_{mn} \]

\[= \sum_{n=1}^{N}\left(\sum_{m=1}^{M}a_{nm}b_{mn}\right) = \sum_{n=1}^{N}1 = N.\ \Box\]

Note: This alternate proof works over fields like \(\mathbb{R}\), \(\mathbb{C}\), and \(\mathbb{Q}\), but not necessarily over fields like \(\mathbb{Z}_{2}\). Can you figure out why?

Theorem. [Corollary 1 on page 44 of the text]. Let \(\mathcal{V}\) be a finite-dimensional vector space over a field \(\mathbb{F}\). If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) and \(\{\mathbf{w}_{m}\}_{m\in\mathcal{M}}\) are both bases for \(\mathcal{V}\), then \(\#(\mathcal{M})= \#(\mathcal{N})\).

Proof. Write out the sets \(\mathcal{N} = \{n_{1},n_{2},\ldots, n_{N}\}\) and \(\mathcal{M} = \{m_{1},\ldots,m_{M}\}\) where \(N=\#(\mathcal{N})\) and \(M = \#(\mathcal{M})\), then apply the previous proof to the sequences \(\{\mathbf{v}_{n_{k}}\}_{k=1}^{N}\) and \(\{\mathbf{w}_{m_{k}}\}_{k=1}^{M}\). \(\Box\)

Corollary.  Let \(\mathcal{V}\) be a finite-dimensional vector space over a field \(\mathbb{F}\). If \(\{\mathbf{w}_{m}\}_{m\in\mathcal{M}}\) is linearly independent in \(\mathcal{V}\) and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) spans \(\mathcal{V}\), then \(\#(\mathcal{M})\leq\#(\mathcal{N})\).

Theorem. [Corollary 2 on page 46 of the text]. Let \(\mathcal{V}\) be a finite-dimensional vector space over a field \(\mathbb{F}\), and let \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) be a finite sequence in \(\mathcal{V}\). If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is independent, then there exists a sequence \(\{\mathbf{v}_{m}\}_{m\in\mathcal{M}_{0}}\) with \(\mathcal{N}\cap\mathcal{M}_{0}=\varnothing\) such that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\) is a basis for \(\mathcal{V}\). In particular, any independent sequence contains at most \(\operatorname{dim}\mathcal{V}\) vectors.

 

 

 

Proof. Since \(\mathcal{V}\) is finite-dimensional, there is a finite sequence \(\{\mathbf{v}_{m}\}_{m\in\mathcal{M}}\) which is a basis for \(\mathcal{V}\), in particular, \(\{\mathbf{v}_{m}\}_{m\in\mathcal{M}}\) spans \(\mathcal{V}\). Clearly, we may assume \(\mathcal{N}\cap\mathcal{M} = \varnothing.\) By Lemma 1, we see that there is a set \(\mathcal{M}_{0}\subset\mathcal{M}\) such that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}\cup\mathcal{M}_{0}}\) is a basis. \(\Box\)

Every independent set can be completed to a basis.

This shows that all bases for finite-dimensional vector spaces have the same number of vectors. Hence, the following concept is well-defined:

Definition. If \(\mathcal{V}\) is a finite-dimensional vector space, then the dimension of \(\mathcal{V}\), denoted \(\operatorname{dim}\mathcal{V}\), is the number of vectors in any basis for \(\mathcal{V}.\)

Definition. Suppose \(\mathcal{N}\) is a finite nonempty set and \(\mathbb{F}\) is a field. For each \(n\in\mathcal{N}\) define the vector \(\boldsymbol{\delta}_{n}\in\mathbb{F}^{\mathcal{N}}\) by

\[\boldsymbol{\delta}_{n}(m) = \begin{cases} 1 & m=n,\\ 0 & m\neq n.\end{cases}\]

The sequence \(\{\boldsymbol{\delta}_{n}\}_{n\in\mathcal{N}}\) is called the standard basis for \(\mathbb{F}^{\mathcal{N}}\).

Note: The textbook uses the notation \(\epsilon_{n}\) instead of \(\boldsymbol{\delta}_{n}\). We will use \(\boldsymbol{\delta}_{n}\).

In \(\mathbb{R}^{N}\) (or, more generally, \(\mathbb{F}^{N}\)), we have

\[\boldsymbol{\delta}_{1} = \begin{bmatrix}1\\0\\0\\0\\\vdots\\0\end{bmatrix},\ \boldsymbol{\delta}_{2} = \begin{bmatrix}0\\1\\0\\0\\\vdots\\0\end{bmatrix},\ \boldsymbol{\delta}_{3} = \begin{bmatrix}0\\0\\1\\0\\\vdots\\0\end{bmatrix}, \ldots,\ \boldsymbol{\delta}_{N} = \begin{bmatrix}0\\0\\0\\0\\\vdots\\1\end{bmatrix}\]
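
A one-line Python sketch of \(\boldsymbol{\delta}_{n}\) over an arbitrary finite index set (my own helper, with \(\mathbb{F}\) taken to be floats, matching the dict representation sketched earlier):

```python
def delta(n, N):
    """Standard basis vector: delta_n(m) = 1 if m == n, else 0."""
    return {m: 1.0 if m == n else 0.0 for m in N}

print(delta(2, {1, 2, 3}))  # {1: 0.0, 2: 1.0, 3: 0.0} (key order may vary)
```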

Proposition. If \(\mathcal{N}\) is a finite nonempty set, and \(\mathbb{F}\) is a field, then the standard basis \(\{\boldsymbol{\delta}_{n}\}_{n\in\mathcal{N}}\) is a basis for \(\mathbb{F}^{\mathcal{N}}\), and hence \[\operatorname{dim}\mathbb{F}^{\mathcal{N}} = \#(\mathcal{N}).\] In particular, for \(N\in\mathbb{N}\), \(\operatorname{dim}\mathbb{F}^{N}=N\).

Proof. Note that for \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) we have \[\mathbf{x} = \sum_{n\in\mathcal{N}}\mathbf{x}(n)\boldsymbol{\delta}_{n}.\]

This shows that \(\{\boldsymbol{\delta}_{n}\}_{n\in\mathcal{N}}\)  spans \(\mathbb{F}^{\mathcal{N}}\).

Suppose there is some \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) such that

\[\boldsymbol{0} = \sum_{n\in\mathcal{N}}\mathbf{x}(n)\boldsymbol{\delta}_{n},\]

that is, the function \(\mathbf{z}:=\sum_{n\in\mathcal{N}}\mathbf{x}(n)\boldsymbol{\delta}_{n}\) is the zero function. Then, for each \(m\in\mathcal{N}\) we have

\[0=\mathbf{z}(m) =\sum_{n\in\mathcal{N}}\mathbf{x}(n)\boldsymbol{\delta}_{n}(m) = \mathbf{x}(m).\]

\[\Box\]

Theorem. [Corollary 1 on page 46 of the text, also see Theorem 5.9(c) in the notes]. Let \(\mathcal{V}\) be a finite-dimensional vector space over a field \(\mathbb{F}\). If \(\mathcal{W}\) is a proper subspace of \(\mathcal{V}\), then \(\mathcal{W}\) is finite-dimensional, and \(\operatorname{dim}\mathcal{W}<\operatorname{dim}\mathcal{V}\).

 

Proof. (On the board and in the text).

Dimension of a subspace

A subspace \(\mathcal{W}\) of a vector space \(\mathcal{V}\) is a vector space in its own right, so it makes sense to talk about a basis for \(\mathcal{W}\) and the dimension of \(\mathcal{W}\).

A vector space \(\mathcal{V}\) always has at least two subspaces, namely, \(\mathcal{V}\) itself and the zero subspace \(\{\mathbf{0}\}\).  (Why isn't \(\varnothing\) a subspace?) A subspace \(\mathcal{W}\subset\mathcal{V}\) which is not equal to \(\mathcal{V}\) is called a proper subspace.

Corollary. Let \(\mathcal{V}\) be a finite-dimensional vector space over a field \(\mathbb{F}\), and let \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) be a finite sequence in \(\mathcal{V}\).

  1. If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) spans \(\mathcal{V}\), then \(\#(\mathcal{N})\geq\operatorname{dim}\mathcal{V}\).
  2. If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is linearly independent, then \(\#(\mathcal{N})\leq\operatorname{dim}\mathcal{V}\).

If equality holds in either case, then \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a basis.

 

Proof. (Exercise)

Part 4

Linear operators

(See Section 2 in the notes)

Definition 2.1. Let \(\mathcal{U}\) and \(\mathcal{V}\) be vector spaces over a common field \(\mathbb{F}\). A function \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) is linear if for every \(c_{1},c_{2}\in\mathbb{F}\) and \(\mathbf{v}_{1},\mathbf{v}_{2}\in\mathcal{V}\) we have

\[\mathbf{L}(c_{1}\mathbf{v}_{1} + c_{2}\mathbf{v}_{2}) = c_{1}\mathbf{L}(\mathbf{v}_{1}) + c_{2}\mathbf{L}(\mathbf{v}_{2}).\]

Example. Let \(\mathbf{V}:\mathbb{R}^{3}\to\mathbb{R}^{2}\) be given by

\[\mathbf{V}\left(\left[\begin{matrix} a\\ b\\c\end{matrix}\right]\right) =a\begin{bmatrix} 1\\ 0\end{bmatrix} + b\begin{bmatrix} 0\\ 1\end{bmatrix} + c\begin{bmatrix} 0\\ 2\end{bmatrix}\]

That is, \(\mathbf{V}\) is the synthesis operator of the sequence \(\begin{bmatrix} 1\\ 0\end{bmatrix},\begin{bmatrix} 0\\ 1\end{bmatrix},\begin{bmatrix} 0\\ 2\end{bmatrix}\).

For any \(a,b,c,x,y,z,c_{1},c_{2}\in\mathbb{R}\) we have

\(\mathbf{V}\left(c_{1}\begin{bmatrix} a\\b\\c\end{bmatrix} + c_{2}\begin{bmatrix} x\\y\\z\end{bmatrix}\right) = \mathbf{V}\left(\begin{bmatrix} c_{1}a+c_{2}x\\c_{1}b+c_{2}y\\c_{1}c+c_{2}z\end{bmatrix}\right)\)

\( =(c_{1}a+c_{2}x)\begin{bmatrix}1\\0\end{bmatrix}+(c_{1}b+c_{2}y)\begin{bmatrix}0\\1\end{bmatrix}+(c_{1}c+c_{2}z)\begin{bmatrix}0\\2\end{bmatrix}\)

\( =c_{1}a\begin{bmatrix}1\\0\end{bmatrix}+c_{1}b\begin{bmatrix}0\\1\end{bmatrix}+c_{1}c\begin{bmatrix}0\\2\end{bmatrix} + c_{2}x\begin{bmatrix}1\\0\end{bmatrix}+c_{2}y\begin{bmatrix}0\\1\end{bmatrix}+c_{2}z\begin{bmatrix}0\\2\end{bmatrix}\)

\( =c_{1}\left(a\begin{bmatrix}1\\0\end{bmatrix}+b\begin{bmatrix}0\\1\end{bmatrix}+c\begin{bmatrix}0\\2\end{bmatrix}\right) + c_{2}\left(x\begin{bmatrix}1\\0\end{bmatrix}+y\begin{bmatrix}0\\1\end{bmatrix}+z\begin{bmatrix}0\\2\end{bmatrix}\right)\)

\( =c_{1}\mathbf{V}\left(\begin{bmatrix} a\\b\\c\end{bmatrix}\right) + c_{2}\mathbf{V}\left(\begin{bmatrix} x\\y\\z\end{bmatrix}\right).\)

This shows that \(\mathbf{V}\) is linear.

Proposition. Suppose \(\mathcal{V}\) is a vector space over a field \(\mathbb{F}\) and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a finite sequence in \(\mathcal{V}\). Then, the synthesis operator \(\mathbf{V}:\mathbb{F}^{\mathcal{N}}\to\mathcal{V}\) given by \[\mathbf{V}(\mathbf{x}) = \sum_{n\in\mathcal{N}}\mathbf{x}(n)\mathbf{v}_{n}\]

is linear.

 

Proof. (Exercise)

In general, synthesis operators are linear. You should attempt to write out a proof.

Theorem 2.2. Let \(\mathcal{V}\) and \(\mathcal{U}\) be vector spaces over the same field \(\mathbb{F}\), and let \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) be a function. Then:

(a) \(\mathbf{L}\) is linear if and only if both

(i) \(\mathbf{L}(\mathbf{v}_{1}+\mathbf{v}_{2}) = \mathbf{L}(\mathbf{v}_{1})+\mathbf{L}(\mathbf{v}_{2})\) for all \(\mathbf{v}_{1},\mathbf{v}_{2}\in\mathcal{V}\), and

(ii) \(\mathbf{L}(c\mathbf{v}) = c\mathbf{L}(\mathbf{v})\) for all \(\mathbf{v}\in\mathcal{V}\) and \(c\in\mathbb{F}\).

In particular, if \(\mathbf{L}\) is linear, then it distributes over linear combinations, that is, for any finite sequences \(\{c_{n}\}_{n\in\mathcal{N}}\) and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) of scalars in \(\mathbb{F}\) and vectors in \(\mathcal{V}\), respectively, we have

\[\mathbf{L}\left(\sum_{n\in\mathcal{N}}c_{n}\mathbf{v}_{n}\right) = \sum_{n\in\mathcal{N}}c_{n}\mathbf{L}(\mathbf{v}_{n}).\]

(b) Suppose \(\mathbf{L}\) is linear, and \(\mathbf{u}\in\mathcal{U}\). If there exists \(\mathbf{v}_{1}\in\mathcal{V}\) such that \(\mathbf{L}(\mathbf{v}_{1}) = \mathbf{u}\), then \[\{\mathbf{v}\in\mathcal{V} : \mathbf{L}(\mathbf{v})=\mathbf{u}\} = \{\mathbf{v}_{1}+\mathbf{v}_{2} : \mathbf{L}(\mathbf{v}_{2}) = \mathbf{0}\}.\]

Proof. (On the board and in the notes)

Definition. Let \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) be a linear operator. The kernel of \(\mathbf{L}\) is the set

\[\operatorname{ker}\mathbf{L}:=\{\mathbf{v}\in\mathcal{V} : \mathbf{L}(\mathbf{v}) = \mathbf{0}\}.\]

This is also called the nullspace of \(\mathbf{L}\), and is sometimes denoted \(\operatorname{null}\mathbf{L}\). The image of \(\mathbf{L}\) is the set

\[\operatorname{im}\mathbf{L}:=\{\mathbf{L}(\mathbf{v}) : \mathbf{v}\in\mathcal{V}\}.\]

This is also called the range of \(\mathbf{L}\), and is sometimes denoted \(\mathbf{L}(\mathcal{V})\).

Theorem 2.4. Let \(\mathcal{V}\) and \(\mathcal{U}\) be vector spaces over the same field \(\mathbb{F}\), and let \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) be a linear operator.

 

(a) For any \(\mathbf{u}\in\mathcal{U}\), the equation \(\mathbf{L}(\mathbf{v})=\mathbf{u}\) has a solution if and only if \(\mathbf{u}\in\mathbf{L}(\mathcal{V})\). Moreover, when this occurs, taking any \(\mathbf{v}_{0}\) such that \(\mathbf{L}(\mathbf{v}_{0})=\mathbf{u}\), the set of all solutions is

\[\mathbf{v}_{0}+\operatorname{ker}(\mathbf{L}) = \{\mathbf{v}_{0}+\mathbf{v} : \mathbf{v}\in\operatorname{ker}(\mathbf{L})\}.\]

(b) \(\mathbf{L}(\mathbf{0}) = \mathbf{0}.\)

(c) \(\operatorname{im}\mathbf{L}\) is a subspace of \(\mathcal{U}\).

(d) \(\operatorname{ker}\mathbf{L}\) is a subspace of \(\mathcal{V}.\)


Example. Let \(\mathcal{V}\) be a vector space. Two important linear operators:

  1. The identity operator is the linear map \(\mathbf{I}:\mathcal{V}\to\mathcal{V}\) where \(\mathbf{I}(\mathbf{v}) = \mathbf{v}\) for all \(\mathbf{v}\in\mathcal{V}\).
  2. The zero operator is the linear map \(\mathbf{0}:\mathcal{V}\to\mathcal{V}\) where \(\mathbf{0}(\mathbf{v}) = \mathbf{0}\) for all \(\mathbf{v}\in\mathcal{V}\).

Part 5

Matrices

(See Section 3 in the notes)

Definition 3.2.  Let \(\mathbb{F}\) be a field, and let \(\mathcal{M}\) and \(\mathcal{N}\) be finite nonempty sets.

(a) An \(\mathcal{M}\times\mathcal{N}\) matrix over \(\mathbb{F}\) is a function from \(\mathcal{M}\times\mathcal{N}\) into \(\mathbb{F}\), namely a member of the set

\[\mathbb{F}^{\mathcal{M}\times\mathcal{N}} = \{\mathbf{A}:\mathcal{M}\times\mathcal{N}\to\mathbb{F}\}.\]

In the special case where \(\mathcal{M}=[M]=\{1,2,\ldots,M\}\) and \(\mathcal{N}=[N]=\{1,2,\ldots,N\}\), we denote \(\mathbb{F}^{\mathcal{M}\times\mathcal{N}} = \mathbb{F}^{[M]\times[N]}\) by \(\mathbb{F}^{M\times N}\) and we refer to its elements as \(M\times N\) matrices.

(b) Given \(\mathbf{A}\in\mathbb{F}^{\mathcal{M}\times\mathcal{N}}\) and \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) we define their product vector \(\mathbf{A}\mathbf{x}\in\mathbb{F}^{\mathcal{M}}\) by

\[(\mathbf{Ax})(m): = \sum_{n\in\mathcal{N}}\mathbf{A}(m,n)\mathbf{x}(n).\] 
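
Definition 3.2 makes sense for arbitrary finite index sets, not just \([M]\) and \([N]\). Here is a small Python sketch of the product (my own dict-based representation, not from the notes); it reproduces the \(2\times 3\) example that follows.

```python
M, N = [1, 2], [1, 2, 3]

A = {(m, n): float(m + n) for m in M for n in N}  # the matrix A(m, n) = m + n
x = {1: 1.0, 2: 0.0, 3: -2.0}

def matvec(A, x):
    """(Ax)(m) = sum over n of A(m, n) * x(n)."""
    return {m: sum(A[(m, n)] * x[n] for n in N) for m in M}

print(matvec(A, x))  # {1: -6.0, 2: -7.0}
```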

Example. Let \(\mathbf{A}\in\mathbb{R}^{2\times 3}\) be given by \(\mathbf{A}(m,n)=m+n\). For instance, \(\mathbf{A}(2,2)=4\). We can display this matrix as follows:

\[\mathbf{A}=\begin{bmatrix} 2 & 3 & 4\\ 3 & 4 & 5\end{bmatrix}.\]

If \[\mathbf{x} = \begin{bmatrix} 1\\ 0\\ -2\end{bmatrix}\in\mathbb{R}^{3},\] then the matrix-vector product \(\mathbf{Ax}\) is the vector

\[\mathbf{Ax} = \begin{bmatrix} 2 & 3 & 4\\ 3 & 4 & 5\end{bmatrix}\begin{bmatrix} 1\\ 0\\ -2\end{bmatrix} =\begin{bmatrix}-6\\-7\end{bmatrix} .\]

We usually denote an \(\mathcal{M}\times\mathcal{N}\) matrix \(\mathbf{A}\) as a table with rows indexed by \(\mathcal{M}\), columns indexed by \(\mathcal{N}\), and \(\mathbf{A}(m,n)\) in row \(m\), column \(n\).

Example. Let \(G=\mathbb{Z}_{2}\times\mathbb{Z}_{2}\), that is \(G=\{(0,0),(0,1),(1,0),(1,1)\}\) with entrywise addition mod 2.

If \(\mathbf{A}\) is a \(G\times G\) matrix over \(\mathbb{R}\), that is \(\mathbf{A}\in\mathbb{R}^{G\times G}\), then \(\mathbf{A}\) can be described by a table, for example

\[\begin{array}{c|cccc} &(0,0)&(0,1)&(1,0)&(1,1)\\\hline(0,0)&2&3&0&-1\\(0,1)&0&-2&0&\frac{1}{2}\\(1,0)&-2&1&2&4\\(1,1)&0&0&-1&1\end{array}\]

In particular, the entries in the matrix \(\mathbf{A}\) are the numbers \(\mathbf{A}(g,h)\) where \(g,h\in G\). So, we can refer to the \(((0,0),(0,1))\) entry, which is \(3\), but there is no \((1,2)\) entry.

If we take \(\mathbf{x}\in\mathbb{R}^{G}\) where \(\mathbf{x}(0,0)=1,\ \mathbf{x}(0,1)=-1,\ \mathbf{x}(1,0)=0,\) and \(\mathbf{x}(1,1)=2\), then we can compute the matrix-vector product \(\mathbf{A}\mathbf{x}\), and we see that

\((\mathbf{Ax})(0,0)=-3,\ (\mathbf{Ax})(0,1)=3,\ (\mathbf{Ax})(1,0)=5,\ (\mathbf{Ax})(1,1)=2.\)

Theorem 3.3.  Let \(\mathbb{F}\) be a field, and let \(\mathcal{M}\) and \(\mathcal{N}\) be finite nonempty sets.

(a) \(\mathbf{x} = \sum_{n\in\mathcal{N}}\mathbf{x}(n)\boldsymbol{\delta}_{n}\) for any \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\).

(b) If \(\mathbf{L}:\mathbb{F}^{\mathcal{N}}\to\mathbb{F}^{\mathcal{M}}\) is linear, then there exists \(\mathbf{A}\in\mathbb{F}^{\mathcal{M}\times\mathcal{N}}\) such that \(\mathbf{L}(\mathbf{x}) = \mathbf{Ax}\) for all \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\). Specifically, \(\mathbf{A}(m,n)=[\mathbf{L}(\boldsymbol{\delta}_{n})](m)\) for each \(m\in\mathcal{M}\) and \(n\in\mathcal{N}.\)

(c) Conversely, if \(\mathbf{A}\in\mathbb{F}^{\mathcal{M}\times\mathcal{N}}\), then the function \(\mathbf{L}:\mathbb{F}^{\mathcal{N}}\to\mathbb{F}^{\mathcal{M}}\) given by \(\mathbf{L}(\mathbf{x}) = \mathbf{Ax}\) is linear. In this case we necessarily have \([\mathbf{L}(\boldsymbol{\delta}_{n})](m) = \mathbf{A}(m,n)\) for all \(m\in\mathcal{M}\) and \(n\in\mathcal{N}.\)

Proof. (On the board and in the notes)

Note: Given a linear operator \(\mathbf{L}:\mathbb{F}^{\mathcal{N}}\to\mathbb{F}^{\mathcal{M}}\), we say that the matrix \(\mathbf{A}\) given by Theorem 3.3 (b) is the matrix representation of \(\mathbf{L}\).

We will often use the same symbol to denote the function and its matrix representation.

Example.  Consider the identity operator \(\mathbf{I}:\mathbb{R}^{\mathcal{N}}\to\mathbb{R}^{\mathcal{N}}\). The matrix representation given by Theorem 3.3 (b) is called the identity matrix. For example if \(\mathcal{N} = [3]\), then

\[\mathbf{I} = \begin{bmatrix}\boldsymbol{\delta}_{1} & \boldsymbol{\delta}_{2} & \boldsymbol{\delta}_{3}\end{bmatrix} = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}\]

Example. Let \(\mathbf{V}:\mathbb{R}^{3}\to\mathbb{R}^{2}\) be given by

\[\mathbf{V}\left(\left[\begin{matrix} a\\ b\\c\end{matrix}\right]\right) =a\begin{bmatrix} 1\\ 0\end{bmatrix} + b\begin{bmatrix} 0\\ 1\end{bmatrix} + c\begin{bmatrix} 0\\ 2\end{bmatrix}\]

That is, \(\mathbf{V}\) is the synthesis operator of the sequence \(\begin{bmatrix} 1\\ 0\end{bmatrix},\begin{bmatrix} 0\\ 1\end{bmatrix},\begin{bmatrix} 0\\ 2\end{bmatrix}\).

By Theorem 3.3 (b) there is a matrix \(\mathbf{A}\in\mathbb{R}^{2\times 3}\) such that \(\mathbf{V}(\mathbf{x}) = \mathbf{Ax}\) for all \(\mathbf{x}\in\mathbb{R}^{3}\). Using the formula for \(\mathbf{A}(m,n)\) given in Theorem 3.3 (b) we see that 

\[\mathbf{A} = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 2\end{bmatrix}\]

Example. Let \(\mathbf{L}:\mathbb{R}^{3}\to\mathbb{R}^{3}\) be given by

\[\mathbf{L}\left(\left[\begin{matrix} a\\ b\\c\end{matrix}\right]\right) =\begin{bmatrix} c\\ a\\ b\end{bmatrix} \]

To find the matrix representation we compute

\[\mathbf{L}(\boldsymbol{\delta}_{1}) = \begin{bmatrix} 0\\1\\0\end{bmatrix},\ \mathbf{L}(\boldsymbol{\delta}_{2}) = \begin{bmatrix} 0\\0\\1\end{bmatrix},\ \mathbf{L}(\boldsymbol{\delta}_{3}) = \begin{bmatrix} 1\\0\\0\end{bmatrix}.\]

From this we deduce that the matrix representation is

\[\begin{bmatrix} 0&0&1\\1&0&0\\ 0&1&0\end{bmatrix},\]

that is, for any \(\mathbf{x}\in\mathbb{R}^{3}\) we have

\[\mathbf{L}(\mathbf{x}) = \begin{bmatrix} 0&0&1\\1&0&0\\ 0&1&0\end{bmatrix}\begin{bmatrix}\mathbf{x}(1)\\\mathbf{x}(2)\\\mathbf{x}(3)\end{bmatrix}\]

Example. Define the matrix

\[\mathbf{B}:=\begin{bmatrix} 1 & 0\\ 0 & 1\\ 0 & 2\end{bmatrix}\in\mathbb{R}^{3\times 2}.\]

Consider the function \(\mathbf{F}:\mathbb{R}^{2}\to\mathbb{R}^{3}\) given by \(\mathbf{F}(\mathbf{x}) = \mathbf{Bx}\) for all \(\mathbf{x}\in\mathbb{R}^{2}\).

 

By Theorem 3.3 (c) the function \(\mathbf{F}\) is automatically linear! No need to check that \(\mathbf{F}(c_{1}\mathbf{x}_{1} + c_{2}\mathbf{x}_{2}) = c_{1}\mathbf{F}(\mathbf{x}_{1}) + c_{2}\mathbf{F}(\mathbf{x}_{2})\) for all \(c_{1},c_{2}\in\mathbb{R}\) and \(\mathbf{x}_{1},\mathbf{x}_{2}\in\mathbb{R}^{2}.\)

Example.  Consider the following system of three linear equations in three real unknowns:

\[\begin{array}{rrrrrrr}x_{1} & + & x_{2} & - & x_{3} & = & 3,\\ x_{1} & - & 4x_{2} &  &  & = & -2,\\ 2x_{1} & - & 3x_{2} & - & x_{3} & = & 1.\end{array}\]

If we set

\[\mathbf{A} = \begin{bmatrix} 1 & 1 & -1\\ 1 & -4 & 0\\ 2 & -3 & -1\end{bmatrix},\qquad \mathbf{x} = \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\end{bmatrix},\qquad\text{and}\qquad \mathbf{b} = \begin{bmatrix} 3\\-2\\1\end{bmatrix},\]

then this system of equations is equivalent to the matrix-vector equation \(\mathbf{Ax}=\mathbf{b}\).

Since the map \(\mathbf{x}\mapsto\mathbf{Ax}\) (we will denote this map as \(\mathbf{A}\)) is linear, we see from Theorem 2.4 (a) that the solution set is \[\mathbf{x}_{0}+\operatorname{ker}\mathbf{A} = \{\mathbf{x}_{0} +\mathbf{x} : \mathbf{x}\in\operatorname{ker}\mathbf{A}\},\]

where \(\mathbf{x}_{0}\) is some vector such that \(\mathbf{Ax}_{0} = \mathbf{b}\).

Example.  We wish to solve \(\mathbf{Ax}=\mathbf{b}\), where

\[\mathbf{A} = \begin{bmatrix} 1 & 1 & -1\\ 1 & -4 & 0\\ 2 & -3 & -1\end{bmatrix},\qquad \mathbf{x} = \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\end{bmatrix},\qquad\text{and}\qquad \mathbf{b} = \begin{bmatrix} 3\\-2\\1\end{bmatrix}.\]

In particular, note that \(\mathbf{x}_{0} = \begin{bmatrix} 2\\ 1\\ 0\end{bmatrix}\) is a solution, and \(\operatorname{ker}\mathbf{A} = \operatorname{span}\left\{\begin{bmatrix}4\\1\\5\end{bmatrix}\right\}\).

So, the solution set is

\[\left\{\begin{bmatrix} 2\\1\\0\end{bmatrix} + a\begin{bmatrix}4\\1\\5\end{bmatrix} : a\in\mathbb{R}\right\}\]

(See Example 3.8 in the notes for a review of how we find a particular solution and a basis for the kernel using Gaussian elimination, aka row reduction.)
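
A quick numerical check of this solution set (my own verification, assuming numpy):

```python
import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [1.0, -4.0, 0.0],
              [2.0, -3.0, -1.0]])
b = np.array([3.0, -2.0, 1.0])
x0 = np.array([2.0, 1.0, 0.0])  # particular solution
k = np.array([4.0, 1.0, 5.0])   # spans ker(A)

print(np.allclose(A @ x0, b))              # True
print(np.allclose(A @ (x0 + 2.5 * k), b))  # True: x0 + a*k solves Ax = b for any a
```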

Part 6

Operator Algebra

(See Section 4 in the notes)

Theorem 4.2. If \(\mathbf{K}:\mathcal{W}\rightarrow\mathcal{V}\) and \(\mathbf{L}:\mathcal{V}\rightarrow\mathcal{U}\) are linear, then their composition \(\mathbf{L}\mathbf{K}\) is as well.

   Moreover, in the special case where \(\mathcal{W}=\mathbb{F}^{\mathcal{P}}\), \(\mathcal{V}=\mathbb{F}^{\mathcal{N}}\) and \(\mathcal{U}=\mathbb{F}^{\mathcal{M}}\) for some field \(\mathbb{F}\) and finite nonempty sets \(\mathcal{M}\), \(\mathcal{N}\) and \(\mathcal{P}\),
taking \(\mathbf{A}\in\mathbb{F}^{\mathcal{M}\times\mathcal{N}}\) such that \(\mathbf{L}(\mathbf{x})=\mathbf{A}\mathbf{x}\) for all \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) and \(\mathbf{B}\in\mathbb{F}^{\mathcal{N}\times\mathcal{P}}\) such that \(\mathbf{K}(\mathbf{y})=\mathbf{B}\mathbf{y}\) for all \(\mathbf{y}\in\mathbb{F}^{\mathcal{P}}\) we have \((\mathbf{L}\mathbf{K})(\mathbf{y})=(\mathbf{A}\mathbf{B})\mathbf{y}\) for all \(\mathbf{y}\in\mathbb{F}^\mathcal{P}\) where \(\mathbf{A}\mathbf{B}\) is the product of \(\mathbf{A}\) and \(\mathbf{B}\), defined as

 

\[\mathbf{A}\mathbf{B}\in\mathbb{F}^{\mathcal{M}\times\mathcal{P}},\quad (\mathbf{A}\mathbf{B})(m,p):=\sum_{n\in\mathcal{N}}\mathbf{A}(m,n)\mathbf{B}(n,p),\quad\forall\, m\in\mathcal{M},\, p\in\mathcal{P}.\]

Proof. (On the board and in the notes)
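
The general product formula is short to implement. A dict-based Python sketch (my own, mirroring the matrix-vector sketch in Part 5):

```python
Mset, Nset, Pset = [1, 2], [1, 2, 3], [1, 2]

A = {(m, n): float(m + n) for m in Mset for n in Nset}
B = {(n, p): float(n * p) for n in Nset for p in Pset}

def matmul(A, B):
    """(AB)(m, p) = sum over n of A(m, n) * B(n, p)."""
    return {(m, p): sum(A[(m, n)] * B[(n, p)] for n in Nset)
            for m in Mset for p in Pset}

print(matmul(A, B))
# {(1, 1): 20.0, (1, 2): 40.0, (2, 1): 26.0, (2, 2): 52.0}
```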

Definition. Let \(\mathcal{U}\) and \(\mathcal{V}\) be vector spaces over the field \(\mathbb{F}\). Given linear operators \(\mathbf{L},\mathbf{M}:\mathcal{U}\to\mathcal{V}\) and a scalar \(c\in\mathbb{F}\) we define the functions

\(\mathbf{L}+\mathbf{M}:\mathcal{U}\to\mathcal{V}\) and \(c\mathbf{L}:\mathcal{U}\to\mathcal{V}\) by

\[(\mathbf{L}+\mathbf{M})(\mathbf{u}) = \mathbf{L}(\mathbf{u})+\mathbf{M}(\mathbf{u})\quad\text{for all } \mathbf{u}\in\mathcal{U}\]

\[(c\mathbf{L})(\mathbf{u}) =  c\big(\mathbf{L}(\mathbf{u})\big)\quad\text{for all } \mathbf{u}\in\mathcal{U}.\]

Let \(\operatorname{Hom}(\mathcal{U},\mathcal{V})\) denote the set of linear operators with domain \(\mathcal{U}\) and codomain \(\mathcal{V}\). Let \(\operatorname{Hom}(\mathcal{V}) = \operatorname{Hom}(\mathcal{V},\mathcal{V})\).

In Theorem 4.2 we showed that the composition of two linear operators is linear. Moreover, the matrix representation of the composition is the product of the matrix representations. Thus, we sometimes think of composition of linear operators as a "product".

 

We can also define a sum and scalar multiplication.

Theorem 4.5 (a). Let \(\mathcal{U}\) and \(\mathcal{V}\) be vector spaces over the field \(\mathbb{F}\). With the definitions of addition and scalar multiplication given above, the set \(\operatorname{Hom}(\mathcal{U,V})\) is a vector space over \(\mathbb{F}\).

  Suppose \(\mathcal{M}\) and \(\mathcal{N}\) are finite nonempty sets. If \(\{\mathbf{L}_{j}\}_{j\in\mathcal{J}}\) is a finite sequence in \(\operatorname{Hom}(\mathbb{F}^{\mathcal{N}},\mathbb{F}^{\mathcal{M}})\), \(\{c_{j}\}_{j\in\mathcal{J}}\) is a finite sequence in \(\mathbb{F}\), and \(\mathbf{A}_{j}\in\mathbb{F}^{\mathcal{M}\times\mathcal{N}}\) is the matrix such that \(\mathbf{L}_{j}(\mathbf{x}) = \mathbf{A}_{j}\mathbf{x}\) for each \(j\in\mathcal{J}\) and \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\), then

\[\mathbf{A}:=\sum_{j\in\mathcal{J}}c_{j}\mathbf{A}_{j}\] is the matrix representation of the linear map \[\mathbf{L}:=\sum_{j\in\mathcal{J}}c_{j}\mathbf{L}_{j},\]

that is, \(\mathbf{L}(\mathbf{x}) = \mathbf{A}\mathbf{x}\) for all \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\).

(You should read the other parts of Theorem 4.5 and their proofs.)

(b) Composition is associative

(c) The identity operator is the identity for composition

(d) Composition distributes over linear combinations.

Example. Let \(G=\mathbb{Z}_{2}\times\mathbb{Z}_{2}\), that is \(G=\{(0,0),(0,1),(1,0),(1,1)\}\) with entrywise addition mod 2.

Define the function \(\mathbf{T}:\mathbb{R}^{G}\to\mathbb{R}^{G}\) by

\[\big(\mathbf{T}(\mathbf{x})\big)(g) = \mathbf{x}(g+(0,1))\quad\text{for all }g\in G.\]

For example, if \(\mathbf{x}\in\mathbb{R}^{G}\) is given by \(\mathbf{x}(0,0)=0\), \(\mathbf{x}(0,1)=1\), \(\mathbf{x}(1,0)=-2,\) and \(\mathbf{x}(1,1)=4\), then

\[\big(\mathbf{T}(\mathbf{x})\big)(0,0) = \mathbf{x}\big((0,0)+(0,1)\big) = \mathbf{x}(0,1) = 1\]

\[\big(\mathbf{T}(\mathbf{x})\big)(0,1) = \mathbf{x}\big((0,1)+(0,1)\big) = \mathbf{x}(0,0) = 0\]

\[\big(\mathbf{T}(\mathbf{x})\big)(1,0) = \mathbf{x}\big((1,0)+(0,1)\big) = \mathbf{x}(1,1) = 4\]

\[\big(\mathbf{T}(\mathbf{x})\big)(1,1) = \mathbf{x}\big((1,1)+(0,1)\big) = \mathbf{x}(1,0) = -2.\]

It is fairly straightforward to show that \(\mathbf{T}\) is linear, that is, \(\mathbf{T}\in\operatorname{Hom}(\mathbb{R}^{G})\).

Example continued. If we choose an order on \(G\), namely \((0,0),(0,1),(1,0),(1,1)\), then we can represent \(\mathbf{x}\in\mathbb{R}^{G}\) as a column vector: \[\mathbf{x} = \begin{bmatrix} \mathbf{x}(0,0)\\ \mathbf{x}(0,1)\\ \mathbf{x}(1,0)\\ \mathbf{x}(1,1)\end{bmatrix}\]

Then we can see that \[\mathbf{T}\mathbf{x} = \begin{bmatrix} \mathbf{x}(0,1)\\ \mathbf{x}(0,0)\\ \mathbf{x}(1,1)\\ \mathbf{x}(1,0)\end{bmatrix}\]

Ordering the rows and columns of the matrix representation of \(\mathbf{T}\) in the same way, we have

\[\mathbf{T} = \begin{bmatrix} 0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{bmatrix}\]

Example continued. By Theorem 4.2, the composition of \(\mathbf{T}\) with itself, which we will denote by \(\mathbf{TT}\) or \(\mathbf{T}^2\), is also in \(\operatorname{Hom}(\mathbb{R}^{G})\), and its matrix representation is the matrix product of the matrix representation of \(\mathbf{T}\) with itself, that is,

\[\mathbf{T}^2 = \begin{bmatrix} 0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{bmatrix}^{2} = \begin{bmatrix} 1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix} = \mathbf{I}\]
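
This is quick to confirm numerically (my own check, assuming numpy):

```python
import numpy as np

T = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])

print(np.array_equal(T @ T, np.eye(4, dtype=int)))  # True: T is its own inverse
```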

Definition 4.6. A linear function \(\mathbf{L}:\mathcal{V}\rightarrow\mathcal{U}\) is invertible if there exists a function \(\mathbf{K}:\mathcal{U}\rightarrow\mathcal{V}\) such that
\[\mathbf{L}(\mathbf{K}(\mathbf{u}))=\mathbf{u},\quad\forall\,\mathbf{u}\in\mathcal{U},\qquad\mathbf{K}(\mathbf{L}(\mathbf{v})) =\mathbf{v},\quad\forall\,\mathbf{v}\in\mathcal{V}.\]
Below, we show that any such \(\mathbf{K}\) is necessarily linear, and so these conditions may be equivalently restated as having \(\mathbf{L}\mathbf{K}=\mathbf{I}\) and \(\mathbf{K}\mathbf{L}=\mathbf{I}\).
We also show that any such \(\mathbf{K}\) is unique, and will often denote it as \(\mathbf{L}^{-1}\).

Theorem 4.7.  Let \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) be linear.

(a) If \(\mathbf{L}\) is invertible, then \(\mathbf{L}^{-1}\) is unique, linear, and is itself invertible with \((\mathbf{L}^{-1})^{-1}=\mathbf{L}\).

(b) The map \(\mathbf{L}\) is invertible if and only if \(\mathbf{L}(\mathcal{V}) = \mathcal{U}\) and \(\operatorname{ker}(\mathbf{L}) = \{\mathbf{0}\}\).

(c) (1) The map \(\mathbf{L}\) is a surjection if and only if \(\mathbf{L}(\mathcal{V}) = \mathcal{U}\).

      (2) The map \(\mathbf{L}\) is an injection if and only if \(\operatorname{ker}(\mathbf{L})=\{\mathbf{0}\}\).

(d) If \(\mathbf{L}\) is invertible and so is \(\mathbf{K}:\mathcal{W}\to\mathcal{V}\), then \(\mathbf{LK}\) is invertible with \((\mathbf{LK})^{-1} = \mathbf{K}^{-1}\mathbf{L}^{-1}.\)

Definition. Given two vector spaces \(\mathcal{U}\) and \(\mathcal{V}\) over a common field \(\mathbb{F}\), we say that \(\mathcal{U}\) and \(\mathcal{V}\) are isomorphic, and we write \(\mathcal{U}\cong\mathcal{V},\) if there exists an invertible linear function \(\mathbf{L}:\mathcal{V}\to\mathcal{U}.\)

Theorem. Let \(\mathcal{U}\) and \(\mathcal{V}\) be finite-dimensional vector spaces over a common field \(\mathbb{F}\). The vector spaces \(\mathcal{U}\) and \(\mathcal{V}\) are isomorphic if and only if they have the same dimension.

  In particular, if \(\mathcal{V}\) is a finite-dimensional space over a field \(\mathbb{F}\), then \(\mathcal{V}\cong\mathbb{F}^{d}\) where \(d=\operatorname{dim}\mathcal{V}\).

Proof. (See Section 5 of the notes.)

Part 7

Rank-Nullity

(See Section 6 in the notes)

Definition 6.1. Let \(\mathbf{L}: \mathcal{V}\to\mathcal{U}\) be linear and \(\mathcal{V}\) finite dimensional. The rank of \(\mathbf{L}\) is the dimension of the image, that is,

\[\operatorname{rank}(\mathbf{L}):=\operatorname{dim}(\mathbf{L}(\mathcal{V})).\]

Definition. Let \(\mathbf{L}: \mathcal{V}\to\mathcal{U}\) be linear and \(\mathcal{V}\) finite dimensional. The nullity of \(\mathbf{L}\) is the dimension of the kernel of \(\mathbf{L}\), that is, \[\operatorname{nullity}(\mathbf{L}):=\operatorname{dim}(\operatorname{ker}(\mathbf{L})).\]

Example. Consider the linear map \(\mathbf{A}:\mathbb{R}^{4}\to\mathbb{R}^{3}\) with matrix representation \[\mathbf{A} = \begin{bmatrix} 1 & 0 & -1 & 0\\ 0 & 1 & 2 & 2\\ 0 & 0 & 0 & 0\end{bmatrix}\]

\(\left\{\begin{bmatrix} 1\\0\\0\end{bmatrix},\begin{bmatrix}0\\1\\0\end{bmatrix}\right\}\) is a basis for the image \(\mathbf{A}(\mathbb{R}^{4})\), and therefore \(\operatorname{rank}(\mathbf{A})=2\).

\(\left\{\begin{bmatrix} 1\\-2\\1\\0\end{bmatrix},\begin{bmatrix}0\\-2\\0\\1\end{bmatrix}\right\}\) is a basis for the kernel \(\operatorname{ker}(\mathbf{A})\), and therefore \(\operatorname{nullity}(\mathbf{A})=2\).
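A quick numerical check (a Python/numpy sketch; here the nullity is computed as the number of columns minus the rank, anticipating the Rank-Nullity Theorem below):

```python
import numpy as np

A = np.array([[1., 0., -1., 0.],
              [0., 1.,  2., 2.],
              [0., 0.,  0., 0.]])

rank = np.linalg.matrix_rank(A)   # 2
nullity = A.shape[1] - rank       # 4 - 2 = 2

# the claimed kernel basis vectors are indeed sent to zero
for v in ([1., -2., 1., 0.], [0., -2., 0., 1.]):
    assert np.allclose(A @ np.array(v), 0)

print(rank, nullity)   # 2 2
```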

Theorem (The Rank-Nullity Theorem). If \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) is linear and \(\mathcal{V}\) is finite-dimensional, then \[\operatorname{dim}(\mathcal{V}) = \operatorname{rank}(\mathbf{L}) + \operatorname{nullity}(\mathbf{L}).\]

Proof. (See the proof in Section 6 of the notes).

Set \(Z = \operatorname{nullity}(\mathbf{L})\) and let \(\{\mathbf{w}_{n}\}_{n=1}^{Z}\) be a basis for \(\operatorname{ker}(\mathbf{L})\). Set \(N = \operatorname{dim}(\mathcal{V}),\) and extend \(\{\mathbf{w}_{n}\}_{n=1}^{Z}\) to a basis \(\{\mathbf{w}_{n}\}_{n=1}^{N}\) for \(\mathcal{V}\).

 

For each \(n\in\{Z+1,Z+2,\ldots,N\}\) set \[\mathbf{u}_{n}:=\mathbf{L}(\mathbf{w}_{n}).\]

We will now show that \(\{\mathbf{u}_{n}\}_{n=Z+1}^{N}\) is a basis for \(\mathbf{L}(\mathcal{V})\).

Let \(\{c_{n}\}_{n=Z+1}^{N}\) be a sequence of scalars such that \[\sum_{n=Z+1}^{N}c_{n}\mathbf{u}_{n} = \mathbf{0}.\]

It follows that \[\mathbf{0} = \sum_{n=Z+1}^{N}c_{n}\mathbf{L}(\mathbf{w}_{n})=\mathbf{L}\left(\sum_{n=Z+1}^{N}c_{n}\mathbf{w}_{n}\right).\] We deduce from this that \[\sum_{n=Z+1}^{N}c_{n}\mathbf{w}_{n}\in\operatorname{ker}(\mathbf{L}).\] Since \(\{\mathbf{w}_{n}\}_{n=1}^{Z}\) is a basis for \(\operatorname{ker}(\mathbf{L})\), there exist coefficients \(\{b_{n}\}_{n=1}^{Z}\) such that \[\sum_{n=Z+1}^{N}c_{n}\mathbf{w}_{n} = \sum_{n=1}^{Z}b_{n}\mathbf{w}_{n}.\]

This implies \[\sum_{n=Z+1}^{N}c_{n}\mathbf{w}_{n} - \sum_{n=1}^{Z}b_{n}\mathbf{w}_{n} = \mathbf{0}.\]

Since \(\{\mathbf{w}_{n}\}_{n=1}^{N}\) is linearly independent, all of the coefficients in the above linear combination are zero. In particular, \(c_{Z+1}=c_{Z+2}=\cdots=c_{N}=0\). This shows that \(\{\mathbf{u}_{n}\}_{n=Z+1}^{N}\) is linearly independent.

Let \(\mathbf{u}\in\mathbf{L}(\mathcal{V})\). There exists \(\mathbf{v}\in\mathcal{V}\) such that \(\mathbf{u} = \mathbf{L}(\mathbf{v})\). Since \(\{\mathbf{w}_{n}\}_{n=1}^{N}\) is a basis for \(\mathcal{V}\), there exist scalars \(\{a_{n}\}_{n=1}^{N}\) such that

\[\mathbf{v} = \sum_{n=1}^{N}a_{n}\mathbf{w}_{n}.\]

Thus,

\[\mathbf{u} = \mathbf{L}\left(\sum_{n=1}^{N}a_{n}\mathbf{w}_{n}\right) = \sum_{n=1}^{N}a_{n}\mathbf{L}(\mathbf{w}_{n})\]

\[=\sum_{n=1}^{Z}a_{n}\mathbf{L}(\mathbf{w}_{n}) + \sum_{n=Z+1}^{N}a_{n}\mathbf{L}(\mathbf{w}_{n})\]

\[=\sum_{n=1}^{Z}a_{n}\mathbf{0} + \sum_{n=Z+1}^{N}a_{n}\mathbf{u}_{n} = \sum_{n=Z+1}^{N}a_{n}\mathbf{u}_{n}.\]

This shows that \(\mathbf{u}\in\operatorname{span}\{\mathbf{u}_{n}\}_{n=Z+1}^{N}\). Since \(\mathbf{u}\in\mathbf{L}(\mathcal{V})\) was arbitrary, and \(\mathbf{u}_{n}=\mathbf{L}(\mathbf{w}_{n})\) is in the subspace \(\mathbf{L}(\mathcal{V})\) for each \(n\in\{Z+1,Z+2,\ldots,N\}\), we conclude that \(\{\mathbf{u}_{n}\}_{n=Z+1}^{N}\) spans \(\mathbf{L}(\mathcal{V})\).

Since \(\{\mathbf{u}_{n}\}_{n=Z+1}^{N}\) is a basis for \(\mathbf{L}(\mathcal{V})\), we have

\[\operatorname{rank}(\mathbf{L}) = \operatorname{dim}(\mathbf{L}(\mathcal{V})) = N-Z = \operatorname{dim}(\mathcal{V}) - \operatorname{dim}(\operatorname{ker}(\mathbf{L}))\]

\[ =  \operatorname{dim}(\mathcal{V}) - \operatorname{nullity}(\mathbf{L}) .\ \Box\]

Lemma. Assume \(\mathcal{U}\) is a finite dimensional vector space. If \(\mathcal{W}\) is a subspace of \(\mathcal{U}\), and \(\operatorname{dim}(\mathcal{U}) = \operatorname{dim}(\mathcal{W})\), then \(\mathcal{W}=\mathcal{U}.\)

Prove it!

Theorem 6.2. Let \(\mathbf{L}:\mathcal{V}\rightarrow\mathcal{U}\) be linear where \(\mathcal{U}\) and \(\mathcal{V}\) are finite-dimensional.

(a) \(\operatorname{rank}(\mathbf{L})\leq\operatorname{dim}(\mathcal{U})\) and \(\operatorname{rank}(\mathbf{L})\leq\operatorname{dim}(\mathcal{V})\).

(b) If \(\operatorname{dim}(\mathcal{U})<\operatorname{dim}(\mathcal{V})\) then \(\operatorname{ker}(\mathbf{L})\neq\{\mathbf{0}\}\).

(c) If \(\operatorname{dim}(\mathcal{U})>\operatorname{dim}(\mathcal{V})\) then \(\mathbf{L}(\mathcal{V})\neq\mathcal{U}\).

(d) If \(\mathbf{L}\) is invertible then \(\operatorname{dim}(\mathcal{V})=\operatorname{dim}(\mathcal{U})\).

(e) If \(\operatorname{dim}(\mathcal{U})=\operatorname{dim}(\mathcal{V})\) then the following are equivalent:

   (i) \(\mathbf{L}\) is invertible.

   (ii) \(\operatorname{rank}(\mathbf{L})=\operatorname{dim}(\mathcal{U})\).

   (iii) \(\mathbf{L}(\mathcal{V})=\mathcal{U}\).

   (iv) \(\operatorname{ker}(\mathbf{L})=\{\mathbf{0}\}\).

(f) If \(\mathbf{K}:\mathcal{W}\rightarrow\mathcal{V}\) is linear where \(\mathcal{W}\) is finite-dimensional then
\(\operatorname{rank}(\mathbf{L}\mathbf{K})\leq\operatorname{rank}(\mathbf{L})\) and \(\operatorname{rank}(\mathbf{L}\mathbf{K})\leq\operatorname{rank}(\mathbf{K})\).

(g) If \(\operatorname{dim}(\mathcal{U})=\operatorname{dim}(\mathcal{V})\) and \(\mathbf{K}:\mathcal{U}\rightarrow\mathcal{V}\) is linear and satisfies \(\mathbf{L}\mathbf{K}=\mathbf{I}\) then \(\mathbf{L}\) is invertible with \(\mathbf{L}^{-1}=\mathbf{K}\).

EigenQuiz questions:

  1. True or False: There exists a linear map \(\mathbf{L}:\mathbb{R}^{3}\to\mathbb{R}^{2}\) of nullity \(0\).
  2. For linear maps \(\mathbf{L}:\mathbb{R}\to\mathbb{R}\), it _________ holds that \(\operatorname{ker}(\mathbf{L}) = \operatorname{im}(\mathbf{L})\).
  3. For \(\mathbf{A}\in\mathbb{R}^{3\times 2}\) such that \(\operatorname{ker}(\mathbf{A}) = \{\mathbf{0}\}\), it _________ holds that \(\operatorname{im}(\mathbf{A}) = \mathbb{R}^{3}\).
  4. Given linear \(\mathbf{L}:\mathbb{R}^{n}\to\mathbb{R}^{m}\) such that \(\operatorname{nullity}(\mathbf{L}) = 1\) and \(n>m\), it _________ holds that \(\operatorname{rank}(\mathbf{L}) = m\).

Part 8

Eigenthings

(See Section 7 in the notes)

Definition 7.1. A nonzero vector \(\mathbf{v}\in\mathcal{V}\) is an eigenvector of a linear operator \(\mathbf{L}:\mathcal{V}\to\mathcal{V}\) if there is some \(\lambda\in\mathbb{F}\) such that \(\mathbf{Lv} = \lambda\mathbf{v}\). In this case the scalar \(\lambda\) is called an eigenvalue of \(\mathbf{L}\).

Examples. (a) Let \(\mathbf{A}:\mathbb{R}^{3}\to\mathbb{R}^{3}\) be the linear map given by multiplication by \[\mathbf{A} = \begin{bmatrix} 1 & 1 & 0\\ 0 & 2 & 1\\ 0 & 0 & 3\end{bmatrix}.\]

  • \(\begin{bmatrix} 1 & 1 & 0\\ 0 & 2 & 1\\ 0 & 0 & 3\end{bmatrix}\begin{bmatrix}1\\ 0\\ 0\end{bmatrix} = \begin{bmatrix}1\\ 0\\ 0\end{bmatrix}\)
  • \(\begin{bmatrix} 1 & 1 & 0\\ 0 & 2 & 1\\ 0 & 0 & 3\end{bmatrix}\begin{bmatrix}1\\ 1\\ 0\end{bmatrix} = \begin{bmatrix}2\\ 2\\ 0\end{bmatrix} =  2\begin{bmatrix}1\\ 1\\ 0\end{bmatrix}\)
  • \(\begin{bmatrix} 1 & 1 & 0\\ 0 & 2 & 1\\ 0 & 0 & 3\end{bmatrix}\begin{bmatrix}1/2\\ 1\\ 1\end{bmatrix} = \begin{bmatrix}3/2\\ 3\\ 3\end{bmatrix} =  3\begin{bmatrix}1/2\\ 1\\ 1\end{bmatrix}\)

From these we can see that \(1,2,\) and \(3\) are eigenvalues of \(\mathbf{A}\).
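These eigenpairs are easy to confirm numerically (a Python/numpy sketch; for this triangular matrix the eigenvalues turn out to be the diagonal entries):

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [0., 2., 1.],
              [0., 0., 3.]])

evals, evecs = np.linalg.eig(A)
print(evals)   # [1. 2. 3.], the diagonal entries of this triangular matrix

# the column for eigenvalue 2 is proportional to (1, 1, 0)^T from above
v = evecs[:, np.argmin(np.abs(evals - 2))]
print(v / v[0])   # [1. 1. 0.]
```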

Examples. (b) Let \(C^{\infty}(\mathbb{R})\) be the space of real, infinitely differentiable functions on \(\mathbb{R}\), and let \(\mathbf{D}:C^{\infty}(\mathbb{R})\to C^{\infty}(\mathbb{R})\) be the differential operator, that is, 

\[(\mathbf{D}f)(x) = f'(x)\quad\text{for }f\in C^{\infty}(\mathbb{R}),\ x\in\mathbb{R}.\]

For each \(a\in\mathbb{R}\) let \(E_{a}:\mathbb{R}\to\mathbb{R}\) be given by \(E_{a}(x) = e^{ax}\) for each \(x\in\mathbb{R}\). Since \(E_{a}'(x) = a e^{ax} = aE_{a}(x),\) we see that \(E_{a}\) is an eigenvector of \(\mathbf{D}\) and \(a\) is an eigenvalue of \(\mathbf{D}\) for each \(a\in\mathbb{R}\).

 

(c) Let \(\mathbb{Z}_{N}\) be the set \(\{0,1,2,\ldots,N-1\}\) with addition modulo \(N\). Let \(\mathbf{T}:\mathbb{C}^{\mathbb{Z}_{N}}\to\mathbb{C}^{\mathbb{Z}_{N}}\) be given by \[(\mathbf{Tx})(n) = \mathbf{x}(n-1).\]

If \(\mathbf{1}\in\mathbb{C}^{\mathbb{Z}_{N}}\) is the all-ones vector, that is, \(\mathbf{1}(n)=1\) for all \(n\in\mathbb{Z}_{N}\), then \(\mathbf{T}\mathbf{1} = \mathbf{1}\). Thus \(1\) and \(\mathbf{1}\) are an eigenvalue and eigenvector of \(\mathbf{T}\), respectively.

 

Examples. (c) continued

 

More generally, for \(k\in\{0,1,2,\ldots,N-1\}\), let \(\mathbf{x}_{k}\in\mathbb{C}^{\mathbb{Z}_{N}}\) be given by

\[\mathbf{x}_{k}(n) = e^{-2\pi i nk/N},\quad\text{for each } n\in\mathbb{Z}_{N}.\]

Then, for each \(n\in\mathbb{Z}_{N}\) we have

\[(\mathbf{T}\mathbf{x}_{k})(n) = \mathbf{x}_{k}(n-1) = e^{-2\pi i (n-1)k/N} = e^{2\pi ik/N}e^{-2\pi i nk/N} =  e^{2\pi ik/N}\mathbf{x}_{k}(n).\]

This shows that \( e^{2\pi ik/N}\) and \(\mathbf{x}_{k}\) are an eigenvalue and eigenvector of \(\mathbf{T}\), respectively, for each \(k\in\{0,1,2,\ldots,N-1\}\).

For \(N=4\) the matrix representation of \(\mathbf{T}\) is

\[\begin{bmatrix}0&0&0&1\\1&0&0&0\\0&1&0&0\\0&0&1&0\end{bmatrix}\]

Examples. (c) continued

 

\[\mathbf{x}_{0} =\begin{bmatrix} 1\\1\\1\\1\end{bmatrix},\quad \mathbf{x}_{1}:=\begin{bmatrix} e^{-2\pi i\cdot 0\cdot 1/4}\\e^{-2\pi i\cdot 1\cdot 1/4}\\e^{-2\pi i\cdot 2\cdot 1/4}\\e^{-2\pi i\cdot 3\cdot 1/4}\end{bmatrix}=\begin{bmatrix} 1\\-i\\-1\\i\end{bmatrix},\quad \mathbf{x}_{2}:=\begin{bmatrix} 1\\-1\\1\\-1\end{bmatrix},\quad \mathbf{x}_{3}:=\begin{bmatrix} 1\\i\\-1\\-i\end{bmatrix}\]

 

And, for example

\[\textbf{Tx}_{1} = \begin{bmatrix}0&0&0&1\\1&0&0&0\\0&1&0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix} 1\\-i\\-1\\i\end{bmatrix} = \begin{bmatrix} i\\1\\-i\\-1\end{bmatrix} = i\begin{bmatrix} 1\\-i\\-1\\i\end{bmatrix} = i\mathbf{x}_{1} = e^{2\pi i/4}\mathbf{x}_{1}.\]
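All four eigenpairs for \(N=4\) can be checked at once (a Python/numpy sketch):

```python
import numpy as np

N = 4
T = np.roll(np.eye(N), 1, axis=0)   # the cyclic-shift matrix shown above

n = np.arange(N)
for k in range(N):
    x_k = np.exp(-2j * np.pi * n * k / N)   # the vector x_k
    lam = np.exp(2j * np.pi * k / N)        # the claimed eigenvalue
    assert np.allclose(T @ x_k, lam * x_k)
print("T x_k = e^{2 pi i k / N} x_k for k = 0, 1, 2, 3")
```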

Examples. (d) Let \(\mathbf{B}:\mathbb{R}^{2}\to\mathbb{R}^{2}\) be the linear map whose matrix representation is

\[\mathbf{B} = \begin{bmatrix} 1&-2\\1&-1\end{bmatrix}.\]

If \(\left[\begin{smallmatrix}a\\b\end{smallmatrix}\right]\in\mathbb{R}^{2}\) is an eigenvector of \(\mathbf{B}\) with eigenvalue \(\lambda\), then

\[\begin{bmatrix} a-2b\\a-b\end{bmatrix} = \begin{bmatrix} 1&-2\\1&-1\end{bmatrix}\begin{bmatrix} a\\b\end{bmatrix} = \lambda\begin{bmatrix} a\\b\end{bmatrix}\]

\[\Rightarrow \left\{\begin{array}{rrl}a & -2b & = \lambda a\\ a & -b & =\lambda b\end{array}\right.\quad \Rightarrow \left\{\begin{array}{rrl}(1-\lambda)a & -2b & = 0\\ a & -(1+\lambda)b & = 0 \end{array}\right.\]

Solving for \(a\) in the second equation, and plugging it into the first we obtain

\[ \left\{\begin{array}{rl} a & =(1+\lambda)b\\ (1-\lambda)(1+\lambda)b -2b & = 0\end{array}\right.\quad \Rightarrow \left\{\begin{array}{rl} a & =(1+\lambda)b\\ -(\lambda^2+1)b & = 0\end{array}\right.\]

Since \(\lambda^{2}+1\neq 0\) for any \(\lambda\in\mathbb{R}\), the second equation implies \(b=0\), and then the first equation implies \(a=0\). Since the zero vector is not an eigenvector, this shows that \(\mathbf{B}\) has no (real) eigenvalues or eigenvectors. 

Theorem 7.2. Let \(\mathcal{V}\) be a nontrivial finite-dimensional vector space over a field \(\mathbb{F}\) and let \(\mathbf{L}:\mathcal{V}\rightarrow\mathcal{V}\) be linear.

(a) A scalar \(\lambda\in\mathbb{F}\) is an eigenvalue for \(\mathbf{L}\) if and only if \(\lambda\mathbf{I}-\mathbf{L}\) is not invertible.

When this occurs, the corresponding eigenvectors are the nonzero members of the eigenspace \(\ker(\lambda\mathbf{I}-\mathbf{L})\).

(b) \(\mathbf{L}\) is invertible if and only if \(0\) is not an eigenvalue of \(\mathbf{L}\).

(c) If \(\mathbf{L}\mathbf{v}=\lambda\mathbf{v}\) then \(\displaystyle{\left(\sum_{k=0}^K c_k\mathbf{L}^k\right)\mathbf{v}=\left(\sum_{k=0}^K c_k\lambda^k\right)\mathbf{v}}\) for any scalars \(\{c_k\}_{k=0}^K\) in \(\mathbb{F}\).

(d) If \(\{\mathbf{v}_n\}_{n\in\mathcal{N}}\) is a finite sequence of eigenvectors of \(\mathbf{L}\), and the corresponding eigenvalues \(\{\lambda_n\}_{n\in\mathcal{N}}\) are distinct (i.e., \(\lambda_{n_1}\neq\lambda_{n_2}\) whenever \(n_1\neq n_2\)) then \(\{\mathbf{v}_n\}_{n\in\mathcal{N}}\) is linearly independent. In particular, \(\mathbf{L}\) has at most \(\dim(\mathcal{V})\) distinct eigenvalues.

Examples. (e) Let \(\mathbf{B}:\mathbb{C}^{2}\to\mathbb{C}^{2}\) be the linear map whose matrix representation is

\[\mathbf{B} = \begin{bmatrix} 1&-2\\1&-1\end{bmatrix}.\]

 

\[\begin{bmatrix} 1&-2\\1&-1\end{bmatrix}\begin{bmatrix} 1+i\\1\end{bmatrix} = \begin{bmatrix} -1+i\\i\end{bmatrix} = i\begin{bmatrix} 1+i\\1\end{bmatrix}\] and \[\begin{bmatrix} 1&-2\\1&-1\end{bmatrix}\begin{bmatrix} 1-i\\1\end{bmatrix} = \begin{bmatrix} -1-i\\-i\end{bmatrix} = -i\begin{bmatrix} 1-i\\1\end{bmatrix}\]

 

So, \(\mathbf{B}\) has eigenvalues \(i\) and \(-i\).

 

(Compare this to the last example!)
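A quick numerical confirmation (a Python/numpy sketch; numpy reports the complex eigenvalues even though \(\mathbf{B}\) is real):

```python
import numpy as np

B = np.array([[1., -2.],
              [1., -1.]])

# numpy returns complex eigenvalues for this real matrix: i and -i
print(np.linalg.eig(B)[0])   # [0.+1.j 0.-1.j] (up to ordering)
```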

Theorem. (The Fundamental Theorem of Algebra). For any scalars \(\{c_{k}\}_{k=0}^{K-1}\) in \(\mathbb{C}\), there exist scalars \(\{a_{k}\}_{k=1}^{K}\) such that \[\lambda^{K} + \sum_{k=0}^{K-1}c_{k}\lambda^{k} = \prod_{k=1}^{K}(\lambda-a_{k}),\quad\text{for all }\lambda\in\mathbb{C}.\]

Examples. 

  • \(\lambda^{2}-3\lambda+2 = (\lambda-2)(\lambda-1)\)
  • \(\lambda^{2}-\lambda-1 = \left(\lambda-\frac{1+\sqrt{5}}{2}\right)\left(\lambda-\frac{1-\sqrt{5}}{2}\right)\)
  • \(\lambda^{2}+1 = (\lambda+i)(\lambda-i)\) (Can't be factored over \(\mathbb{R}\))
  • \(\lambda^{3}-\lambda^{2}-4\lambda-6 = (\lambda-3)\big(\lambda-(-1+i)\big)\big(\lambda-(-1-i)\big)\)

Theorem 7.5. If \(\mathcal{V}\) is a nontrivial finite-dimensional vector space over \(\mathbb{C}\), then any linear operator \(\mathbf{L}:\mathcal{V}\to\mathcal{V}\) has an eigenvalue.

Proof idea: If \(N=\operatorname{dim}\mathcal{V}\) then for any nonzero \(\mathbf{v}\in\mathcal{V}\) the sequence \(\{\mathbf{L}^{k}\mathbf{v}\}_{k=0}^{N}\) is dependent. Hence \[\sum_{k=0}^{N}c_{k}\mathbf{L}^{k}\mathbf{v}=\mathbf{0},\] for some scalars \(c_{0},c_{1},\ldots,c_{N}\), not all zero. Letting \(K\) be the largest index with \(c_{K}\neq 0\) (note \(K\geq 1\) since \(\mathbf{v}\neq\mathbf{0}\)) and rescaling, we may assume \(c_{K}=1\) and \(c_{K+1}=c_{K+2}=\cdots=c_{N}=0\), then we have \[\left(\mathbf{L}^{K}+\sum_{k=0}^{K-1}c_{k}\mathbf{L}^{k}\right)\mathbf{v}=\mathbf{0}.\] Set \(f(\lambda) = \lambda^{K}+c_{K-1}\lambda^{K-1}+\cdots+c_{1}\lambda+c_{0}\). By the fundamental theorem of algebra there are complex numbers \(a_{1},\ldots,a_{K}\) such that

\[f(\lambda) = (\lambda-a_{1})(\lambda-a_{2})\cdots(\lambda-a_{K}).\] Since \(\mathbf{L}\) and \(\mathbf{I}\) commute, we have \[\left(\mathbf{L}^{K}+\sum_{k=0}^{K-1}c_{k}\mathbf{L}^{k}\right) = (\mathbf{L}-a_{1}\mathbf{I})(\mathbf{L}-a_{2}\mathbf{I})\cdots(\mathbf{L}-a_{K}\mathbf{I})\]

Thus, for this nonzero \(\mathbf{v}\), we have

\[(\mathbf{L}-a_{1}\mathbf{I})(\mathbf{L}-a_{2}\mathbf{I})\cdots(\mathbf{L}-a_{K}\mathbf{I})\mathbf{v} = \mathbf{0}.\]

Since this composition sends the nonzero vector \(\mathbf{v}\) to \(\mathbf{0}\), it is not injective, so one of the operators \(\mathbf{L}-a_{j}\mathbf{I}\) must not be invertible, and this implies \(a_{j}\) is an eigenvalue of \(\mathbf{L}.\) \(\Box\)

Lemma. If \(\mathcal{V}\) is a finite dimensional vector space over \(\mathbb{F}\), then \(\operatorname{Hom}(\mathcal{V})\) is a finite dimensional vector space. More specifically, if \(\operatorname{dim}(\mathcal{V})=N\), then \(\operatorname{dim}(\operatorname{Hom}(\mathcal{V}))=N^{2}\).

Proof sketch. Let \(\{\mathbf{v}_{n}\}_{n=1}^{N}\) be a basis for \(\mathcal{V}\). For each pair \((m,n)\in[N]\times[N]\), let \(\mathbf{E}_{(m,n)}:\mathcal{V}\to\mathcal{V}\) be the unique linear operator such that

\[\mathbf{E}_{(m,n)}(\mathbf{v}_{k}) = \begin{cases} \mathbf{v}_{n} & k=m\\ \mathbf{0} & k\neq m.\end{cases}\]

We claim that \(\{\mathbf{E}_{(m,n)}\}_{(m,n)\in[N]\times[N]}\) is a basis for \(\operatorname{Hom}(\mathcal{V})\).

 

Proof continued. Let \(\mathbf{L}:\mathcal{V}\to\mathcal{V}\) be an arbitrary linear operator. For each \(m\in[N]\), let \(\{c_{(m,n)}\}_{n=1}^{N}\) be the sequence of scalars such that

\[\mathbf{L}\mathbf{v}_{m} = \sum_{n=1}^{N}c_{(m,n)}\mathbf{v}_{n}\]

Define the linear operator

\[\mathbf{K}:=\sum_{n=1}^{N}\sum_{m=1}^{N}c_{(m,n)}\mathbf{E}_{(m,n)}.\]

For each \(k\in[N]\) we can compute

\[\mathbf{K}(\mathbf{v}_{k}) = \sum_{n=1}^{N}c_{(k,n)}\mathbf{v}_{n} = \mathbf{L}(\mathbf{v}_{k}).\]

Since \(\mathbf{L}\) and \(\mathbf{K}\) agree on a basis, they are equal.

Next, we must show that \(\{\mathbf{E}_{(m,n)}\}_{(m,n)\in[N]\times[N]}\) is independent. This was shown in class. \(\Box\)

Corollary. If \(\mathbf{A}:\mathcal{V}\to\mathcal{V}\) is linear and \(\mathcal{V}\) is finite dimensional, then for some \(M \leq (\operatorname{dim}(\mathcal{V}))^{2}\), there exist scalars \(c_{0},c_{1},\ldots,c_{M}\), not all zero, such that

\[c_{M}\mathbf{A}^{M}+c_{M-1}\mathbf{A}^{M-1}+\cdots+c_{2}\mathbf{A}^{2}+c_{1}\mathbf{A}+c_{0}\mathbf{I} = \mathbf{0}.\]

Part 9

Diagonalization

(See Section 8 in the notes)

Theorem 8.1. Let \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) be linear where \(\mathcal{V}\) and \(\mathcal{U}\) are nontrivial and finite-dimensional, and let \(\{\mathbf{u}_{m}\}_{m\in\mathcal{M}}\) and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) be any bases for \(\mathcal{U}\) and \(\mathcal{V}\), respectively. Then, there exists a unique matrix \(\mathbf{A}\in\mathbb{F}^{\mathcal{M}\times\mathcal{N}}\) such that

\[\mathbf{L}\mathbf{v}_{n} = \sum_{m\in\mathcal{M}}\mathbf{A}(m,n)\mathbf{u}_{m}\]

for all \(n\in\mathcal{N}\); it is called the matrix representation of \(\mathbf{L}\) with respect to \(\{\mathbf{u}_{m}\}_{m\in\mathcal{M}}\) and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\).

Moreover,

\[\mathbf{L} = \mathbf{UAV}^{-1}\]

where \(\mathbf{U}:\mathbb{F}^{\mathcal{M}}\to\mathcal{U}\) and \(\mathbf{V}:\mathbb{F}^{\mathcal{N}}\to\mathcal{V}\) are the synthesis operators of \(\{\mathbf{u}_{m}\}_{m\in\mathcal{M}}\) and \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\), respectively.

Definition 8.3.

(a) A linear operator \(\mathbf{L}:\mathcal{V}\to\mathcal{V}\) is diagonalizable if there exists an eigenbasis for \(\mathbf{L}\), that is, a basis \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) for \(\mathcal{V}\) with the property that each \(\mathbf{v}_{n}\) is an eigenvector of \(\mathbf{L}.\)

(b) A matrix \(\mathbf{A}\in\mathbb{F}^{\mathcal{N}\times\mathcal{N}}\) is diagonal if \(\mathbf{A}(n_{1},n_{2})=0\) whenever \(n_{1}\neq n_{2}\).

Example. Let \(\mathbf{L}:\mathbb{R}^{2}\to\mathbb{R}^{2}\) be given by

\[\mathbf{L}\left(\begin{bmatrix}a\\b\end{bmatrix}\right) = \begin{bmatrix}1&1\\0&-1\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix}\]

Note that \(\left\{\begin{bmatrix}1\\0\end{bmatrix},\begin{bmatrix}-1\\2\end{bmatrix}\right\}\) is an eigenbasis for \(\mathbf{L}\). The matrix representation for \(\mathbf{L}\) with respect to this basis is the diagonal matrix

\[\begin{bmatrix}1&0\\0&-1\end{bmatrix}.\]
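As a numerical check (a Python/numpy sketch), conjugating by the synthesis operator \(\mathbf{V}\) of this eigenbasis recovers the diagonal matrix:

```python
import numpy as np

L = np.array([[1.,  1.],
              [0., -1.]])

# columns are the eigenbasis vectors (1,0) and (-1,2)
V = np.array([[1., -1.],
              [0.,  2.]])

# the matrix representation with respect to the eigenbasis is V^{-1} L V
print(np.linalg.inv(V) @ L @ V)   # [[ 1.  0.]  [ 0. -1.]]
```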

Example. Let \(\mathbf{S}:\mathbb{C}^{2}\to\mathbb{C}^{2}\) be given by

\[\mathbf{S}\left(\begin{bmatrix}a\\b\end{bmatrix}\right) = \begin{bmatrix}1&1\\0&1\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix}\]

Observe that \(1\) is an eigenvalue of \(\mathbf{S}\). In fact, the associated eigenspace is one-dimensional: \(\operatorname{ker}(\mathbf{S}-\mathbf{I})=\operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}\right\}.\)

Suppose \(\lambda\in\mathbb{C}\) is another eigenvalue of \(\mathbf{S}\); then the operator \(\mathbf{S}-\lambda\mathbf{I}\) is not invertible. However, we can see that for any \(\lambda\neq 1\), the set

\[\{(\mathbf{S}-\lambda\mathbf{I})(\boldsymbol{\delta}_{1}),(\mathbf{S}-\lambda\mathbf{I})(\boldsymbol{\delta}_{2})\} = \left\{\begin{bmatrix}1-\lambda\\0\end{bmatrix},\begin{bmatrix}1\\1-\lambda\end{bmatrix}\right\}\]

is a basis for \(\mathbb{C}^{2}\). It follows that \(\mathbf{S}-\lambda\mathbf{I}\) is a bijection for all \(\lambda\neq 1\), so the only eigenvalue of \(\mathbf{S}\) is \(1\). Since the corresponding eigenspace is only one-dimensional, no eigenbasis exists, and thus \(\mathbf{S}\) is not diagonalizable.

Theorem 8.4. Let \(\mathbf{L}:\mathcal{V}\to\mathcal{V}\) be linear, where \(\mathcal{V}\) is finite-dimensional.

The following are equivalent:

  1. \(\mathbf{L}\) is diagonalizable.
  2. \(\mathbf{L}=\mathbf{V\Lambda V}^{-1}\) for some invertible linear function \(\mathbf{V}:\mathbb{F}^{\mathcal{N}}\to\mathcal{V}\) and a diagonal matrix \(\mathbf{\Lambda}\in\mathbb{F}^{\mathcal{N}\times\mathcal{N}}\).
  3. \(\operatorname{dim}(\mathcal{V}) = \sum_{\lambda}\operatorname{dim}(\operatorname{ker}(\lambda\mathbf{I}-\mathbf{L})),\) where the sum is over the set of eigenvalues \(\lambda\) of \(\mathbf{L}\).

Moreover, if any of the equivalent conditions above hold, then letting \(\mathbf{V}:\mathbb{F}^{\mathcal{N}}\to\mathcal{V}\) be the synthesis operator of an eigenbasis \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) for \(\mathbf{L}\) with corresponding eigenvalues \(\{\lambda_{n}\}_{n\in\mathcal{N}}\), and letting \(\mathbf{\Lambda}\in\mathbb{F}^{\mathcal{N}\times\mathcal{N}}\) be the diagonal matrix with \(\mathbf{\Lambda}(n,n)=\lambda_{n}\) for all \(n\in\mathcal{N}\), we have \(\mathbf{L} = \mathbf{V\Lambda V}^{-1}\). 

Example. Let \(\mathcal{V}:=\operatorname{span}\{\cos(x),\sin(x)\}\subset \mathbb{C}^{\mathbb{R}}\). Let \(\mathbf{D}:\mathcal{V}\to\mathcal{V}\) denote the derivative operator.

 

Let \(\mathbf{u}_{1}(x) = \cos(x)\) and \(\mathbf{u}_{2}(x) = \sin(x)\). With respect to the basis \(\{\mathbf{u}_{1},\mathbf{u}_{2}\}\), the matrix representation of \(\mathbf{D}\) is \[\begin{bmatrix}0&1\\-1&0\end{bmatrix}.\]

On the other hand, if we let \(\mathbf{v}_{1}(x) = \cos(x)+i\sin(x)\), and \(\mathbf{v}_{2}(x) = \cos(x)-i\sin(x),\) then we have

\[(\mathbf{Dv}_{1})(x) = -\sin(x)+i\cos(x) = i(\cos(x)+i\sin(x)) = i\mathbf{v}_{1}(x)\]

and

\[(\mathbf{Dv}_{2})(x) = -\sin(x)-i\cos(x) = -i(\cos(x)-i\sin(x)) = -i\mathbf{v}_{2}(x).\]

From this we see that \(\{\mathbf{v}_{1},\mathbf{v}_{2}\}\) is an eigenbasis for \(\mathbf{D}\). The matrix representation of \(\mathbf{D}\) with respect to this basis is \[\begin{bmatrix} i&0\\0&-i\end{bmatrix}.\]

Example. Let \(\mathbf{F}:\mathbb{R}^{2}\to\mathbb{R}^{2}\) be given by

\[\mathbf{F}\begin{bmatrix}a\\b\end{bmatrix}=\begin{bmatrix}1&1\\1&0\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}a+b\\a\end{bmatrix}\]

\[\mathbf{F}\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}1\\1\end{bmatrix},\quad \mathbf{F}\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}2\\1\end{bmatrix},\quad \mathbf{F}\begin{bmatrix}2\\1\end{bmatrix} = \begin{bmatrix}3\\2\end{bmatrix},\quad \mathbf{F}\begin{bmatrix}3\\2\end{bmatrix} = \begin{bmatrix}5\\3\end{bmatrix},\quad \mathbf{F}\begin{bmatrix}5\\3\end{bmatrix} = \begin{bmatrix}8\\5\end{bmatrix},\ldots\]

In general, if  \(\mathbf{x}_{n} = \mathbf{F}^{n}\boldsymbol{\delta}_{1}\), then \(\mathbf{x}_{n}(2)\) is the \(n\)th Fibonacci number: \[1,1,2,3,5,8,13,21,\ldots\]


To find the eigenvalues of \(\mathbf{F}\) we need to find \(\lambda\in\mathbb{R}\) such that \(\lambda\mathbf{I}-\mathbf{F}\) is not invertible. The matrix representation of \(\lambda\mathbf{I}-\mathbf{F}\) is

\[\begin{bmatrix}\lambda-1&-1\\-1&\lambda\end{bmatrix}.\]

Since both columns are nonzero, this matrix fails to be invertible if and only if the second column is a scalar multiple of the first. This occurs if and only if \(\lambda=\frac{1\pm\sqrt{5}}{2}\).

Example continued. The number \(\phi:=\frac{1+\sqrt{5}}{2}\) is commonly known as the golden ratio. Observe that \(\phi^{2}=\phi+1\), and that \(\frac{1-\sqrt{5}}{2} = -\frac{1}{\phi}=1-\phi\). 

Note that \[\operatorname{ker}(\phi\mathbf{I}-\mathbf{F}) = \operatorname{span}\left\{\begin{bmatrix}\phi\\1\end{bmatrix}\right\}\text{     and     }\operatorname{ker}(-\tfrac{1}{\phi}\mathbf{I}-\mathbf{F}) = \operatorname{span}\left\{\begin{bmatrix}-\phi^{-1}\\1\end{bmatrix}\right\}\]

Thus, \(\left\{\begin{bmatrix}\phi\\1\end{bmatrix},\begin{bmatrix}-\phi^{-1}\\1\end{bmatrix}\right\}\) is an eigenbasis for \(\mathbf{F}\).

Let \(\mathbf{V}:\mathbb{R}^{2}\to\mathbb{R}^{2}\) be the synthesis operator of the above basis, then

\[\mathbf{V} = \begin{bmatrix}\phi & -\phi^{-1}\\ 1 & 1\end{bmatrix}\quad\text{and}\quad\mathbf{V}^{-1} =\frac{1}{\sqrt{5}} \begin{bmatrix}1&\phi^{-1}\\-1&\phi\end{bmatrix},\]

and by the previous theorem

\[\mathbf{F} = \mathbf{V}\begin{bmatrix}\phi&0\\0&-\phi^{-1}\end{bmatrix}\mathbf{V}^{-1} = \begin{bmatrix}\phi & -\phi^{-1}\\ 1 & 1\end{bmatrix}\begin{bmatrix}\phi&0\\0&-\phi^{-1}\end{bmatrix}\left(\frac{1}{\sqrt{5}} \begin{bmatrix}1&\phi^{-1}\\-1&\phi\end{bmatrix}\right)\]

Example continued.  Observe that

\[\boldsymbol{\delta}_{1}=\mathbf{V}(\mathbf{V}^{-1}\boldsymbol{\delta}_{1}) =\mathbf{V}\left(\frac{1}{\sqrt{5}}\begin{bmatrix}1\\-1\end{bmatrix}\right)= \frac{1}{\sqrt{5}}\begin{bmatrix}\phi\\1\end{bmatrix} -\frac{1}{\sqrt{5}}\begin{bmatrix}-\phi^{-1}\\1\end{bmatrix}\]

Thus, we have 

\[\mathbf{F}^{n}\boldsymbol{\delta}_{1} = \frac{1}{\sqrt{5}}\mathbf{F}^{n}\begin{bmatrix}\phi\\1\end{bmatrix} -\frac{1}{\sqrt{5}}\mathbf{F}^{n}\begin{bmatrix}-\phi^{-1}\\1\end{bmatrix} = \frac{1}{\sqrt{5}}\phi^{n}\begin{bmatrix}\phi\\1\end{bmatrix} -\frac{1}{\sqrt{5}}\left(-\frac{1}{\phi}\right)^{n}\begin{bmatrix}-\phi^{-1}\\1\end{bmatrix}.\]

Reading off the second entry, the \(n\)th Fibonacci number is \(\displaystyle{\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^{n} - \frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^{n}}\).
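This closed form is easy to test against the recursion (a Python sketch; `fib_binet` is a name introduced here for the formula):

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2

def fib_binet(n):
    # nth Fibonacci number read off from the eigendecomposition of F
    return (phi**n - (1 - phi)**n) / np.sqrt(5)

a, b = 0, 1   # iterative Fibonacci for comparison
for n in range(1, 9):
    a, b = b, a + b
    assert round(fib_binet(n)) == a
print([round(fib_binet(n)) for n in range(1, 9)])   # [1, 1, 2, 3, 5, 8, 13, 21]
```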

Part 10

Inner Product spaces

(See Section 9 in the notes)

What is the "distance" between two vectors?

Example 1. What should be the distance between two vectors:

\[\mathbf{x} = \begin{bmatrix} \mathbf{x}(1)\\ \mathbf{x}(2)\end{bmatrix}\quad\text{and}\quad \mathbf{y} = \begin{bmatrix} \mathbf{y}(1)\\ \mathbf{y}(2)\end{bmatrix}\]

\[\|\mathbf{x}-\mathbf{y}\| = \sqrt{|\mathbf{x}(1) - \mathbf{y}(1)|^{2} + |\mathbf{x}(2) - \mathbf{y}(2)|^{2}}\]


Example 2. What about the distance between \(f,g\in \mathbb{R}^{[0,1]}\)?

\[\|f-g\| = ???\]

Three candidates:

\[\int_{0}^{1}f(x)-g(x)\,dx\]

\[\int_{0}^{1}|f(x)-g(x)|\,dx\quad\text{(better...)}\]

\[\sqrt{\int_{0}^{1}|f(x)-g(x)|^{2}\,dx}\quad\text{(best!)}\]

Until further notice, all vector spaces in these slides will be over \(\mathbb{R}\) or \(\mathbb{C}\). 

Definition 9.1. Let \(\mathcal{V}\) be a vector space. A norm on \(\mathcal{V}\) is a function that assigns to every vector \(\mathbf{v}\in\mathcal{V}\) a nonnegative real number \(\|\mathbf{v}\|\) with the following properties:

(i) \(\|\mathbf{v}\|>0\) for \(\mathbf{v}\neq\mathbf{0}\). 

(ii) \(\|c\mathbf{v}\| = |c|\|\mathbf{v}\|\) for all \(c\in\mathbb{F}\) and \(\mathbf{v}\in\mathcal{V}\).

(iii) \(\|\mathbf{v}_{1}+\mathbf{v}_{2}\|\leq \|\mathbf{v}_{1}\|+\|\mathbf{v}_{2}\|\) for all \(\mathbf{v}_{1},\mathbf{v}_{2}\in\mathcal{V}\). 

 

The quantity \(\|\mathbf{v}\|\) is sometimes referred to as the length of \(\mathbf{v}\). Properties (i)-(iii) guarantee that it satisfies the usual conventions about length. In particular, property (iii) is called the triangle inequality.

Examples. Let \(\mathbb{F}=\mathbb{R}\) or \(\mathbb{C}\).

  • For \(\mathbf{x}\in\mathbb{F}^{N}\), set \[\|\mathbf{x}\| = \left(\sum_{n=1}^{N}|\mathbf{x}(n)|^{2}\right)^{1/2}.\] This defines a norm on \(\mathbb{F}^{N}\). (We will prove this later.)
  • More generally, for a finite nonempty set \(\mathcal{N}\) and \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\), set \[\|\mathbf{x}\|=\left(\sum_{n\in\mathcal{N}}|\mathbf{x}(n)|^{2}\right)^{1/2}.\] This defines a norm on \(\mathbb{F}^{\mathcal{N}}\). (We will prove this later.)
  • Even more generally, let \(\mathcal{N}\) be a finite nonempty set and \(p\geq 1\), then for \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\), set \[\|\mathbf{x}\|_{p}=\left(\sum_{n\in\mathcal{N}}|\mathbf{x}(n)|^{p}\right)^{1/p}.\] This defines a norm on \(\mathbb{F}^{\mathcal{N}}\). This is called the \(p\)-norm on \(\mathbb{F}^{\mathcal{N}}\). When \(p=2\) this is the norm in the previous example.
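A small numerical illustration of these \(p\)-norms (a Python/numpy sketch):

```python
import numpy as np

x = np.array([3., -4., 0.])

for p in (1, 2, 3):
    print(p, np.sum(np.abs(x)**p)**(1/p))
# p=1: 7.0,  p=2: 5.0,  p=3: 91**(1/3), which is about 4.498

print(np.linalg.norm(x, ord=2))   # 5.0, numpy's built-in agrees
```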

Examples continued. Let \(\mathbb{F}=\mathbb{R}\) or \(\mathbb{C}\).

  • Let \(\mathcal{V}\) be the real vector space of continuous real-valued functions on the closed interval \([0,1]\). For \(f\in\mathcal{V}\) and \(p\geq 1\) we set \[\|f\|_{p}=\left(\int_{0}^{1}|f(t)|^{p}\,dt\right)^{1/p}.\] This is a norm on \(\mathcal{V}\). In particular, when \(p=2\) we get the norm \[\|f\|_{2}=\left(\int_{0}^{1}|f(t)|^{2}\,dt\right)^{1/2},\] which we will later prove is a norm.

What about angles?

Find \(c\) such that \(c\mathbf{y}\) and \(\mathbf{x}-c\mathbf{y}\) are perpendicular:

\[\|c\mathbf{y}\|^{2} + \|\mathbf{x} - c\mathbf{y}\|^{2} = \|\mathbf{x}\|^{2}\]

\(\Downarrow\)

\[(c\mathbf{y}(1))^{2} + (c\mathbf{y}(2))^{2} + (\mathbf{x}(1)-c\mathbf{y}(1))^{2} + (\mathbf{x}(2)-c\mathbf{y}(2))^{2} = \mathbf{x}(1)^{2} + \mathbf{x}(2)^{2}\]

\(\Downarrow\)

\[c = \dfrac{\mathbf{x}(1)\mathbf{y}(1)+\mathbf{x}(2)\mathbf{y}(2)}{\|\mathbf{y}\|^{2}},\qquad \|c\mathbf{y}\| = \dfrac{|\mathbf{x}(1)\mathbf{y}(1)+\mathbf{x}(2)\mathbf{y}(2)|}{\|\mathbf{y}\|}\]

Also: \(\cos(\theta) = \dfrac{\|c\mathbf{y}\|}{\|\mathbf{x}\|}\)

Definition 9.3. An inner product space is a real or complex vector space \(\mathcal{V}\) along with an inner product, which is a function that assigns a scalar denoted \(\langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle\) to each pair of vectors \(\mathbf{v}_{1},\mathbf{v}_{2}\in\mathcal{V}\) with the following four properties:

(i) \(\langle\mathbf{v}_{1},\mathbf{v}_{2}+\mathbf{v}_{3}\rangle = \langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle + \langle \mathbf{v}_{1},\mathbf{v}_{3}\rangle\) for all \(\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}\in\mathcal{V}\).

(ii) \(\langle \mathbf{v}_{1},c\mathbf{v}_{2}\rangle = c\langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle\) for all \(\mathbf{v}_{1},\mathbf{v}_{2}\in\mathcal{V}\) and all \(c\in\mathbb{F}\).

(iii) \(\langle\mathbf{v}_{1},\mathbf{v}_{2}\rangle = \overline{\langle \mathbf{v}_{2},\mathbf{v}_{1}\rangle}\) for all \(\mathbf{v}_{1},\mathbf{v}_{2}\in\mathcal{V}\).

(iv) \(\langle \mathbf{v},\mathbf{v}\rangle>0\) for all \(\mathbf{v}\neq 0\).

Given an inner product space \(\mathcal{V}\), the corresponding inner product-induced norm is \(\|\mathbf{v}\| = \sqrt{\langle \mathbf{v},\mathbf{v}\rangle}\). (We have not yet shown that this is a norm, but we will soon!)

Examples. Let \(\mathbb{F}=\mathbb{R}\) or \(\mathbb{C}\).

  • For \(\mathbf{x},\mathbf{y}\in\mathbb{F}^{N}\), set \[\langle\mathbf{x},\mathbf{y}\rangle = \sum_{n=1}^{N}\overline{\mathbf{x}(n)}\mathbf{y}(n).\] This defines an inner product on \(\mathbb{F}^{N}\). This is commonly called the dot product, and it is often denoted \(\mathbf{x}\cdot\mathbf{y}\).
  • More generally, for a finite nonempty set \(\mathcal{N}\) and \(\mathbf{x},\mathbf{y}\in\mathbb{F}^{\mathcal{N}}\), set \[\langle\mathbf{x},\mathbf{y}\rangle=\sum_{n\in\mathcal{N}}\overline{\mathbf{x}(n)}\mathbf{y}(n).\] This defines an inner product on \(\mathbb{F}^{\mathcal{N}}\).  This is also often called the dot product.
  • Let \(\mathcal{V}\) be the real vector space of continuous real-valued functions on the closed interval \([0,1]\). For \(f,g\in\mathcal{V}\) define  \[\langle f,g\rangle=\int_{0}^{1}f(t)g(t)\,dt.\] This is an inner product on \(\mathcal{V}\).
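A small numerical illustration of the first two examples (a Python/numpy sketch; note that `np.vdot` conjugates its first argument, matching the convention above):

```python
import numpy as np

x = np.array([1 + 1j, 2j])
y = np.array([3 + 0j, 1 - 1j])

# <x, y> = sum_n conj(x(n)) y(n); np.vdot conjugates its first argument
print(np.vdot(x, y))            # (1-1j)*3 + (-2j)*(1-1j) = 1-5j
print(np.sum(np.conj(x) * y))   # the same sum, written out
print(np.vdot(x, x).real)       # ||x||^2 = |1+i|^2 + |2i|^2 = 6
```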

Theorem 9.5. If \(\mathcal{V}\) is an inner product space, then for all \(\mathbf{v},\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}\in\mathcal{V}\) and all \(c\in\mathbb{F}\):

(i) \(\langle \mathbf{v},\mathbf{0}\rangle = 0\)

(ii) \(\langle \mathbf{v}_{1}+\mathbf{v}_{2},\mathbf{v}_{3}\rangle = \langle \mathbf{v}_{1},\mathbf{v}_{3}\rangle + \langle \mathbf{v}_{2},\mathbf{v}_{3}\rangle\)

(iii) \(\langle c\mathbf{v}_{1},\mathbf{v}_{2}\rangle = \overline{c}\langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle\)

(iv) \(\|c\mathbf{v}\| = |c|\|\mathbf{v}\|\)

(v) \(\|\mathbf{v}\|>0\) for \(\mathbf{v}\neq \mathbf{0}\)

(vi) If \(\langle \mathbf{v}_{1},\mathbf{v}_{3}\rangle = \langle \mathbf{v}_{2},\mathbf{v}_{3}\rangle\) for all \(\mathbf{v}_{3}\in\mathcal{V}\), then \(\mathbf{v}_{1}=\mathbf{v}_{2}\).

(vii) \(\|\mathbf{v}_{1}+\mathbf{v}_{2}\|^{2} = \|\mathbf{v}_{1}\|^{2} + 2\operatorname{Re}\langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle + \|\mathbf{v}_{2}\|^{2}\), (law of cosines)

(viii) If \(\langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle=0\), then \(\|\mathbf{v}_{1}+\mathbf{v}_{2}\|^{2} = \|\mathbf{v}_{1}\|^{2}+\|\mathbf{v}_{2}\|^{2}\), (Pythagorean theorem)

(ix) \(\|\mathbf{v}_{1}+\mathbf{v}_{2}\|^{2} + \|\mathbf{v}_{1}-\mathbf{v}_{2}\|^{2} = 2\|\mathbf{v}_{1}\|^{2}+2\|\mathbf{v}_{2}\|^{2}\) (parallelogram law)

(x) If \(\mathbb{F}=\mathbb{R}\), then \(\langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle =\frac{1}{4}\left(\|\mathbf{v}_{1}+\mathbf{v}_{2}\|^{2} - \|\mathbf{v}_{1}-\mathbf{v}_{2}\|^{2}\right)\)

      If \(\mathbb{F}=\mathbb{C}\), then \[\langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle =\frac{1}{4}\left(\|\mathbf{v}_{1}+\mathbf{v}_{2}\|^{2} - \|\mathbf{v}_{1}-\mathbf{v}_{2}\|^{2}-i\|\mathbf{v}_{1}+i\mathbf{v}_{2}\|^{2}+i\|\mathbf{v}_{1}-i\mathbf{v}_{2}\|^{2} \right)\]

Theorem 9.5 (continued). If \(\mathcal{V}\) is an inner product space, then for all \(\mathbf{v},\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}\in\mathcal{V}\) and all \(c\in\mathbb{F}\):

(xi) \(|\langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle|\leq \|\mathbf{v}_{1}\|\|\mathbf{v}_{2}\|\), (Cauchy-Schwarz inequality)

(xii) \(\|\mathbf{v}_{1}+\mathbf{v}_{2}\|\leq \|\mathbf{v}_{1}\|+\|\mathbf{v}_{2}\|\) (triangle inequality) 

Equality holds in both of these inequalities if and only if \(\{\mathbf{v}_{1},\mathbf{v}_{2}\}\) is dependent.

Example. Consider the function \(f:\mathbb{R}^{3}\to\mathbb{R}\) given by \[f(x,y,z)=2x+3y+5z.\] Find the maximum value of this function on the unit ball, that is, on \(\{(x,y,z) : \|(x,y,z)\|\leq 1\}\).

By Cauchy-Schwarz, \[(2x+3y+5z)^{2}\leq (2^2+3^2+5^2)(x^2+y^2+z^2)\leq 38.\]

Thus, \(f(x,y,z)\leq \sqrt{38}\), and note that this bound is achieved for \((x,y,z)=(2/\sqrt{38},3/\sqrt{38},5/\sqrt{38})\).

Definition 9.6. Two vectors \(\mathbf{v}_{1}\) and \(\mathbf{v}_{2}\) are orthogonal if \(\langle \mathbf{v}_{1},\mathbf{v}_{2}\rangle = 0\).

Theorem 9.7. If \(\mathcal{V}\) is an inner product space, and \(\mathcal{U}\subset\mathcal{V}\) is a subspace, then the orthogonal complement of \(\mathcal{U}\), which is the set

\[\mathcal{U}^{\bot} = \{\mathbf{v}\in\mathcal{V} : \langle \mathbf{u},\mathbf{v}\rangle = 0\text{ for all }\mathbf{u}\in\mathcal{U}\},\]

is a subspace of \(\mathcal{V}\).

Least Squares 

Example 1.  Consider the following linear inverse problem: Solve

\[\begin{bmatrix} 1 & 2 & 1\\1 & 1 & 0\\0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}1\\2\\-1\end{bmatrix}.\]

It is easy to see that this equation has no solution. However, if we have an inner product (such as the dot product) then we have a notion of distance. Thus we might ask to find \(x,y,z\in\mathbb{R}\) such that

\[\left\|\begin{bmatrix} 1 & 2 & 1\\1 & 1 & 0\\0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} - \begin{bmatrix}1\\2\\-1\end{bmatrix}\right\|\] is as small as possible.

Example 2.  Let \(\mathcal{V}\) be the real inner product space of all continuous functions \(f:[0,1]\to\mathbb{R}\) with inner product

\[\langle f,g\rangle = \int_{0}^{1}f(t)g(t)\,dt.\]

We wish to solve the following linear inverse problem: Find \(a,b,c\in\mathbb{R}\) such that

\[\cos(x) = a+bx+cx^{2}\quad\text{for all }x\in[0,1].\]

Unfortunately, this has no solution. Instead, we wish to find \(a,b,c\in\mathbb{R}\) such that

\[\int_{0}^{1}(\cos(t)-(a+bt+ct^2))^{2}\,dt\]

is as small as possible. If we define the functions \(p_{0}(x) = 1\), \(p_{1}(x) =x\), and \(p_{2}(x)=x^{2}\), then we are asking to find \(g\in\operatorname{span}\{p_{0},p_{1},p_{2}\}\subset\mathcal{V}\) such that \(\|\cos-g\|\) is as small as possible.


Part 11

Adjoints

(See Section 10 in the notes)

Definition 10.1. Let \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) be a linear map between inner product spaces \(\mathcal{V}\) and \(\mathcal{U}\). A function \(\mathbf{L}^{\ast}:\mathcal{U}\to\mathcal{V}\) is an adjoint of \(\mathbf{L}\) if

\[\langle \mathbf{L}^{\ast}(\mathbf{u}),\mathbf{v}\rangle = \langle \mathbf{u},\mathbf{Lv}\rangle\quad\text{for all }\mathbf{u}\in\mathcal{U}\text{ and }\mathbf{v}\in\mathcal{V}.\]

Example. Let

\[\mathbf{L} = \begin{bmatrix} 1&2&0\\0&1&-1\end{bmatrix}\quad\text{and}\quad \mathbf{S} = \begin{bmatrix}1&0\\2&1\\0&-1\end{bmatrix}\]

be the matrix representations of the linear maps \(\mathbf{L}:\mathbb{R}^{3}\to\mathbb{R}^{2}\) and \(\mathbf{S}:\mathbb{R}^{2}\to\mathbb{R}^{3}\), where both spaces are equipped with the dot product. Given \(x,y,z,a,b\in\mathbb{R}\) we compute

\[\begin{bmatrix}a\\b\end{bmatrix}\cdot \left(\mathbf{L}\begin{bmatrix}x\\y\\z\end{bmatrix}\right) = \begin{bmatrix}a\\b\end{bmatrix}\cdot\begin{bmatrix}x+2y\\y-z\end{bmatrix} = a(x+2y)+b(y-z)\]

\[\left(\mathbf{S}\begin{bmatrix}a\\b\end{bmatrix}\right)\cdot \begin{bmatrix}x\\y\\z\end{bmatrix} =\begin{bmatrix}a\\2a+b\\-b\end{bmatrix}\cdot \begin{bmatrix}x\\y\\z\end{bmatrix} = ax+(2a+b)y-bz\]

From this we deduce that \(\mathbf{S}\) is an adjoint of \(\mathbf{L}\).
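The same conclusion can be checked numerically (a Python/numpy sketch; it tests the defining identity on a few random vectors):

```python
import numpy as np

L = np.array([[1., 2.,  0.],
              [0., 1., -1.]])
S = L.T   # the transpose; for real matrices this is the conjugate transpose

rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.standard_normal(3)
    u = rng.standard_normal(2)
    # <u, Lv> = <Su, v> for the dot products on R^2 and R^3
    assert np.isclose(u @ (L @ v), (S @ u) @ v)
print("S = L^T acts as the adjoint of L")
```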

Theorem 10.2. If \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) is a linear map between inner product spaces and \(\mathbf{L}^{\ast}\) is an adjoint of \(\mathbf{L}\), then

(a)

(1) \(\mathbf{L}^{\ast}\) is unique

(2) \(\mathbf{L}^{\ast}\) is linear

(3) \(\mathbf{L}\) is an adjoint of \(\mathbf{L}^{\ast}\), that is \((\mathbf{L}^{\ast})^{\ast} = \mathbf{L}\)

(4) \(\operatorname{ker}(\mathbf{L}^{\ast}\mathbf{L}) = \operatorname{ker}(\mathbf{L})\)

(5) \(\mathbf{L}(\mathcal{V})^{\bot} = \operatorname{ker}(\mathbf{L}^{\ast})\)

Recall:

Theorem 9.5 (vi) If \(\langle \mathbf{v}_{1},\mathbf{v}_{3}\rangle = \langle \mathbf{v}_{2},\mathbf{v}_{3}\rangle\) for all \(\mathbf{v}_{3}\in\mathcal{V}\), then \(\mathbf{v}_{1}=\mathbf{v}_{2}\).

Example. Let \(\mathcal{V}\) be the real inner product space of continuous functions \(f:[0,1]\to\mathbb{R}\) with inner product

\[\langle f,g\rangle = \int_{0}^{1}f(t)g(t)\,dt.\]

Consider the linear function \(\mathbf{L}:\mathcal{V}\to\mathbb{R}\) given by

\[\mathbf{L}f = \int_{0}^{1}f(t)\,dt.\]

If we give \(\mathbb{R}\) the inner product \(\langle a,b\rangle = ab\), then what is the adjoint of \(\mathbf{L}\)? 

For \(f\in\mathcal{V}\) and \(a\in\mathbb{R}\) we have

\[\langle a,\mathbf{L}f\rangle = a\int_{0}^{1}f(t)\,dt = \int_{0}^{1} a f(t)\,dt\]

If we let \(\mathbf{1}:[0,1]\to\mathbb{R}\) be the constant function \(\mathbf{1}(x) = 1\) for all \(x\in[0,1]\), and we define the map \(\mathbf{S}:\mathbb{R}\to\mathcal{V}\) by \(\mathbf{S}a = a\mathbf{1}\), then we have

\[\langle \mathbf{S}a,f\rangle = \langle a\mathbf{1},f\rangle = \int_{0}^{1}a\mathbf{1}(t)f(t)\,dt = \int_{0}^{1}af(t)\,dt.\]

From this we see that \(\mathbf{S} = \mathbf{L}^{\ast}\).

Theorem 10.2

(b) Any real or complex matrix has an adjoint (with respect to the standard inner products on these spaces), namely its conjugate transpose: if \(\mathbf{A}\in\mathbb{F}^{\mathcal{M}\times\mathcal{N}}\), then \(\langle\mathbf{A}^{\ast}\mathbf{y},\mathbf{x}\rangle = \langle\mathbf{y},\mathbf{Ax}\rangle\) for all \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\), \(\mathbf{y}\in\mathbb{F}^{\mathcal{M}}\) where \[\mathbf{A}^{\ast}\in\mathbb{F}^{\mathcal{N}\times\mathcal{M}},\quad \mathbf{A}^{\ast}(n,m)=\overline{\mathbf{A}(m,n)}.\]

 

(c) If \(\mathbf{L},\mathbf{S}:\mathcal{V}\to\mathcal{U}\) have adjoints, and \(a,b\in\mathbb{F}\), then \(a\mathbf{L}+b\mathbf{S}\) has an adjoint, namely, \[(a\mathbf{L}+b\mathbf{S})^{\ast} = \overline{a}\mathbf{L}^{\ast}+\overline{b}\mathbf{S}^{\ast}.\]

 

(d) If \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) and \(\mathbf{K}:\mathcal{W}\to\mathcal{V}\) have adjoints, then \(\mathbf{LK}\) has an adjoint, namely, \[(\mathbf{LK})^{\ast} = \mathbf{K}^{\ast}\mathbf{L}^{\ast}.\]

Theorem 10.6. Let \(\mathcal{V}\) and \(\mathcal{U}\) be inner product spaces over \(\mathbb{F}\ (=\mathbb{R}\text{ or }\mathbb{C})\).

(a) If \(\mathbf{V}:\mathbb{F}^{\mathcal{N}}\to\mathcal{V}\) is the synthesis operator of a finite sequence \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) in \(\mathcal{V}\), then \(\mathbf{V}\) has an adjoint \(\mathbf{V}^{\ast}:\mathcal{V}\to\mathbb{F}^{\mathcal{N}}\) called the analysis operator, which is given by

\[(\mathbf{V}^{\ast}\mathbf{v})(n) = \langle\mathbf{v}_{n},\mathbf{v}\rangle \quad\text{for each }\mathbf{v}\in\mathcal{V}\text{ and }n\in\mathcal{N}.\]

 

(b) If \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) is linear and \(\mathcal{V}\) is finite dimensional, then \(\mathbf{L}\) has an adjoint.

 

(c) If \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) is invertible and \(\mathcal{V}\) is finite dimensional, then \(\mathbf{L}^{\ast}:\mathcal{U}\to\mathcal{V}\) is invertible with \[(\mathbf{L}^{\ast})^{-1} = (\mathbf{L}^{-1})^{\ast}.\]

 

(d) If \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) is linear and both \(\mathcal{V}\) and \(\mathcal{U}\) are finite dimensional, then \[\operatorname{rank}(\mathbf{L}) = \operatorname{rank}(\mathbf{L}^{\ast}) = \operatorname{rank}(\mathbf{L}\mathbf{L}^{\ast}) = \operatorname{rank}(\mathbf{L}^{\ast}\mathbf{L}).\]


Part 12

Orthonormal bases

(See Section 11 in the notes)

Definition 11.1. Let \(\mathcal{V}\) be an inner product space over \(\mathbb{F}\).

(a) A linear function \(\mathbf{L}\) is an isometry if it has an adjoint \(\mathbf{L}^{\ast}\) and \(\mathbf{L}^{\ast}\mathbf{L} = \mathbf{I}.\)

(b) A linear function \(\mathbf{L}\) is unitary if it has an adjoint \(\mathbf{L}^{\ast}\) and \(\mathbf{L}^{-1}=\mathbf{L}^{\ast}.\)

(c) A finite sequence \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) of vectors in \(\mathcal{V}\) is orthonormal if \(\langle\mathbf{v}_{n_{1}},\mathbf{v}_{n_{2}}\rangle = \begin{cases} 1 & n_{1}=n_{2},\\ 0 & n_{1}\neq n_{2}.\end{cases}\)

(d) A finite sequence \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) of vectors in \(\mathcal{V}\) is an orthonormal basis for \(\mathcal{V}\) if it is orthonormal and is also a basis for \(\mathcal{V}\).

(d') If \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is a basis for \(\mathcal{V}\) and \(\{\mathbf{v}_{n}/\|\mathbf{v}_{n}\|\}_{n\in\mathcal{N}}\) is orthonormal, then we say that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is an orthogonal basis.

Examples.

  1. The identity operator on a finite-dimensional inner product space is unitary.
  2. The standard basis \(\{\boldsymbol{\delta}_{n}\}_{n\in \mathcal{N}}\) is an orthonormal basis for \(\mathbb{F}^{\mathcal{N}}\) with the standard inner product.

Example.

3. Consider \(\mathbb{P}_{2}\) as in the homework. Recall that the inner product is given by \[\langle f,g\rangle = \int_{-1}^{1}f(x)g(x)\,dx.\] The sequence \(\{f_{0},f_{1},f_{2}\}\) where \[f_{0}(x) = x^2,\quad f_{1}(x) = x,\quad f_{2}(x) = 1-\frac{5}{3}x^2\] is an orthogonal basis for \(\mathbb{P}_{2}\). It is obtained by applying the Gram-Schmidt algorithm to \(p_{2},p_{1},p_{0}\), in that order.

If we instead use the inner product given by \[\langle f,g\rangle = \int_{0}^{1}f(x)g(x)\,dx,\] then the above sequence is not an orthogonal basis. One orthogonal basis would be \(\{g_{0},g_{1},g_{2}\}\) given by

\[g_{0}(x) = 1,\quad g_{1}(x) = x-\frac{1}{2},\quad g_{2}(x) =\frac{1}{6}-x+x^2.\]

Examples. 4. The map \(\mathbf{L}:\mathbb{R}^{2}\to\mathbb{R}^{3}\) with matrix representation \[\mathbf{L} = \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3}\\[1ex] -\frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3}\\[1ex] 0 & \frac{\sqrt{3}}{3}\end{bmatrix}\] is an isometry, since \[\mathbf{L}^{\ast}\mathbf{L} = \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0\\[1ex] \frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3}\end{bmatrix}\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3}\\[1ex] -\frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3}\\[1ex] 0 & \frac{\sqrt{3}}{3}\end{bmatrix} = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}.\] But since \(\mathbf{L}\) is not invertible, it is not unitary.

Note that the matrix \[\mathbf{S} = \begin{bmatrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3} & -\frac{\sqrt{6}}{6}\\[1ex] -\frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3} & -\frac{\sqrt{6}}{6}\\[1ex] 0 & \frac{\sqrt{3}}{3} & \frac{\sqrt{6}}{3}\end{bmatrix}\]

(obtained by adding one column to \(\mathbf{L}\)) is unitary.
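A numerical confirmation (a Python/numpy sketch):

```python
import numpy as np

s2, s3, s6 = np.sqrt(2)/2, np.sqrt(3)/3, np.sqrt(6)/6

S = np.array([[ s2, s3,  -s6],
              [-s2, s3,  -s6],
              [ 0., s3, 2*s6]])

# S*S = I and SS* = I, so S^{-1} = S* and S is unitary
print(np.allclose(S.T @ S, np.eye(3)))   # True
print(np.allclose(S @ S.T, np.eye(3)))   # True
```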

Theorem 11.2. Let \(\mathcal{V}\) be an inner product space over \(\mathbb{F}\).

 

(a) A linear function \(\mathbf{L}\) is an isometry if and only if \(\|\mathbf{Lv}\| = \|\mathbf{v}\|\) for all \(\mathbf{v}\in\mathcal{V}\). 

 

Let \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) be a finite sequence in \(\mathcal{V}\) and let \(\mathbf{V}\) be the synthesis operator of \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\).

 

(b) The following are equivalent:

(i) \(\mathbf{V}\) is an isometry

(ii) \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is orthonormal

(iii) \(\displaystyle{\left\|\sum_{n\in\mathcal{N}}\mathbf{x}(n)\mathbf{v}_{n}\right\|^{2} = \sum_{n\in\mathcal{N}}|\mathbf{x}(n)|^{2}}\) for all \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}.\) (Pythagorean theorem)

Moreover, when this occurs, \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is linearly independent.

Theorem 11.2. Let \(\mathcal{V}\) be an inner product space over \(\mathbb{F}\).

(c) The following are equivalent:

(i) \(\mathbf{V}^{\ast}\) is an isometry

(ii) \(\displaystyle{\mathbf{v} = \sum_{n\in\mathcal{N}}\langle \mathbf{v}_{n},\mathbf{v}\rangle \mathbf{v}_{n}}\) for all \(\mathbf{v}\in\mathcal{V}\).

(iii) \(\displaystyle{\|\mathbf{v}\|^{2} = \sum_{n\in\mathcal{N}} |\langle\mathbf{v}_{n},\mathbf{v}\rangle|^{2}}\) for all \(\mathbf{v}\in\mathcal{V}\) (Parseval's identity).

Moreover, when this occurs, \(\operatorname{span}\{\mathbf{v}_{n}\}_{n\in\mathcal{N}} = \mathcal{V}\).

(d) The following are equivalent:

(i) \(\mathbf{V}\) is unitary

(ii) \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is an orthonormal basis.

(iii) \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is orthonormal and \(\#(\mathcal{N}) = \operatorname{dim}\mathcal{V}\).

Also, if \(\mathcal{V}\) is finite dimensional:

(e) If \(\mathcal{U}\) is a subspace of \(\mathcal{V}\) then \(\operatorname{dim}\mathcal{U}+\operatorname{dim}\mathcal{U}^{\bot}=\operatorname{dim}\mathcal{V}\) and \((\mathcal{U}^{\bot})^{\bot} = \mathcal{U}\).

(f) If \(\mathcal{V}\) is nontrivial, then an orthonormal basis for \(\mathcal{V}\) exists.

We showed that for \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\)

\[(\mathbf{V}^{\ast}\mathbf{Vx})(m) = \sum_{n\in\mathcal{N}}\mathbf{x}(n)\langle \mathbf{v}_{m},\mathbf{v}_{n}\rangle\]

Proof of (c). Note that for \(\mathbf{v}\in\mathcal{V}\) we have

\[\mathbf{VV}^{\ast}\mathbf{v} = \sum_{n\in\mathcal{N}}\langle \mathbf{v}_{n},\mathbf{v}\rangle \mathbf{v}_{n}.\qquad(\ast)\]

If \(\mathbf{V}^{\ast}\) is an isometry, then \(\mathbf{VV}^{\ast} = \mathbf{I}\), that is

\[\mathbf{v} = \mathbf{Iv} = \mathbf{VV}^{\ast}\mathbf{v} = \sum_{n\in\mathcal{N}}\langle \mathbf{v}_{n},\mathbf{v}\rangle \mathbf{v}_{n}.\]

Similarly, if \(\mathbf{v}=\sum_{n\in\mathcal{N}}\langle \mathbf{v}_{n},\mathbf{v}\rangle \mathbf{v}_{n}\), then using (\(\ast\)) we see that \(\mathbf{VV}^{\ast} = \mathbf{I}\). Thus (i) and (ii) are equivalent. 

Next, note that since \(\{\boldsymbol{\delta}_{n}\}_{n\in\mathcal{N}}\) is an orthonormal sequence in \(\mathbb{F}^{\mathcal{N}}\) we have

\[\|\mathbf{V}^{\ast}\mathbf{v}\|^{2} = \left\|\sum_{n\in\mathcal{N}}\langle \mathbf{v}_{n},\mathbf{v}\rangle\boldsymbol{\delta}_{n}\right\|^{2} =  \sum_{n\in\mathcal{N}}|\langle \mathbf{v}_{n},\mathbf{v}\rangle|^{2}.\]

From this, and part (a) we see that \(\mathbf{V}^{\ast}\) is an isometry if and only if Parseval's identity holds, that is, (i) and (iii) are equivalent.

 

Finally, part (ii) clearly implies \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) spans \(\mathcal{V}\).


(d) (i)\(\Rightarrow\)(ii) Suppose \(\mathbf{V}\) is unitary, which implies both \(\mathbf{V}\) and \(\mathbf{V}^{\ast}\) are isometries. By part (b) the sequence \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is orthonormal and independent, and by part (c) it spans \(\mathcal{V}\), thus it is an orthonormal basis for \(\mathcal{V}\).

 

(ii)\(\Rightarrow\)(iii) Suppose \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is an orthonormal basis for \(\mathcal{V}\). This clearly implies that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is orthonormal and \(\#(\mathcal{N}) = \operatorname{dim}(\mathcal{V})\).

 

(iii)\(\Rightarrow\)(i) Suppose \(\{\mathbf{v}_n\}_{n\in\mathcal{N}}\) is orthonormal and \(\#(\mathcal{N}) = \operatorname{dim}(\mathcal{V})\). By part (b) the operator \(\mathbf{V}\) is an isometry, and the sequence \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is linearly independent. Since \(\#(\mathcal{N}) = \operatorname{dim}(\mathcal{V})\), we deduce that \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is an orthonormal basis, and hence for each \(\mathbf{v}\in\mathcal{V}\) there exists \(\mathbf{x}\in\mathbb{F}^{\mathcal{N}}\) such that \(\mathbf{Vx} = \mathbf{v}\). This shows that \(\mathbf{V}\) is onto, and thus invertible. Using the fact that \(\mathbf{V}\) is an isometry, we have

\[\mathbf{V}^{-1} = \mathbf{I}\mathbf{V}^{-1}= \mathbf{V}^{\ast}\mathbf{V}\mathbf{V}^{-1} = \mathbf{V}^{\ast},\]

which shows that \(\mathbf{V}\) is unitary.

 

Examples. Consider the sequence of vectors \(\{\mathbf{h}_{n}\}_{n=1}^{4}\) given by

\[\mathbf{h}_{1} = \frac{1}{2}\left[\begin{array}{r} 1\\1\\1\\1\end{array}\right],\ \mathbf{h}_{2} = \frac{1}{2}\left[\begin{array}{r} 1\\-1\\1\\-1\end{array}\right],\ \mathbf{h}_{3} = \frac{1}{2}\left[\begin{array}{r} 1\\1\\-1\\-1\end{array}\right],\ \mathbf{h}_{4} = \frac{1}{2}\left[\begin{array}{r} 1\\-1\\-1\\1\end{array}\right]\]

One can check that this is an orthonormal basis for \(\mathbb{R}^{4}\) (with the dot product).

So, for \(\mathbf{x} = \left[\begin{array}{r} 1\\2\\-3\\0\end{array}\right]\) we know that there is a unique sequence of numbers such that

\[\mathbf{x} = a_{1}\mathbf{h}_{1} + a_{2}\mathbf{h}_{2} + a_{3}\mathbf{h}_{3} + a_{4}\mathbf{h}_{4} \]

But solving for these numbers seems tedious!

By (c) we have

\[\mathbf{x} = (\mathbf{x}\cdot \mathbf{h}_{1})\mathbf{h}_{1} + (\mathbf{x}\cdot \mathbf{h}_{2})\mathbf{h}_{2} + (\mathbf{x}\cdot \mathbf{h}_{3})\mathbf{h}_{3} + (\mathbf{x}\cdot \mathbf{h}_{4})\mathbf{h}_{4}\]

\[ = 0\mathbf{h}_{1} + (-2)\mathbf{h}_{2} + 3\mathbf{h}_{3} + 1\mathbf{h}_{4}\]
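A numerical check of this expansion (a Python/numpy sketch in which the columns of `H` are \(\mathbf{h}_{1},\ldots,\mathbf{h}_{4}\)):

```python
import numpy as np

# rows of the inner array are h_1, ..., h_4; transposing makes them columns
H = 0.5 * np.array([[1,  1,  1,  1],
                    [1, -1,  1, -1],
                    [1,  1, -1, -1],
                    [1, -1, -1,  1]], dtype=float).T

x = np.array([1., 2., -3., 0.])

a = H.T @ x   # a(n) = h_n . x, the coefficients from part (c)
print(a)      # [ 0. -2.  3.  1.]
print(H @ a)  # sum_n a(n) h_n reconstructs x
```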

Part 13

Least Squares

(See Section 12 in the notes)

Definition 13.1 (d) An orthogonal projection is a linear map \(\mathbf{P}:\mathcal{V}\to\mathcal{V}\) on an inner product space \(\mathcal{V}\) such that \(\mathbf{P}\) has an adjoint and \(\mathbf{P}^{\ast} = \mathbf{P} = \mathbf{P}^{2}\). (Recall that \(\mathbf{P}^{2}\) is notation for the composition \(\mathbf{PP}.\)).

Theorem (The Projection theorem, see HW9). If \(\mathcal{V}\) is a finite-dimensional inner-product space, and \(\mathcal{W}\subset\mathcal{V}\) is a subspace, then there exists a unique orthogonal projection \(\mathbf{P}:\mathcal{V}\to\mathcal{V}\) such that \(\mathbf{P}(\mathcal{V})=\mathcal{W},\) and \((\mathbf{I}-\mathbf{P})(\mathcal{V}) =\operatorname{ker}(\mathbf{P})  =\mathcal{W}^{\bot}.\) Moreover, if \(\mathbf{v}_{0}\in\mathcal{V}\), then \(\mathbf{Pv}_{0}\) is the unique closest point in \(\mathcal{W}\) to \(\mathbf{v}_{0}\), that is, \(\mathbf{Pv}_{0}\) is the only point in \(\mathcal{W}\) which satisfies \[\|\mathbf{v}_{0}-\mathbf{Pv}_{0}\|\leq \|\mathbf{v}_{0}-\mathbf{w}\|\quad\text{for all }\mathbf{w}\in\mathcal{W}.\]

Proof of the "Moreover" part. Note that \(\mathbf{v}_{0} = \mathbf{Pv}_{0}+(\mathbf{v}_{0}-\mathbf{Pv}_{0})\), where \(\mathbf{Pv}_{0}\in\mathcal{W}\), and \(\mathbf{v}_{0}-\mathbf{Pv}_{0}\in\mathcal{W}^{\bot}\). For \(\mathbf{w}\in\mathcal{W}\), by the Pythagorean theorem

\[\|\mathbf{v}_{0}-\mathbf{w}\|^2 = \|(\mathbf{Pv}_{0}-\mathbf{w})+(\mathbf{v}_{0}-\mathbf{Pv}_{0})\|^2= \|\mathbf{Pv}_{0}-\mathbf{w}\|^{2}+\|\mathbf{v}_{0}-\mathbf{Pv}_{0}\|^2\]

\[\geq \|\mathbf{v}_{0}-\mathbf{Pv}_{0}\|^2\]

If \(\mathbf{w}_{0}\in\mathcal{W}\) satisfies \(\|\mathbf{v}_{0}-\mathbf{w}_{0}\|\leq \|\mathbf{v}_{0}-\mathbf{w}\|\) for all  \(\mathbf{w}\in\mathcal{W},\) then the above inequality shows that \(\|\mathbf{Pv}_{0}-\mathbf{w}_{0}\|=0\), that is, \(\mathbf{w}_{0}=\mathbf{Pv}_{0}\). \(\Box\)

Theorem 12.2. Let \(\mathcal{V}\) and \(\mathcal{U}\) be inner product spaces with \(\mathcal{V}\) finite dimensional, and let \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) be linear.

 

(a) For any \(\mathbf{u}\in\mathcal{U}\), the quantity \(\|\mathbf{Lv}-\mathbf{u}\|\) has a minimum over all \(\mathbf{v}\in\mathcal{V}\). Moreover, for any \(\mathbf{v}\in\mathcal{V}\), the quantity \(\|\mathbf{Lv}-\mathbf{u}\|\) is minimal if and only if \(\mathbf{L}^{\ast}\mathbf{Lv}=\mathbf{L}^{\ast}\mathbf{u}.\)

(Such \(\mathbf{v}\) are called least-squares solutions to \(\mathbf{Lv}=\mathbf{u}.\))

 

(b) \(\mathbf{L}^{\ast}\mathbf{L}\) is invertible if and only if \(\operatorname{rank}(\mathbf{L}) = \operatorname{dim}(\mathcal{V})\). When this occurs, for any \(\mathbf{u}\in\mathcal{U}\), the equation \(\mathbf{Lv}=\mathbf{u}\) has a unique least-squares solution, namely \(\mathbf{v} = (\mathbf{L}^{\ast}\mathbf{L})^{-1}\mathbf{L}^{\ast}\mathbf{u}\). Otherwise, \(\mathbf{Lv}=\mathbf{u}\) has infinitely many least-squares solutions.

Least Squares 

Example 1.  Consider the following linear inverse problem: Solve

\[\begin{bmatrix} 1 & 2 & 1\\1 & 1 & 0\\0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}1\\2\\-1\end{bmatrix}.\]

It is easy to see that this equation has no solution. However, if we have an inner product (such as the dot product) then we have a notion of distance. Thus we might ask to find \(x,y,z\in\mathbb{R}\) such that

\[\left\|\begin{bmatrix} 1 & 2 & 1\\1 & 1 & 0\\0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} - \begin{bmatrix}1\\2\\-1\end{bmatrix}\right\|\] is as small as possible.

Example continued.  Set 

\[\mathbf{L} = \begin{bmatrix} 1 & 2 & 1\\1 & 1 & 0\\0&0&0\end{bmatrix}\quad\text{and}\quad \mathbf{y} = \begin{bmatrix}1\\2\\-1\end{bmatrix}.\]

We wish to find \(\mathbf{x}\in\mathbb{R}^{3}\) such that \(\|\mathbf{Lx}-\mathbf{y}\|\) is as small as possible. By Theorem 12.2, we can solve \(\mathbf{L}^{\ast}\mathbf{Lx}=\mathbf{L}^{\ast}\mathbf{y}\):

\[\mathbf{L}^{\ast}\mathbf{Lx} = \begin{bmatrix} 1 & 1 & 0\\2 & 1 & 0\\1&0&0\end{bmatrix}\begin{bmatrix} 1 & 2 & 1\\1 & 1 & 0\\0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}2 & 3 & 1\\ 3 & 5 & 2\\ 1&2&1\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix}\quad\text{and}\quad \mathbf{L}^{\ast}\mathbf{y} = \begin{bmatrix}3\\4\\1\end{bmatrix}\]

The solutions are the vectors

\[\mathbf{x} = \begin{bmatrix}3\\-1\\0\end{bmatrix} + c\begin{bmatrix}1\\-1\\1\end{bmatrix}\]

for any \(c\in\mathbb{R}\).
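This family of solutions can be checked numerically (a Python/numpy sketch; `np.linalg.lstsq` returns the minimum-norm least-squares solution, which corresponds to one particular value of \(c\)):

```python
import numpy as np

L = np.array([[1., 2., 1.],
              [1., 1., 0.],
              [0., 0., 0.]])
y = np.array([1., 2., -1.])

# np.linalg.lstsq returns the minimum-norm least-squares solution
x, *_ = np.linalg.lstsq(L, y, rcond=None)
assert np.allclose(L.T @ L @ x, L.T @ y)   # the normal equations hold

# every member of the family (3,-1,0) + c(1,-1,1) also solves them
for c in (0., 1., -2.5):
    x_c = np.array([3., -1., 0.]) + c * np.array([1., -1., 1.])
    assert np.allclose(L.T @ L @ x_c, L.T @ y)
print(x)
```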

Example. Consider the following linear inverse problem: Find \(a,b,c\in\mathbb{R}\) such that

\[a+bx+cx^2 = \cos(2\pi x)\quad\text{for all }x\in[-1,1].\]

Evaluating this at \(x=-1,0,1,\) and \(1/2\) we see that \(a,b,\) and \(c\) would necessarily satisfy

\[\left\{\begin{array}{ccccl} a & -b & +c & = & 1\\ a&&&=&1\\ a&+b&+c&=&1\\ a&+\frac{1}{2}b&+\frac{1}{4}c&=&-1\end{array}\right.\]

 

The first three equations imply \(a=1,b=0,\) and \(c=0\), however

\[1+\tfrac{1}{2}(0)+\tfrac{1}{4}(0)\neq-1,\]

and hence this linear inverse problem has no solution.

Instead, consider the inner product space \(\mathcal{V} = \operatorname{span}\{p_{0},p_{1},p_{2},q\}\subset\mathbb{R}^{[-1,1]}\) where

\[q(x) = \cos(2\pi x),\quad\text{and}\quad p_{n}(x) = x^{n},\quad\text{for }x\in[-1,1],\ n\in\{0,1,2\},\]

and

\[\langle f,g\rangle = \int_{-1}^{1}f(x)g(x)\,dx.\]

Recall that \(\mathbb{P}_{2}=\operatorname{span}\{p_{0},p_{1},p_{2}\}\). The linear inverse problem then asks us to exhibit \(q\) as an element of \(\mathbb{P}_{2}\), and we showed above that \(q\) is not a linear combination of \(p_{0},p_{1},\) and \(p_{2}\), so it has no solution.

Instead, we can ask for the closest point to \(q\) in \(\mathbb{P}_{2}\), that is, we want to find \(f_{0}\in\mathbb{P}_{2}\) such that

\[\|f_{0}-q\|\leq \|f-q\|\quad\text{for all }f\in\mathbb{P}_{2}.\]

Reformulating this, we wish to find \(f_{0}\in\mathbb{P}_{2}\) where \(f_{0}(x) = a+bx+cx^{2}\) and

\[\int_{-1}^{1}\big(a+bx+cx^{2}-\cos(2\pi x)\big)^{2}\,dx\leq \int_{-1}^{1}\big(\alpha +\beta x+\gamma x^{2}-\cos(2\pi x)\big)^{2}\,dx\]

for all \(\alpha,\beta,\gamma\in\mathbb{R}.\)

In Homework 8 we found that the polynomials \(1,\ x,\) and \(x^{2}-\frac{1}{3}\) are orthogonal in this inner product. Rescaling each to have unit norm, the functions \(q_{0},q_{1},\) and \(q_{2}\) given by

\[q_{0}(x) = \sqrt{\frac{1}{2}},\quad q_{1}(x)=\sqrt{\frac{3}{2}}x,\quad q_{2}(x) = \sqrt{\frac{45}{8}}\left(x^{2}-\frac{1}{3}\right)\]

form an orthonormal basis for \(\mathbb{P}_{2}\).

By the Projection theorem, if \(\mathbf{Q}:\mathcal{V}\to\mathcal{V}\) is the orthogonal projection onto \(\mathbb{P}_{2}\), then \(f_{0}=\mathbf{Q}q\) is the desired function.

From Homework 9, Problem 1(d), this projection satisfies\[\mathbf{Q}f = \langle q_{0},f\rangle q_{0}+\langle q_{1},f\rangle q_{1}+\langle q_{2},f\rangle q_{2} \quad\text{for }f\in\mathcal{V}.\]

\[\langle q_{0},q\rangle = \sqrt{\frac{1}{2}}\int_{-1}^{1}\cos(2\pi x)\,dx = 0.\]

\[\langle q_{1},q\rangle = \sqrt{\frac{3}{2}}\int_{-1}^{1}x\cos(2\pi x)\,dx = 0.\]

\[\langle q_{2},q\rangle = \sqrt{\frac{45}{8}}\int_{-1}^{1}\left(x^{2}-\frac{1}{3}\right)\cos(2\pi x)\,dx =\frac{3}{2\pi^2}\sqrt{\frac{5}{2}}\approx 0.24030.\]

Since \(\langle q_{0},q\rangle=\langle q_{1},q\rangle=0\), we have \(\mathbf{Q}q = \langle q_{2},q\rangle q_{2}\). Hence \(f_{0}=\mathbf{Q}q\), the closest point in \(\mathbb{P}_{2}\) to \(q\), is \[f_{0}(x)=\frac{3}{2\pi^2}\sqrt{\frac{5}{2}}\left[\sqrt{\frac{45}{8}}\left(x^{2}-\frac{1}{3}\right)\right] = -\frac{15}{8\pi^2} + \frac{45}{8\pi^2} x^2\]

\[f_{0}(x)\approx -0.189977 + 0.569932x^2\]

\[\|f_{0}-q\|^{2} = \int_{-1}^{1}\left(-\frac{15}{8\pi^2} + \frac{45}{8\pi^2} x^2 - \cos(2\pi x)\right)^{2}\,dx\approx0.942254\]
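These integrals can be verified symbolically; here is a minimal sketch assuming SymPy is available (not part of the course materials):

```python
import sympy as sp

x = sp.symbols('x')
q = sp.cos(2 * sp.pi * x)
q0 = sp.sqrt(sp.Rational(1, 2))
q1 = sp.sqrt(sp.Rational(3, 2)) * x
q2 = sp.sqrt(sp.Rational(45, 8)) * (x**2 - sp.Rational(1, 3))

def ip(f, g):
    """Inner product <f, g> = integral of f*g over [-1, 1]."""
    return sp.integrate(f * g, (x, -1, 1))

f0 = sum(ip(b, q) * b for b in (q0, q1, q2))   # f0 = Qq
print(sp.simplify(f0))                # 45*x**2/(8*pi**2) - 15/(8*pi**2)
print(float(ip(f0 - q, f0 - q)))      # ||f0 - q||^2, approximately 0.942254
```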

[Figure: Riemann-sum picture of \(\|f_{0}-q\|^{2}\), with rectangles of width \(\Delta x\) and height \(q(x)-f_{0}(x)\); we are adding up \((q(x)-f_{0}(x))^{2}\Delta x\), not the areas of the rectangles.]

[Figure: the function \(q(x) = \cos(2\pi x)\); its least squares quadratic on \([-1,1]\), \(f_{0}(x) = -\dfrac{15}{8\pi^2} + \dfrac{45}{8\pi^2} x^2\); and its quadratic Taylor polynomial at \(0\), \(T(x) = 1-2\pi^{2}x^{2}\).]

Example Redux. Let \(\mathbf{L}:\mathbb{R}^{3}\to\mathcal{V}\) be the synthesis operator of \(\{p_{0},p_{1},p_{2}\}\), that is \[\mathbf{L}\mathbf{x} = \mathbf{x}(1)p_{0}+\mathbf{x}(2)p_{1}+\mathbf{x}(3)p_{2}.\] Our linear inverse problem is to find \(\mathbf{x}\in\mathbb{R}^{3}\) such that \(\mathbf{Lx} = q\). Since this equation has no solution, we instead try to find \(\mathbf{x}\) such that \(\|\mathbf{Lx}-q\|\) is minimal. By Theorem 12.2 these are exactly the solutions to \(\mathbf{L}^{\ast}\mathbf{Lx} = \mathbf{L}^{\ast}q\). Moreover, we know that \(\mathbf{L}^{\ast}:\mathcal{V}\to\mathbb{R}^{3}\) is the analysis operator of \(\{p_{0},p_{1},p_{2}\}\).

Note that \(\mathbf{L}^{\ast}\mathbf{L}:\mathbb{R}^{3}\to\mathbb{R}^{3}\), hence the operator \(\mathbf{L}^{\ast}\mathbf{L}\) has a standard matrix representation: \[\mathbf{L}^{\ast}\mathbf{L} = \begin{bmatrix} 2 & 0 & \frac{2}{3}\\ 0 & \frac{2}{3} & 0\\\frac{2}{3} & 0 & \frac{2}{5}\end{bmatrix},\]

hence the equation \(\mathbf{L}^{\ast}\mathbf{Lx} = \mathbf{L}^{\ast}q\) can be written as the matrix equation:

\[\begin{bmatrix} 2 & 0 & \frac{2}{3}\\ 0 & \frac{2}{3} & 0\\\frac{2}{3} & 0 & \frac{2}{5}\end{bmatrix}\begin{bmatrix}\mathbf{x}(1)\\\mathbf{x}(2)\\\mathbf{x}(3)\end{bmatrix} = \mathbf{L}^{\ast}\mathbf{L}\mathbf{x} = \mathbf{L}^{\ast}q= \begin{bmatrix} \langle p_{0},q\rangle\\\langle p_{1},q\rangle\\ \langle p_{2},q\rangle \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ \pi^{-2}\end{bmatrix}.\]

The equation 

\[\begin{bmatrix} 2 & 0 & \frac{2}{3}\\ 0 & \frac{2}{3} & 0\\\frac{2}{3} & 0 & \frac{2}{5}\end{bmatrix}\begin{bmatrix}\mathbf{x}(1)\\\mathbf{x}(2)\\\mathbf{x}(3)\end{bmatrix}  = \begin{bmatrix} 0\\ 0\\ \pi^{-2}\end{bmatrix}.\]

has the unique solution

\[\mathbf{x} = \frac{15}{8\pi^{2}}\begin{bmatrix}-1\\0\\3\end{bmatrix}\] and thus \[\mathbf{L}\mathbf{x} =\frac{15}{8\pi^{2}}(-p_{0}+3p_{2})\]

is the closest point in \(\mathbb{P}_{2}\) to the function \(q\), just as we found before.
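As a sanity check, the \(3\times 3\) system can be solved numerically (a NumPy sketch, our addition):

```python
import numpy as np

G = np.array([[2, 0, 2/3],
              [0, 2/3, 0],
              [2/3, 0, 2/5]])      # Gram matrix with entries <p_i, p_j>
b = np.array([0, 0, np.pi**-2])    # analysis of q against p_0, p_1, p_2

x = np.linalg.solve(G, b)
expected = 15 / (8 * np.pi**2) * np.array([-1, 0, 3])
assert np.allclose(x, expected)
print(x)   # coefficients of f0 in the basis {p_0, p_1, p_2}
```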

We can generalize! Consider the vector space \[\mathcal{V}_{N} = \operatorname{span}\{p_{0},p_{1},\ldots,p_{N},q\}\subset \mathbb{R}^{[-1,1]}.\]

We wish to find \(f_{0}\in\mathbb{P}_{N}\) such that \(\|f_{0}-q\|\) is minimal.

 

If we let \(\mathbf{L}:\mathbb{R}^{N+1}\to\mathcal{V}_{N}\) be the synthesis operator of \(\{p_{0},p_{1},\ldots,p_{N}\}\), then we wish to find \(\mathbf{x}\) such that \(\|\mathbf{Lx}-q\|\) is minimal. By Theorem 12.2 this is equivalent to solving \(\mathbf{L}^{\ast}\mathbf{Lx} = \mathbf{L}^{\ast}q\).

When \(N=4\), the equation \(\mathbf{L}^{\ast}\mathbf{Lx} = \mathbf{L}^{\ast}q\) becomes

\[\begin{bmatrix} 2 & 0 & \frac{2}{3} & 0 & \frac{2}{5}\\ 0 & \frac{2}{3} & 0 & \frac{2}{5} & 0\\\frac{2}{3} & 0 & \frac{2}{5} & 0 & \frac{2}{7}\\ 0 & \frac{2}{5} & 0 & \frac{2}{7} & 0\\ \frac{2}{5} & 0 & \frac{2}{7} & 0 & \frac{2}{9}\end{bmatrix}\begin{bmatrix}\mathbf{x}(1)\\\mathbf{x}(2)\\\mathbf{x}(3)\\\mathbf{x}(4)\\\mathbf{x}(5)\end{bmatrix} = \begin{bmatrix} 0\\0\\1/\pi^{2}\\0\\(2\pi^{2}-3)/\pi^{4}\end{bmatrix}\]

This has a unique solution \(\mathbf{x}\), and hence \(f_{0} = \mathbf{Lx}\) is the desired minimizer, with

\[f_{0}(x) \approx 0.4375-5.7053x^2+7.3211x^4.\]
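A sketch of the general-\(N\) computation (our addition, assuming NumPy and SciPy; the Gram entries come from the closed form \(\int_{-1}^{1}x^{i+j}\,dx = \frac{2}{i+j+1}\) for \(i+j\) even and \(0\) otherwise, while \(\langle p_{n},q\rangle\) is computed by numerical quadrature):

```python
import numpy as np
from scipy.integrate import quad

def least_squares_poly(N):
    # Gram matrix G(i, j) = <p_i, p_j> over [-1, 1].
    G = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for j in range(N + 1):
            if (i + j) % 2 == 0:
                G[i, j] = 2 / (i + j + 1)
    # Right-hand side b(n) = <p_n, q> computed numerically.
    b = np.array([quad(lambda t, n=n: t**n * np.cos(2 * np.pi * t), -1, 1)[0]
                  for n in range(N + 1)])
    return np.linalg.solve(G, b)    # coefficients of f0 in {p_0, ..., p_N}

print(least_squares_poly(4))  # approximately [0.4375, 0, -5.7053, 0, 7.3211]
```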

[Figure: least squares approximations of \(q\) of degree 4 and degree 6 on \([-1,1]\), compared with the Taylor polynomials of \(q\) at \(0\) of degree 4 and degree 6.]

Part 14

The Spectral Theorem

(See Section 13 in the notes)

Definition 11.5. Suppose \(\mathcal{V}\) is an inner product space, and \(\mathbf{L}:\mathcal{V}\to\mathcal{V}\) is a linear operator that has an adjoint. If \(\mathbf{L}\) commutes with its adjoint, that is \(\mathbf{LL}^{\ast}=\mathbf{L}^{\ast}\mathbf{L}\), then we say that \(\mathbf{L}\) is normal.

Note: One example of a normal operator is a unitary operator with the same domain and codomain.

Theorem 11.6. Suppose \(\mathcal{V}\) is an inner product space over \(\mathbb{F}\), and \(\mathbf{L}:\mathcal{V}\to\mathcal{V}\) is a linear operator.

(a) If \(\mathbf{L}\) is normal, and \(\mathbf{Lv}=\lambda\mathbf{v}\) for some \(\lambda\in\mathbb{F}\), \(\mathbf{v}\in\mathcal{V}\), then \(\mathbf{L}^{\ast}\mathbf{v} = \overline{\lambda}\mathbf{v}\).

(b) If \(\mathbf{L}\) is normal, \(\mathbf{Lv}_{1}=\lambda_{1}\mathbf{v}_{1}\), and \(\mathbf{Lv}_{2}=\lambda_{2}\mathbf{v}_{2}\) for some \(\lambda_{1},\lambda_{2}\in\mathbb{F}\), \(\mathbf{v}_{1},\mathbf{v}_{2}\in\mathcal{V}\) and \(\lambda_{1}\neq \lambda_{2}\), then \(\langle\mathbf{v}_{1},\mathbf{v}_{2}\rangle = 0\).

Examples (See Example 11.7). Let \(\mathbb{Z}_{N}\) be the set \(\{0,1,2,\ldots,N-1\}\) with addition modulo \(N\). Let \(\mathbf{T}:\mathbb{C}^{\mathbb{Z}_{N}}\to\mathbb{C}^{\mathbb{Z}_{N}}\) be given by \[(\mathbf{Tx})(n) = \mathbf{x}(n-1).\]

If \(\mathbf{1}\in\mathbb{C}^{\mathbb{Z}_{N}}\) is the all-ones vector, that is, \(\mathbf{1}(n)=1\) for all \(n\in\mathbb{Z}_{N}\), then \(\mathbf{T}\mathbf{1} = \mathbf{1}\).

More generally, for \(n,m\in\mathbb{Z}_{N}\) set 

\[\mathbf{e}_{n}(m) = e^{2\pi i n m/N} = \cos\left(\frac{2\pi nm}{N}\right)+i\sin\left(\frac{2\pi nm}{N}\right).\] Note that \(\mathbf{e}_{0} = \mathbf{1}.\) Then

 

\[(\mathbf{Te}_{n})(m) = \mathbf{e}_{n}(m-1) = e^{2\pi i n (m-1)/N} = e^{\frac{2\pi i n m}{N}-\frac{2\pi i n}{N}} = e^{-\frac{2\pi i n}{N}}e^{\frac{2\pi i n m}{N}} = e^{-\frac{2\pi i n}{N}}\mathbf{e}_{n}(m).\]

Therefore

\[\mathbf{Te}_{n} = e^{-\frac{2\pi i n}{N}}\mathbf{e}_{n}.\]

Since \(\mathbf{T}\) is normal, and these \(N\) eigenvalues are distinct, the previous theorem shows that \(\{\mathbf{e}_{n}\}_{n=0}^{N-1}\) is an orthogonal sequence in \(\mathbb{C}^{\mathbb{Z}_{N}}\). These vectors are not unit vectors (each has norm \(\sqrt{N}\)), but after normalizing them they form an orthonormal eigenbasis for \(\mathbf{T}\).
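A numerical check of this eigenvector computation and the resulting orthogonality (a NumPy sketch, our addition):

```python
import numpy as np

N = 8
T = np.roll(np.eye(N), 1, axis=0)        # (Tx)(n) = x(n-1), indices mod N
m = np.arange(N)
for n in range(N):
    e_n = np.exp(2j * np.pi * n * m / N)
    lam = np.exp(-2j * np.pi * n / N)    # eigenvalue e^{-2 pi i n / N}
    assert np.allclose(T @ e_n, lam * e_n)

# T is normal: it commutes with its adjoint (the conjugate transpose).
assert np.allclose(T @ T.conj().T, T.conj().T @ T)

# By Theorem 11.6(b) the e_n are pairwise orthogonal, with <e_n, e_n> = N.
E = np.array([np.exp(2j * np.pi * n * m / N) for n in range(N)])
assert np.allclose(E.conj() @ E.T, N * np.eye(N))
```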

Spectral decomposition theorem. If \(\mathcal{V}\) is a nontrivial finite-dimensional complex inner product space and \(\mathbf{L}:\mathcal{V}\to\mathcal{V}\) is linear, then the following are equivalent:

(i) \(\mathbf{L}\) is normal.

(ii) \(\mathbf{L}\) has an orthonormal eigenbasis, that is, an orthonormal basis consisting of eigenvectors of \(\mathbf{L}\).

(iii) \(\mathbf{L}\) is unitarily diagonalizable, that is, there exists a unitary operator \(\mathbf{V}:\mathbb{C}^{\mathcal{N}}\to\mathcal{V}\) and a diagonal matrix \(\mathbf{\Lambda}\in\mathbb{C}^{\mathcal{N}\times\mathcal{N}}\) such that \(\mathbf{L} = \mathbf{V\Lambda V}^{\ast}\).

Moreover, if \(\{\mathbf{v}_{n}\}_{n\in\mathcal{N}}\) is an orthonormal eigenbasis for \(\mathbf{L}\) and \(\{\lambda_{n}\}_{n\in\mathcal{N}}\) are the associated eigenvalues, then for any \(\mathbf{v}\in\mathcal{V}\) we have

\[\mathbf{Lv} = \sum_{n\in\mathcal{N}}\lambda_{n}\langle \mathbf{v}_{n},\mathbf{v}\rangle \mathbf{v}_{n}.\]

If we have the unitary diagonalization as in (iii), then \(\mathbf{V}\) is the synthesis operator of the sequence \(\{\mathbf{V}\boldsymbol{\delta}_{n}\}_{n\in\mathcal{N}}\), which is an orthonormal eigenbasis of \(\mathbf{L}\) satisfying \[\mathbf{L}(\mathbf{V}\boldsymbol{\delta}_{n}) = \mathbf{\Lambda}(n,n)\mathbf{V}\boldsymbol{\delta}_{n}.\]
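A coordinate sketch of this correspondence, built from a unitary diagonalization \(\mathbf{L} = \mathbf{V\Lambda V}^{\ast}\) (our addition, assuming NumPy; we use the normalized DFT matrix as the unitary \(\mathbf{V}\)):

```python
import numpy as np

N = 5
j, k = np.meshgrid(np.arange(N), np.arange(N))
V = np.exp(2j_ := 2j * np.pi * j * k / N) / np.sqrt(N) if False else \
    np.exp(2j * np.pi * j * k / N) / np.sqrt(N)   # unitary (normalized DFT)
lam = np.random.randn(N) + 1j * np.random.randn(N)
L = V @ np.diag(lam) @ V.conj().T                 # L = V Lambda V*

# L is normal ...
assert np.allclose(L @ L.conj().T, L.conj().T @ L)
# ... and the columns V delta_n form an orthonormal eigenbasis of L.
assert np.allclose(V.conj().T @ V, np.eye(N))
for n in range(N):
    assert np.allclose(L @ V[:, n], lam[n] * V[:, n])
```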

Definition 13.1. Let \(\mathbf{L}:\mathcal{V}\rightarrow\mathcal{V}\) be linear where \(\mathcal{V}\) is an inner product space.
(a) \(\mathbf{L}\) is self-adjoint if it has an adjoint and \(\mathbf{L}^*=\mathbf{L}.\)

(b) \(\mathbf{L}\) is positive semidefinite if \(\mathbf{L}\) is self-adjoint and \(\langle\mathbf{v},\mathbf{L}\mathbf{v}\rangle\geq0\) for all \(\mathbf{v}\in\mathcal{V}.\)

(c) \(\mathbf{L}\) is positive definite if \(\mathbf{L}\) is self-adjoint and \(\langle\mathbf{v},\mathbf{L}\mathbf{v}\rangle>0\) for all \(\mathbf{v}\in\mathcal{V},\) \(\mathbf{v}\neq\mathbf{0}.\)

(d) \(\mathbf{L}\) is an orthogonal projection operator if it is self-adjoint and \(\mathbf{L}^2=\mathbf{L}.\)

Theorem 13.2. Let \(\mathbf{L}:\mathcal{V}\rightarrow\mathcal{V}\) be linear, where \(\mathcal{V}\) is an inner product space.

(a) If \(\mathbf{L}\) is self-adjoint then it is normal and every eigenvalue \(\lambda\) of it is real, that is, \(\lambda\in\mathbb{R}.\)

(b) If \(\mathbf{L}\) is positive semidefinite then it is normal and every eigenvalue \(\lambda\) of it is nonnegative, that is, \(\lambda\geq0.\)

(c) If \(\mathbf{L}\) is positive definite then it is normal and every eigenvalue \(\lambda\) of it is positive, that is, \(\lambda>0.\)

(d) If \(\mathbf{L}\) is an orthogonal projection operator then it is normal and every eigenvalue \(\lambda\) of it is either \(0\) or \(1.\)

(e) If \(\mathbf{L}\) is unitary then it is normal and every eigenvalue \(\lambda\) of it is unimodular, that is, \(|\lambda|=1.\)

Moreover, if \(\mathcal{V}\) is complex and finite-dimensional, the converse of each of the above statements is also true.
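These eigenvalue characterizations are easy to probe numerically; here is a NumPy sketch with randomly generated matrices (our addition):

```python
import numpy as np

A = np.random.randn(4, 4)

S = A + A.T                       # self-adjoint: eigenvalues are real
assert np.allclose(np.linalg.eigvals(S).imag, 0, atol=1e-10)

P = A.T @ A                       # positive semidefinite: eigenvalues >= 0
assert np.all(np.linalg.eigvalsh(P) >= -1e-10)

Q, _ = np.linalg.qr(A[:, :2])     # Q has orthonormal columns
Proj = Q @ Q.T                    # orthogonal projection: eigenvalues in {0, 1}
assert np.allclose(np.sort(np.linalg.eigvalsh(Proj)), [0, 0, 1, 1])
```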

Example. Let \(\displaystyle{\mathbf{L} = \begin{bmatrix} 3 & -1\\ -1 & 2\end{bmatrix}.}\)

[Figure: animation showing \(\mathbf{L}(\text{blue vector}) = \text{red vector}\), where the blue vectors run through all vectors of length \(1\).]

Example continued. Now let \(\displaystyle{\mathbf{L} = \begin{bmatrix} 2 & -1\\ 2 & 1\end{bmatrix}.}\)

[Figure: animation showing \(\mathbf{L}(\text{blue vector}) = \text{red vector}\), where the blue vectors run through all vectors of length \(1\).]

Theorem (Singular Value Decomposition). If \(\mathcal{V}\) and \(\mathcal{U}\) are inner product spaces, \(\mathcal{V}\) is finite dimensional with \(N:=\operatorname{dim}(\mathcal{V})\), and \(\mathbf{L}:\mathcal{V}\to\mathcal{U}\) is linear with \(R:=\operatorname{rank}(\mathbf{L})\), then there exist orthonormal bases \(\{\mathbf{v}_{n}\}_{n=1}^{N}\) and \(\{\mathbf{u}_{m}\}_{m=1}^{R}\) for \(\mathcal{V}\) and \(\mathbf{L}(\mathcal{V})\), respectively, and nonnegative scalars \(\{\sigma_{n}\}_{n=1}^{R}\) such that

\[\mathbf{L}\mathbf{v}_{n} = \sigma_{n}\mathbf{u}_{n}\quad\text{for }1\leq n\leq R,\qquad \mathbf{L}\mathbf{v}_{n} = \mathbf{0}\quad\text{for }n>R.\]

If, in addition, \(\mathcal{U}\) is finite dimensional, then we can extend \(\{\mathbf{u}_{m}\}_{m=1}^{R}\) to an orthonormal basis \(\{\mathbf{u}_{m}\}_{m=1}^{M}\) for \(\mathcal{U}\). If we let \(\mathbf{V}:\mathbb{C}^{N}\to\mathcal{V}\) and \(\mathbf{U}:\mathbb{C}^{M}\to\mathcal{U}\) be the synthesis operators of \(\{\mathbf{v}_{n}\}_{n=1}^{N}\) and \(\{\mathbf{u}_{m}\}_{m=1}^{M}\), respectively, and define \(\mathbf{\Sigma}\in\mathbb{C}^{M\times N}\) by \[\mathbf{\Sigma}(m,n) = \begin{cases} \sigma_{n} & m=n\\ 0 & m\neq n,\end{cases}\] then

\[\mathbf{L} = \mathbf{U\Sigma V}^{\ast}.\]


In the case that \(\mathcal{V}=\mathbb{C}^{N}\) and \(\mathcal{U} = \mathbb{C}^{M}\) we have

\[\mathbf{U\Sigma V}^{\ast}=\begin{bmatrix} \vert & \vert & & \vert\\ u_{1} & u_{2} & \cdots & u_{M}\\ \vert & \vert & & \vert\end{bmatrix}\begin{bmatrix} \sigma_{1} & & & & & &\\ & \sigma_{2} & & & & &\\ & & \ddots & & & & \\ & & & \sigma_{R} & & & \\ & & & & 0 & &\\ & & & & & \ddots & \\ & & & & & & 0\end{bmatrix}\begin{bmatrix} - & v_{1}^{\ast} & - \\ - & v_{2}^{\ast} & - \\ & \vdots & \\ - & v_{N}^{\ast} & -\end{bmatrix}\]

\[=\begin{bmatrix} \vert & \vert & & \vert\\ u_{1} & u_{2} & \cdots & u_{M}\\ \vert & \vert & & \vert\end{bmatrix}\begin{bmatrix} - & \sigma_{1} v_{1}^{\ast} & - \\ - & \sigma_{2}v_{2}^{\ast} & - \\ & \vdots & \\ - & \sigma_{R}v_{R}^{\ast} & - \\ - & 0 & -\\ & \vdots & \\ - & 0 & -\end{bmatrix} = \sum_{i=1}^{R}\sigma_{i}u_{i}v_{i}^{\ast}\]

 

Example.

\(\mathbf{L} = \begin{bmatrix} 7 & 3 & 7 & 3\\ 3 & 7 & 3 & 7\end{bmatrix}\)\[ = \left(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & \phantom{-}1\\ 1 & -1\end{bmatrix}\right)\begin{bmatrix} 10\sqrt{2} & 0 & 0 & 0\\ 0 & 4\sqrt{2} & 0 & 0\end{bmatrix}\left(\frac{1}{2}\left[\begin{array}{rrrr} 1 & 1 & 1 & 1\\ 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\end{array}\right]\right)\]
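This factorization can be checked numerically (a NumPy sketch, our addition):

```python
import numpy as np

L = np.array([[7., 3., 7., 3.],
              [3., 7., 3., 7.]])
U = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
S = np.array([[10 * np.sqrt(2), 0, 0, 0],
              [0, 4 * np.sqrt(2), 0, 0]])
Vt = np.array([[1., 1., 1., 1.],
               [1., -1., 1., -1.],
               [1., 1., -1., -1.],
               [1., -1., -1., 1.]]) / 2

assert np.allclose(U @ S @ Vt, L)   # L = U Sigma V*
# np.linalg.svd recovers the same singular values.
assert np.allclose(np.linalg.svd(L, compute_uv=False),
                   [10 * np.sqrt(2), 4 * np.sqrt(2)])
```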

Math 621 Spring 2024 Slides

By John Jasper