Iterative Methods, Conjugate Gradient, and Preconditioning
Let's retread some ground to get a better feel for some of the algorithms we've investigated so far
Saving space and time
Set of triplets
i | j | value |
---|---|---|
1 | 1 | 1 |
2 | 1 | 2 |
2 | 2 | 1 |
3 | 3 | 3 |
4 | 3 | 4 |
4 | 4 | 6 |
4 | 5 | 7 |
5 | 5 | 1 |
Bad for iterating over rows/columns
Fast lookup; \(O(m)\) storage for \(m\) nonzeros
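A minimal sketch of triplet storage and a matrix-vector product over it (plain Python/NumPy, assuming \(i\) indexes rows and \(j\) columns, converted to 0-based indices):

import numpy as np

# Triplet (coordinate) storage: one (i, j, value) entry per nonzero,
# taken from the table above and shifted to 0-based indices.
triplets = [(0, 0, 1), (1, 0, 2), (1, 1, 1), (2, 2, 3),
            (3, 2, 4), (3, 3, 6), (3, 4, 7), (4, 4, 1)]

def triplet_matvec(n, triplets, x):
    # Multiply by scanning every stored nonzero once: O(m) work.
    Ax = np.zeros(n)
    for i, j, v in triplets:
        Ax[i] += v * x[j]
    return Ax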
Saving space and time
Linked Representation
(Diagram: the nonzero values 1, 2, 1, 4, 3, 6, 7, 1 stored as a chain of linked nodes.)
Saving space and time
Compressed Sparse Row
Stores the nonzero entries in three arrays
A=[1 2 1 3 4 6 7 1]
IA=[0 2 3 5 6 8]
JA=[0 1 1 2 3 3 3 4]
A = the nonzero values of the matrix, stored row by row
IA = row pointers: IA[i] is the index in A where row i starts, so IA[i+1] - IA[i] is the number of nonzeros in row i
JA = the column index of each element in A
Saving space and time
Compressed Sparse Row
A=[1 2 1 3 4 6 7 1]
IA=[0 2 3 5 6 8]
JA=[0 1 1 2 3 3 3 4]
import numpy as np

def sparse_matrix_mult(n, A, IA, JA, x):
    # Multiply a CSR matrix (A, IA, JA) by a dense vector x.
    Ax = np.zeros_like(x)
    a_idx = 0
    for i in range(n):
        num_elements = IA[i + 1] - IA[i]   # nonzeros in row i
        for _ in range(num_elements):
            Ax[i] += x[JA[a_idx]] * A[a_idx]
            a_idx += 1
    return Ax
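A quick sanity check of the routine above on the example arrays (the all-ones vector is an arbitrary choice):

import numpy as np

A  = np.array([1, 2, 1, 3, 4, 6, 7, 1], dtype=float)
IA = np.array([0, 2, 3, 5, 6, 8])
JA = np.array([0, 1, 1, 2, 3, 3, 3, 4])
x  = np.ones(5)

print(sparse_matrix_mult(5, A, IA, JA, x))   # -> [3. 1. 7. 6. 8.]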
Saving space and time
Related Friends
Saving space and time
An aside
What if you have no explicit matrix \(A\), but only a procedure that calculates \(Ax\) for a given \(x\)?
Can massively outperform direct methods, especially when convergence is fast
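A minimal sketch of such a matrix-free operator: the 1D discrete Laplacian applied without ever forming \(A\) (the choice of operator is purely illustrative):

import numpy as np

# Apply the tridiagonal operator with 2 on the diagonal and -1 off it,
# without storing any matrix.
def apply_A(x):
    y = 2.0 * x
    y[:-1] -= x[1:]
    y[1:] -= x[:-1]
    return y

x = np.random.rand(100)
y = apply_A(x)   # iterative methods only ever need products like this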
Stationary methods: Jacobi, Gauss-Seidel
Non-stationary methods: Krylov subspace methods
The simplest method for computing one eigenvalue-eigenvector pair is power iteration, which in effect multiplies an initial starting vector by successively higher powers of the matrix
for k in range(iterations):
    y = A.dot(x)
    x = y / np.linalg.norm(y, np.inf)   # normalize by the infinity norm
Iterate on vector
Why do these methods work? Also, when?
\(x^{new} = Tx^{old} + c\)
T never changes
\(x^*_i = \frac{b_i - \sum_{j\ne i} a_{i,j}x_j}{a_{ii}}\)
Beginning with an initial guess, both methods compute the next iterate by solving for each component of \(x\) in terms of other components of \(x\)
If D, L, and U are the diagonal, strictly lower triangular, and strictly upper triangular portions of A, then the Jacobi method can be written
\(x^* = D^{-1}(b-(L+U)x)\)
\(x^*_i = \frac{b_i - \sum_{j<i} a_{i,j}x^*_j -\sum_{j>i} a_{i,j}x_j}{a_{ii}}\)
\(x^* = (L+D)^{-1}(b-Ux)\)
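Minimal sketches of one Jacobi sweep and one Gauss-Seidel sweep, written directly from the componentwise formulas above (dense A, purely for illustration):

import numpy as np

def jacobi_step(A, b, x):
    # x* = D^{-1}(b - (L+U)x); A.dot(x) - D*x is exactly (L+U)x
    D = np.diag(A)
    return (b - (A.dot(x) - D * x)) / D

def gauss_seidel_step(A, b, x):
    # Each component uses the entries already updated in this sweep.
    x_new = x.copy()
    for i in range(len(b)):
        s = A[i, :i].dot(x_new[:i]) + A[i, i+1:].dot(x[i+1:])
        x_new[i] = (b[i] - s) / A[i, i]
    return x_new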
Convergence
Take a random \(100 \times 100\) matrix \(A\) and a random right-hand side \(b\), and solve \(Ax=b\)
(Plot: \(\log \|Ax-b\|\) versus iterations for Jacobi and Gauss-Seidel.)
A first introduction, sorta
Change the problem slightly so that it's easier to solve.
\(x^*_i =(1-\omega)x_i + \omega\left(\frac{b_i - \sum_{j<i} a_{i,j}x^*_j -\sum_{j>i} a_{i,j}x_j}{a_{ii}}\right)\)
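A sketch of one SOR sweep, blending the Gauss-Seidel update with the old value via \(\omega\) as in the formula above:

import numpy as np

def sor_step(A, b, x, omega):
    x_new = x.copy()
    for i in range(len(b)):
        s = A[i, :i].dot(x_new[:i]) + A[i, i+1:].dot(x[i+1:])
        gs = (b[i] - s) / A[i, i]                  # plain Gauss-Seidel update
        x_new[i] = (1 - omega) * x[i] + omega * gs
    return x_new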
(Plot: \(\log \|Ax-b\|\) versus iterations for Gauss-Seidel and Gauss-Seidel SOR with \(\omega=0.95\).)
If A is an \(n \times n\) symmetric positive definite matrix, we minimize the quadratic form
\(\displaystyle\argmin_x \frac{1}{2}x^TAx - x^Tb\)
which attains its minimum when \(Ax=b\)
Steepest descent: \(x^*=x - \alpha\nabla(\frac{1}{2}x^TAx - x^Tb) = x + \alpha r\), where \(r:= b - Ax\)
Optimal line search parameter: \(\alpha = r^Ts/(s^TAs)\) for search direction \(s\) (for steepest descent, \(s = r\))
import numpy as np

def steepest_descent(A, b, max_iters=1000):
    x = np.zeros(A.shape[0])
    for k in range(max_iters):
        r = b - A.dot(x)               # residual = negative gradient
        s = r.dot(r)
        if s < 1e-10:                  # converged
            break
        a = s / r.dot(A.dot(r))        # optimal step length along r
        x = x + a * r
    print("%d iterations" % k)
    return x, r
Example
Problem: steepest descent keeps choosing the same search directions, zig-zagging toward the solution
Fixing steepest descent by choosing gradients in new, orthogonal directions
General idea
Seek solutions in the space \( K_n = \operatorname{span}\{b, Ab, A^2b, \cdots, A^{n-1}b\}\)
That is, at step \(n\) look for a solution \(x_n\) such that
\(\displaystyle\min_{x_n \in K_n} \|b-Ax_n\|\)
Why find a solution in \(K_n\)?
General idea
\(0 = q(A)\)
\(0 = \displaystyle\sum_{i=0}^m a_iA^i\)
\(0 = a_0I + \displaystyle\sum_{i=1}^m a_iA^i\), with \(a_0 \ne 0\) since \(A\) is nonsingular
\(I = - \displaystyle\sum_{i=1}^m \frac{a_i}{a_0}A^i\)
\(A^{-1} = - \displaystyle\sum_{i=1}^m \frac{a_i}{a_0}A^{i-1}\)
\(A^{-1}b = - \displaystyle\sum_{i=1}^m \frac{a_i}{a_0}A^{i-1}b\)
\(x = A^{-1}b \in K_m\)
\(q(\cdot)\) is the minimal polynomial for A
\(m\) depends on the structure of the eigenvalues of \(A\)
Fixing SD by choosing gradients in new, unique directions
Each successive \(p_i\) is chosen to be \(A\)-conjugate (i.e. \(p_i^TAp_j = 0\)) to each of \(\{p_0, \cdots, p_{i-1}\}\)
As the algorithm progresses, \(p_i\) and \(r_i\) span the same Krylov subspace.
\(x_k\) can be regarded as the projection of \(x\) onto \(K_k\).
If \(m \ll n\), then we converge in few iterations. If we allow \(k\to n\) then we have a direct method.
import numpy as np

def conjgrad(A, x, b):
    r = b - A.dot(x)                    # initial residual
    p = r.copy()                        # first search direction
    rsold = r.dot(r)
    e = []                              # residual norm history
    for i in range(len(b)):
        Ap = A.dot(p)
        alpha = rsold / p.dot(Ap)       # optimal step along p
        x = x + alpha * p
        r = r - alpha * Ap
        rsnew = r.dot(r)
        p = r + (rsnew / rsold) * p     # next direction, conjugate to the previous ones
        rsold = rsnew
        e += [np.sqrt(rsnew)]
    return x, e
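A quick usage check of conjgrad on a small random SPD system (\(B^TB + I\) is symmetric positive definite by construction):

import numpy as np

np.random.seed(0)
B = np.random.rand(50, 50)
A = B.T.dot(B) + np.eye(50)
b = np.random.rand(50)

x, e = conjgrad(A, np.zeros(50), b)
print(np.linalg.norm(A.dot(x) - b))   # residual norm should be small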
(Plot: \(\log \|Ax-b\|\) versus iterations for steepest descent and conjugate gradient.)
What if A isn't positive definite? What if A isn't symmetric?
Again, for an \(m\times m\) matrix \(A\), seek solutions in the space \( K_n = \operatorname{span}\{b, Ab, A^2b, \cdots, A^{n-1}b\}\)
That is, at step \(n\) look for a solution \(x_n\) such that
\(\displaystyle\min_{x_n \in K_n} \|b-Ax_n\|\)
Iteratively build up an orthogonal basis for \(K_n\) (QR decomposition) which allows us to easily solve for a given \(x_n\)
Use Arnoldi iteration to build \(Q_n\) and \(H\) such that \(AQ_n = Q_{n+1}H\) (H is Hessenberg, i.e. almost upper triangular)
Start with an arbitrary vector \(q_1\) with norm 1.
For k = 2, 3, …: orthogonalize \(Aq_{k-1}\) against \(q_1, \cdots, q_{k-1}\), normalize, and take the result as \(q_k\) (see the sketch below)
Can be built up incrementally
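A minimal sketch of Arnoldi iteration (dense NumPy, no breakdown handling), producing \(Q_{n+1}\) and the \((n+1)\times n\) Hessenberg matrix \(H\):

import numpy as np

def arnoldi(A, b, n):
    m = len(b)
    Q = np.zeros((m, n + 1))
    H = np.zeros((n + 1, n))
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(n):
        v = A.dot(Q[:, k])
        for j in range(k + 1):                # orthogonalize against previous q's
            H[j, k] = Q[:, j].dot(v)
            v = v - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(v)
        Q[:, k + 1] = v / H[k + 1, k]         # assumes H[k+1, k] != 0 (no breakdown)
    return Q, H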
Since \(x_n\in K_n\) and \(Q_n\) is a basis for \(K_n\), we can write \(x_n = Q_ny_n\), so
\(\|Ax_n - b\| = \|Q_{n+1}^TAx_n - Q_{n+1}^Tb\| = \|Q_{n+1}^TAQ_ny_n - Q_{n+1}^Tb\| = \|H_ny_n - \beta e_1\|\)
with \(\beta = \|b-Ax_0\|\), the norm of the initial residual (here \(\beta = \|b\|\), since \(q_1 = b/\|b\|\))
So the algorithm is as follows: on the \(n\)th iteration, extend the Arnoldi basis by one vector, solve the small least-squares problem \(\min_{y_n}\|H_ny_n - \beta e_1\|\), and set \(x_n = Q_ny_n\)
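A sketch of the resulting GMRES step, reusing arnoldi above, taking \(x_0 = 0\) so that \(\beta = \|b\|\), and letting NumPy handle the small least-squares problem:

import numpy as np

def gmres_step(A, b, n):
    Q, H = arnoldi(A, b, n)                       # A Q_n = Q_{n+1} H
    beta = np.linalg.norm(b)                      # initial residual norm (x_0 = 0)
    rhs = np.zeros(n + 1)
    rhs[0] = beta                                 # beta * e_1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)   # min ||H y - beta e_1||
    return Q[:, :n].dot(y)                        # x_n = Q_n y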
Make a given problem easier
Instead of solving \(Ax = b\), we solve the related problem \(AM^{-1}y = b\) for \(y\), then compute \(x = M^{-1}y\) (with \(M\) chosen so that \(AM^{-1}\) is better conditioned)
Incomplete LU factorization
Since the LU factors of a sparse matrix may be dense, we seek a sparse approximation to the LU factorization.
Start by noting all the non-zero elements of an \(n \times n\) sparse matrix \(A\).
\(G(A) = \{(i,j) | A_{ij} \ne 0\}\)
Then choose a sparsity pattern \(S\) such that
\(S\subset \{1, \cdots, n\}^2, \ \ G(A) \subset S, \ \ \ \{(i,i) | i\in\{1\cdots n\}\} \subset S\),
A decomposition of the form \(A=LU-R\) where \(L\) and \(U\) are zero outside the sparsity pattern \(S\), and the residual \(R\) is zero on \(S\)
Incomplete Cholesky factorization
A sparse approximation of the Cholesky factorization.
Recall the Cholesky factorization of a positive definite matrix \(A\) is \(A\) = \(LL^*\) where \(L\) is a lower triangular matrix.
The incomplete Cholesky factorization is a sparse lower triangular matrix \(M\) that is close to \(L\). The corresponding preconditioner is \(MM^*\).
Use the algorithm for finding the exact Cholesky decomposition, except that any entry is set to zero if the corresponding entry in A is also zero.
import numpy as np

def ichol(A):
    # Zero-fill incomplete Cholesky: keep only entries where A is nonzero.
    n = A.shape[0]
    L = np.tril(A).astype(float)
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        for i in range(k + 1, n):
            if L[i, k] != 0:
                L[i, k] /= L[k, k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if L[i, j] != 0:
                    L[i, j] -= L[i, k] * L[j, k]
    return L
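Applying the resulting preconditioner \(M = LL^T\) amounts to two triangular solves; a dense sketch (a real implementation would exploit the sparsity of L):

import numpy as np

def apply_ic_preconditioner(L, r):
    y = np.linalg.solve(L, r)        # forward solve:  L y = r
    return np.linalg.solve(L.T, y)   # backward solve: L^T z = y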
Symmetric SOR
If the original matrix can be split into its diagonal and strictly lower and upper triangular parts as \(A=D+L+L^T\), then the SSOR preconditioner matrix is defined as
\(M(\omega) = \frac{\omega}{2-\omega}\left(\frac{1}{\omega} D + L\right)D^{-1}\left(\frac{1}{\omega} D + L\right)^T\)
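A sketch that forms \(M(\omega)\) explicitly with dense NumPy, purely to make the formula concrete (in practice the preconditioner is applied via triangular solves, never formed):

import numpy as np

def ssor_preconditioner(A, omega):
    D = np.diag(np.diag(A))          # diagonal part
    L = np.tril(A, k=-1)             # strictly lower triangular part
    K = D / omega + L
    return (omega / (2 - omega)) * K.dot(np.linalg.inv(D)).dot(K.T)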
What makes a good preconditioner?
Many iterative methods have the smoothing property where oscillatory modes of the error are eliminated effectively, but smooth modes are damped slowly.
So we transform the problem so that we forcibly dampen lower-frequency error modes. That is, we solve a similar problem on coarser grids, then use that as an estimate for our initial guess.
After a few iterations
Model Problem
One dimensional boundary value problem: \(-u''(x) + \sigma u(x) = f(x)\) for \(0 < x < 1\), \(\sigma > 0\), with \(u(0)=u(1)=0\)
We want to find a function \(u(x)\) satisfying the above.
One dimensional boundary value problem
Grid:
\(h:=\frac{1}{N}, x_j = jh, j = 0,1, \cdots N\)
(Diagram: the grid points \(x_0, x_1, x_2, \cdots, x_j, \cdots, x_N\), spaced \(h\) apart.)
Let \(f_i = f(x_i)\) and \(v_i \approx u(x_i)\); we want to recover the \(v_i\)
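Assuming the model problem above and second-order central differences, the discrete equations are
\(\displaystyle\frac{-v_{j-1} + 2v_j - v_{j+1}}{h^2} + \sigma v_j = f_j, \qquad j = 1, \cdots, N-1, \quad v_0 = v_N = 0\)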
(Diagram: the fine grid \(\Omega_h\) with points \(x_0, x_1, \cdots, x_N\), and the coarse grid \(\Omega_{2h}\) containing every other point.)
If we call \(e = x - \tilde{x}\), that is, \(e\) is the error between the ground truth and an approximation, then \(A(\tilde{x} + e) = b\)
It follows that \(Ae = r\), where \(r := b - A\tilde{x}\) is the residual, so solve for \(e\) approximately, then add that to \(\tilde{x}\).
Given an approximation to \(x\), calculate the residual \(r\) then project onto a coarser grid (restriction).
Solve for \(e\) on coarser grid, with a coarser approximate operator to \(A\).
Interpolate \(e\) on the finer grid, and perform residual correction (prolongation).
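A minimal sketch of one such two-grid correction cycle in 1D (Jacobi smoothing, restriction by injection, linear interpolation back; all simple illustrative choices):

import numpy as np

def two_grid_cycle(A_fine, A_coarse, b, x, n_smooth=3):
    # Fine grid unknowns: x_1 .. x_{N-1}; coarse grid: x_2, x_4, ..., x_{N-2}.
    D = np.diag(A_fine)
    for _ in range(n_smooth):                  # smooth: damp oscillatory error modes
        x = x + (b - A_fine.dot(x)) / D
    r = b - A_fine.dot(x)                      # fine-grid residual
    r_c = r[1::2]                              # restriction: inject onto coarse grid
    e_c = np.linalg.solve(A_coarse, r_c)       # solve A e = r on the coarse grid
    e = np.zeros_like(x)                       # prolongation: linear interpolation
    e[1::2] = e_c
    e[2:-1:2] = 0.5 * (e_c[:-1] + e_c[1:])
    e[0], e[-1] = 0.5 * e_c[0], 0.5 * e_c[-1]
    return x + e                               # residual correction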