2025-10-01 GMRES#
Last time#
Householder QR
Composition of reflectors
Comparison of interfaces
Profiling
Cholesky QR
Today#
Intro to Krylov/GMRES
Matrix norms and conditioning
using LinearAlgebra
using Plots
default(linewidth=4, legendfontsize=12)
function vander(x, k=nothing)
    if isnothing(k)
        k = length(x)
    end
    m = length(x)
    V = ones(m, k)
    for j in 2:k
        V[:, j] = V[:, j-1] .* x
    end
    V
end
function gram_schmidt_classical(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:,j]
        R[1:j-1,j] = Q[:,1:j-1]' * v      # project onto the previous columns
        v -= Q[:,1:j-1] * R[1:j-1,j]      # subtract the projection
        R[j,j] = norm(v)
        Q[:,j] = v / R[j,j]
    end
    Q, R
end
function qr_householder(A)
    m, n = size(A)
    R = copy(A)
    V = [] # list of reflectors
    for j in 1:n
        v = copy(R[j:end, j])
        v[1] += sign(v[1]) * norm(v)   # choose the sign that avoids cancellation
        v = normalize(v)
        R[j:end,j:end] -= 2 * v * v' * R[j:end,j:end]
        push!(V, v)
    end
    V, R
end
function qr_chol(A)
    R = cholesky(A' * A).U
    Q = A / R
    Q, R
end

function qr_chol2(A)
    Q, R = qr_chol(A)
    Q, R1 = qr_chol(Q)
    Q, R1 * R
end
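As a quick usage sketch (not part of the original notes; the sizes and test matrix here are my own choices), we can factor a modest Vandermonde matrix and compare the orthogonality of the computed \(Q\) from classical Gram-Schmidt and from CholeskyQR2:

x = LinRange(-1, 1, 20)
A = vander(x, 10)
Q, _ = gram_schmidt_classical(A)
norm(Q' * Q - I)     # orthogonality error of classical Gram-Schmidt
Q2, _ = qr_chol2(A)
norm(Q2' * Q2 - I)   # qr_chol2 is typically much closer to orthogonal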
Krylov subspaces#
Suppose we wish to solve
\[ A x = b , \]
but only have the ability to apply the action of \(A\) (we do not have access to its entries). A general approach is to work with approximations in a Krylov subspace, which has the form
\[ K_n = \begin{bmatrix} b & A b & A^2 b & \cdots & A^{n-1} b \end{bmatrix} . \]
This matrix is horribly ill-conditioned and cannot stably be computed as written. Instead, we seek an orthogonal basis \(Q_n\) that spans the same space as \(K_n\). We could write this as a factorization
\[ K_n = Q_n R_n , \]
where the first column is \(q_1 = b / \lVert b \rVert\). The \(R_n\) is unnecessary and hopelessly ill-conditioned, so a slightly different procedure is used.
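To see why \(K_n\) cannot be used directly, here is a small illustration (a sketch of my own, not from the notes) that forms the Krylov matrix explicitly and watches its condition number blow up:

function krylov_matrix(A, b, n)
    m = length(b)
    K = zeros(m, n)
    K[:, 1] = b
    for j in 2:n
        K[:, j] = A * K[:, j-1]   # columns b, Ab, A^2 b, ...
    end
    K
end

A = randn(50, 50) + 50I   # an arbitrary well-behaved test matrix
b = randn(50)
K = krylov_matrix(A, b, 10)
cond(K)                   # already enormous for n = 10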
Arnoldi iteration#
The Arnoldi iteration applies orthogonal similarity transformations to reduce \(A\) to Hessenberg form, starting from a vector \(q_1 = b/\lVert b \rVert\),
\[ A = Q H Q^* . \]
Let’s multiply on the right by \(Q\) and examine the first \(n\) columns,
\[ A Q_n = Q_{n+1} H_n , \]
where \(H_n\) is an \((n+1) \times n\) Hessenberg matrix.
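The notes stop at this relation; as a sketch (my own, with no handling of breakdown when the new vector vanishes), here is a minimal Arnoldi iteration that orthogonalizes each new vector against the previous columns:

function arnoldi(A, b, n)
    m = length(b)
    Q = zeros(m, n+1)
    H = zeros(n+1, n)
    Q[:, 1] = b / norm(b)
    for j in 1:n
        v = A * Q[:, j]
        for i in 1:j                   # orthogonalize against previous columns
            H[i, j] = Q[:, i]' * v
            v -= Q[:, i] * H[i, j]
        end
        H[j+1, j] = norm(v)
        Q[:, j+1] = v / H[j+1, j]
    end
    Q, H
end

One can verify that A * Q[:, 1:n] ≈ Q * H holds up to roundoff.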
GMRES#
GMRES (Generalized Minimum Residual) minimizes the residual
\[ \lVert A x_n - b \rVert \]
over the Krylov subspace, i.e., over \(x_n = Q_n y_n\) with \(y_n \in \mathbb{R}^n\). Using \(A Q_n = Q_{n+1} H_n\) and \(b = \lVert b \rVert q_1\), this is equivalent to the small least squares problem
\[ \min_{y_n} \big\lVert H_n y_n - \lVert b \rVert e_1 \big\rVert . \]
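Given the Arnoldi sketch above, a bare-bones GMRES (again a sketch of my own, not the notes' code; a real implementation updates the factorization incrementally rather than recomputing Arnoldi) solves the small least squares problem and forms \(x_n = Q_n y_n\):

function gmres_basic(A, b, n)
    Q, H = arnoldi(A, b, n)
    e1 = zeros(n+1); e1[1] = norm(b)
    y = H \ e1                  # (n+1) x n least squares problem
    x = Q[:, 1:n] * y
    x, norm(H * y - e1)         # solution and residual norm
end

A = randn(50, 50) + 50I
b = randn(50)
x, rnorm = gmres_basic(A, b, 20)
norm(A * x - b)                 # agrees with rnorm up to roundoff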
Notes#
The solution \(x_n\) constructed by GMRES at iteration \(n\) is not explicitly available. If a solution is needed, it must be constructed by solving the \((n+1)\times n\) least squares problem and forming the solution as a linear combination of the \(n\) columns of \(Q_n\). The leading cost is \(2mn\) where \(m \gg n\).
The residual vector \(r_n = A x_n - b\) is not explicitly available in GMRES. To compute it, first build the solution \(x_n = Q_n y_n\).
GMRES minimizes the 2-norm of the residual \(\lVert r_n \rVert\) which is equivalent to the \(A^T A\) norm of the error \(\lVert x_n - x_* \rVert_{A^T A}\).
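To illustrate the last note numerically (reusing A, b, and x from the GMRES sketch above; x_star is a reference solution computed directly):

x_star = A \ b
err = x - x_star
sqrt(err' * (A' * A) * err), norm(A * x - b)   # the A'A-norm of the error equals the residual norm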
More notes on GMRES#
GMRES needs to store the full \(Q_n\), which is unaffordable for large \(n\) (many iterations). The standard solution is to choose a “restart” \(k\) and to discard \(Q_n\) and start over with an initial guess \(x_k\) after each \(k\) iterations. This algorithm is called GMRES(k). PETSc’s default solver is GMRES(30) and the restart can be controlled using the run-time option -ksp_gmres_restart.
Most implementations of GMRES use classical Gram-Schmidt because it is much faster in parallel (one reduction per iteration instead of \(n\) reductions per iteration). The PETSc option -ksp_gmres_modifiedgramschmidt can be used when you suspect that classical Gram-Schmidt may be causing instability.
There is a very similar (and older) algorithm called GCR that maintains \(x_n\) and \(r_n\). This is useful, for example, if a convergence tolerance needs to inspect individual entries. GCR requires \(2n\) vectors instead of \(n\) vectors, and can tolerate a nonlinear preconditioner. FGMRES is a newer algorithm with similar properties to GCR.
Condition numbers for matrices#
We may have informally referred to a matrix as “ill-conditioned” when the columns are nearly linearly dependent, but let’s make this concept more precise. Recall the definition of (relative) condition number:
\[ \kappa = \max_{\delta x} \frac{|\delta f| / |f|}{|\delta x| / |x|} . \]
We understood this definition for scalar problems, but it also makes sense when the inputs and/or outputs are vectors (or matrices, etc.) and absolute value is replaced by vector (or matrix) norms. Consider matrix-vector multiplication, for which \(f(x) = A x\):
\[ \kappa(A) = \max_{\delta x} \frac{\lVert A(x + \delta x) - A x \rVert / \lVert A x \rVert}{\lVert \delta x \rVert / \lVert x \rVert} = \max_{\delta x} \frac{\lVert A \delta x \rVert}{\lVert \delta x \rVert} \, \frac{\lVert x \rVert}{\lVert A x \rVert} = \lVert A \rVert \frac{\lVert x \rVert}{\lVert A x \rVert} . \]
There are two problems here:
I wrote \(\kappa(A)\) but my formula depends on \(x\).
What is that \(\lVert A \rVert\) beastie?
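As a numerical sketch of the first problem (the helper name and the test matrix are my own), the formula \(\lVert A \rVert \lVert x \rVert / \lVert A x \rVert\) indeed varies strongly with the direction of \(x\):

kappa_matvec(A, x) = opnorm(A) * norm(x) / norm(A * x)   # relative condition number of x -> A*x

A = [1 1000; 0 1]
kappa_matvec(A, [1, 0]), kappa_matvec(A, [0, 1])         # roughly 1000 versus roughly 1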
Matrix norms induced by vector norms#
Vector norms are built into the linear space (and defined in terms of the inner product). Matrix norms are induced by vector norms, according to
\[ \lVert A \rVert = \max_{x \ne 0} \frac{\lVert A x \rVert}{\lVert x \rVert} . \]
This equation makes sense for non-square matrices – the vector norms of the input and output spaces may differ.
Due to linearity, all that matters is the direction of \(x\), so it could equivalently be written
\[ \lVert A \rVert = \max_{\lVert x \rVert = 1} \lVert A x \rVert . \]
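As a sketch of this definition (my own illustration), one can approximate the induced 2-norm by maximizing \(\lVert A x \rVert\) over random unit vectors and compare with LinearAlgebra's opnorm:

A = randn(4, 3)
candidates = [normalize(randn(3)) for _ in 1:10000]    # random unit vectors
maximum(norm(A * x) for x in candidates), opnorm(A)    # the sampled maximum approaches the induced norm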
The formula makes sense, but still depends on \(x\).#
Consider the matrix
What is the norm of this matrix?
What is the condition number when \(x = [1,0]^T\)?
What is the condition number when \(x = [0,1]^T\)?
Condition number of the matrix#
The condition number of matrix-vector multiplication depends on the vector. The condition number of the matrix is the worst case (maximum) of the condition number for any vector, i.e.,
\[ \kappa(A) = \max_{x \ne 0} \lVert A \rVert \frac{\lVert x \rVert}{\lVert A x \rVert} . \]
If \(A\) is invertible, then we can rephrase as
\[ \kappa(A) = \max_{A x \ne 0} \lVert A \rVert \frac{\lVert A^{-1} (A x) \rVert}{\lVert A x \rVert} = \max_{y \ne 0} \lVert A \rVert \frac{\lVert A^{-1} y \rVert}{\lVert y \rVert} = \lVert A \rVert \, \lVert A^{-1} \rVert . \]
Evidently multiplying by a matrix is just as ill-conditioned an operation as solving a linear system using that matrix.
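A quick numerical check (a sketch, reusing the illustrative matrix from above): LinearAlgebra's cond agrees with the product of the operator norms.

A = [1 1000; 0 1]
opnorm(A) * opnorm(inv(A)), cond(A)   # the two agree: kappa(A) = ||A|| * ||A^-1||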