2025-10-01 GMRES#

Last time#

  • Householder QR

  • Composition of reflectors

  • Comparison of interfaces

  • Profiling

  • Cholesky QR

Today#

  • Intro to Krylov/GMRES

  • Matrix norms and conditioning

using LinearAlgebra
using Plots
default(linewidth=4, legendfontsize=12)

function vander(x, k=nothing)
    if isnothing(k)
        k = length(x)
    end
    m = length(x)
    V = ones(m, k)
    for j in 2:k
        V[:, j] = V[:, j-1] .* x
    end
    V
end

function gram_schmidt_classical(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:,j]
        R[1:j-1,j] = Q[:,1:j-1]' * v
        v -= Q[:,1:j-1] * R[1:j-1,j]
        R[j,j] = norm(v)
        Q[:,j] = v / R[j,j]
    end
    Q, R
end

function qr_householder(A)
    m, n = size(A)
    R = copy(A)
    V = [] # list of reflectors
    for j in 1:n
        v = copy(R[j:end, j])
        v[1] += sign(v[1]) * norm(v) # choose the sign to avoid cancellation
        v = normalize(v)
        R[j:end,j:end] -= 2 * v * v' * R[j:end,j:end]
        push!(V, v)
    end
    V, R
end

function qr_chol(A)
    R = cholesky(A' * A).U
    Q = A / R
    Q, R
end

function qr_chol2(A)
    Q, R = qr_chol(A)
    Q, R1 = qr_chol(Q)
    Q, R1 * R
end
qr_chol2 (generic function with 1 method)

Krylov subspaces#

Suppose we wish to solve

\[A x = b\]

but only have the ability to apply the action of \(A\) (we do not have access to its entries). A general approach is to work with approximations in a Krylov subspace, spanned by the columns of the Krylov matrix

\[ K_n = \big[ b \big| Ab \big| A^2 b \big| \dotsm \big| A^{n-1} b \big] . \]

This matrix is horribly ill-conditioned and cannot stably be computed as written. Instead, we seek an orthogonal basis \(Q_n\) that spans the same space as \(K_n\). We could write this as a factorization

\[ K_n = Q_n R_n \]

where the first column is \(q_1 = b / \lVert b \rVert\). The factor \(R_n\) is unnecessary and hopelessly ill-conditioned, so a slightly different procedure is used.
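
As a quick numerical check of this ill-conditioning, the hypothetical helper below builds \(K_n\) column by column and reports its condition number (the test matrix and vector are arbitrary choices for illustration):

function krylov_matrix(A, b, n)
    K = reshape(b, :, 1)
    for _ in 2:n
        K = hcat(K, A * K[:, end])  # append the next column A^{j-1} b
    end
    K
end

A = randn(50, 50); b = randn(50)  # arbitrary test data
[cond(krylov_matrix(A, b, n)) for n in 2:2:10]  # grows rapidly with n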

Arnoldi iteration#

The Arnoldi iteration applies orthogonal similarity transformations to reduce \(A\) to Hessenberg form, starting from a vector \(q_1 = b/\lVert b \rVert\),

\[ A = Q H Q^T . \]

Let’s multiply on the right by \(Q\) and examine the first \(n\) columns,

\[ A Q_n = Q_{n+1} H_n \]

where \(H_n\) is an \((n+1) \times n\) Hessenberg matrix.
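
A minimal Arnoldi sketch (the name and structure are mine, not the lecture's production code), using classical Gram-Schmidt for the orthogonalization:

function arnoldi(A, b, n)
    m = length(b)
    Q = zeros(m, n+1)
    H = zeros(n+1, n)
    Q[:, 1] = b / norm(b)              # q_1 = b / ||b||
    for j in 1:n
        v = A * Q[:, j]                # only the action of A is needed
        H[1:j, j] = Q[:, 1:j]' * v     # project onto the current basis
        v -= Q[:, 1:j] * H[1:j, j]     # orthogonalize (classical Gram-Schmidt)
        H[j+1, j] = norm(v)
        Q[:, j+1] = v / H[j+1, j]
    end
    Q, H                               # A * Q[:, 1:n] ≈ Q * H
end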

GMRES#

\[ A Q_n = Q_{n+1} H_n \]

GMRES (Generalized Minimum Residual) minimizes

\[ \lVert A x - b \rVert \]
over the column space of \(Q_n\), i.e., \(x = Q_n y\) for some \(y\). By the Arnoldi recurrence, this is equivalent to
\[ \lVert Q_{n+1} H_n y - b \rVert \]
which can be solved by minimizing
\[ \lVert H_n y - Q_{n+1}^T b \rVert . \]
Since \(q_1 = b/\lVert b \rVert\), the least squares problem is to minimize
\[ \Big\lVert H_n y - \lVert b \rVert e_1 \Big\rVert . \]
The solution of this least squares problem is achieved by incrementally updating a \(QR\) factorization of \(H_n\).
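
Putting the pieces together, here is a minimal (unrestarted) GMRES sketch built on the hypothetical arnoldi function above; a production implementation instead updates a QR factorization of \(H_n\) incrementally so the residual norm is available at every iteration without re-solving:

function gmres_minimal(A, b, n)
    Q, H = arnoldi(A, b, n)
    rhs = zeros(n + 1); rhs[1] = norm(b)  # ||b|| e_1
    y = H \ rhs                           # small (n+1) x n least-squares solve
    x = Q[:, 1:n] * y                     # x_n = Q_n y_n
    x, norm(H * y - rhs)                  # residual norm from the small problem
end

A = I + 0.1 * randn(100, 100); b = randn(100)  # arbitrary well-conditioned test
x, rnorm = gmres_minimal(A, b, 20)
norm(A * x - b), rnorm   # the two residual norms should agree up to roundoff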

Notes#

  • The solution \(x_n\) constructed by GMRES at iteration \(n\) is not explicitly available. If a solution is needed, it must be constructed by solving the \((n+1)\times n\) least squares problem and forming the solution as a linear combination of the \(n\) columns of \(Q_n\). The leading cost is \(2mn\) where \(m \gg n\).

  • The residual vector \(r_n = A x_n - b\) is not explicitly available in GMRES. To compute it, first build the solution \(x_n = Q_n y_n\).

  • GMRES minimizes the 2-norm of the residual \(\lVert r_n \rVert\) which is equivalent to the \(A^T A\) norm of the error \(\lVert x_n - x_* \rVert_{A^T A}\).

More notes on GMRES#

  • GMRES needs to store the full \(Q_n\), which is unaffordable for large \(n\) (many iterations). The standard solution is to choose a “restart” \(k\) and to discard \(Q_n\) and start over with an initial guess \(x_k\) after each \(k\) iterations (see the sketch after this list). This algorithm is called GMRES(k). PETSc’s default solver is GMRES(30) and the restart can be controlled using the run-time option -ksp_gmres_restart.

  • Most implementations of GMRES use classical Gram-Schmidt because it is much faster in parallel (one reduction per iteration instead of \(n\) reductions per iteration). The PETSc option -ksp_gmres_modifiedgramschmidt can be used when you suspect that classical Gram-Schmidt may be causing instability.

  • There is a very similar (and older) algorithm called GCR that maintains \(x_n\) and \(r_n\). This is useful, for example, if a convergence tolerance needs to inspect individual entries. GCR requires \(2n\) vectors instead of \(n\) vectors, and can tolerate a nonlinear preconditioner. FGMRES is a newer algorithm with similar properties to GCR.
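
The restart strategy amounts to an outer loop that applies \(k\) inner iterations to the residual equation \(A\,\delta x = b - A x\) and then throws the basis away. A hedged sketch (names are mine) using the gmres_minimal function from earlier:

function gmres_restarted(A, b, k; maxrestarts=10, rtol=1e-8)
    x = zero(b)
    for _ in 1:maxrestarts
        r = b - A * x                    # current residual
        norm(r) <= rtol * norm(b) && break
        dx, _ = gmres_minimal(A, r, k)   # k inner iterations, then discard Q
        x += dx                          # restart from the updated guess
    end
    x
end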

Condition numbers for matrices#

We may have informally referred to a matrix as “ill-conditioned” when the columns are nearly linearly dependent, but let’s make this concept more precise. Recall the definition of (relative) condition number:

\[ \kappa = \max_{\delta x} \frac{|\delta f|/|f|}{|\delta x|/|x|} . \]

We understood this definition for scalar problems, but it also makes sense when the inputs and/or outputs are vectors (or matrices, etc.) and absolute value is replaced by vector (or matrix) norms. Consider matrix-vector multiplication, for which \(f(x) = A x\).

\[ \kappa(A) = \max_{\delta x} \frac{\lVert A (x+\delta x) - A x \rVert/\lVert A x \rVert}{\lVert \delta x\rVert/\lVert x \rVert} = \max_{\delta x} \frac{\lVert A \delta x \rVert}{\lVert \delta x \rVert} \, \frac{\lVert x \rVert}{\lVert A x \rVert} = \lVert A \rVert \frac{\lVert x \rVert}{\lVert A x \rVert} . \]

There are two problems here:

  • I wrote \(\kappa(A)\) but my formula depends on \(x\).

  • What is that \(\lVert A \rVert\) beastie?

Matrix norms induced by vector norms#

Vector norms are built into the linear space (and defined in terms of the inner product). Matrix norms are induced by vector norms, according to

\[ \lVert A \rVert = \max_{x \ne 0} \frac{\lVert A x \rVert}{\lVert x \rVert} . \]
  • This equation makes sense for non-square matrices – the vector norms of the input and output spaces may differ.

  • Due to linearity, all that matters is the direction of \(x\), so it could equivalently be written

\[ \lVert A \rVert = \max_{\lVert x \rVert = 1} \lVert A x \rVert . \]
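
For the 2-norm, this induced norm is the largest singular value. In Julia, opnorm computes induced matrix norms (norm(A) on a matrix gives the Frobenius norm instead); a crude maximization over random unit vectors approaches it from below. The example matrix here is arbitrary:

A = [1 2; 3 4]             # arbitrary example
opnorm(A)                  # induced 2-norm = largest singular value
maximum(norm(A * normalize(randn(2))) for _ in 1:100_000)  # ≈ opnorm(A), from below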

The formula makes sense, but still depends on \(x\).#

\[\kappa(A) = \lVert A \rVert \frac{\lVert x \rVert}{\lVert Ax \rVert}\]

Consider the matrix

\[\begin{split} A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} . \end{split}\]
  • What is the norm of this matrix?

  • What is the condition number when \(x = [1,0]^T\)?

  • What is the condition number when \(x = [0,1]^T\)?
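
These questions can be checked numerically using the formula \(\kappa(A, x) = \lVert A \rVert \, \lVert x \rVert / \lVert A x \rVert\) (the helper name kappa is mine):

A = [1 0; 0 0]
kappa(A, x) = opnorm(A) * norm(x) / norm(A * x)
kappa(A, [1, 0]), kappa(A, [0, 1])   # the second case has A x = 0, so kappa is Inf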

Condition number of the matrix#

The condition number of matrix-vector multiplication depends on the vector. The condition number of the matrix is the worst case (maximum) of the condition number for any vector, i.e.,

\[ \kappa(A) = \max_{x \ne 0} \lVert A \rVert \frac{\lVert x \rVert}{\lVert A x \rVert} .\]

If \(A\) is invertible, then we can rephrase as

\[ \kappa(A) = \max_{x \ne 0} \lVert A \rVert \frac{\lVert A^{-1} (A x) \rVert}{\lVert A x \rVert} = \max_{A x \ne 0} \lVert A \rVert \frac{\lVert A^{-1} (A x) \rVert}{\lVert A x \rVert} = \lVert A \rVert \lVert A^{-1} \rVert . \]

Evidently multiplying by a matrix is just as ill-conditioned an operation as solving a linear system with that matrix.
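
A quick check that the worst-case formula agrees with Julia's built-in cond (the invertible test matrix is arbitrary):

A = [1 2; 3 4]
opnorm(A) * opnorm(inv(A)), cond(A)   # both give the 2-norm condition number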