# 2022-02-11 Gram-Schmidt¶

## Last time¶

• Projections

• Rotations

• Reflections

## Today¶

• Revisit projections, rotations, and reflections

• Constructing orthogonal bases

• QR factorization

using LinearAlgebra
using Plots
using Polynomials
default(linewidth=4, legendfontsize=12)

function vander(x, k=nothing)
    if isnothing(k)
        k = length(x)
    end
    m = length(x)
    V = ones(m, k)               # first column is all ones
    for j in 2:k
        V[:, j] = V[:, j-1] .* x # columns 1, x, x^2, ..., x^(k-1)
    end
    V
end

vander (generic function with 2 methods)


# Polynomials can be orthogonal too!¶

x = LinRange(-1, 1, 50)
A = vander(x, 4)
M = A * [.5 0 0 0;  # 0.5
         0  1 0 0;  # x
         0  0 1 0]' # x^2
scatter(x, M)
plot!(x, 0*x, label=:none, color=:black)

• Which inner product will be zero?

• Which functions are even and odd?

## Pairwise inner product¶

The pairwise inner products between two sets of vectors can be expressed by collecting the sets as columns in matrices and writing $$A = X^T Y$$ where $$A_{i,j} = x_i^T y_j$$. It follows from this definition that

$(X^T Y)^T = Y^T X .$
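For example, a quick numerical check with two hypothetical random matrices that share the same number of rows:

X = randn(4, 2)
Y = randn(4, 3)
(X' * Y)' ≈ Y' * X   # entry (i,j) of X'Y is the inner product of X[:,i] with Y[:,j]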

M' * M

3×3 Matrix{Float64}:
 12.5          -1.11022e-15   8.67347
 -1.11022e-15  17.3469       -2.22045e-16
  8.67347      -2.22045e-16  10.8272


# Normalization and orthogonalization¶

q1 = M[:,1]
q1 /= norm(q1) # normalize q1
Q = [q1 M[:, 2:end]]
scatter(x, Q)

Q' * Q

3×3 Matrix{Float64}:
 1.0          -1.94289e-16   2.45323
 -1.94289e-16  17.3469      -2.22045e-16
  2.45323     -2.22045e-16  10.8272


# Orthogonal matrices¶

If $$x^T y = 0$$ then we say $$x$$ and $$y$$ are orthogonal (or “$$x$$ is orthogonal to $$y$$”). A vector is said to be normalized if $$\lVert x \rVert = 1$$. If $$x$$ is orthogonal to $$y$$ and $$\lVert x \rVert = \lVert y \rVert = 1$$ then we say $$x$$ and $$y$$ are orthonormal. A square matrix with orthonormal columns is said to be an orthogonal matrix. We typically use $$Q$$ or $$U$$ and $$V$$ for matrices that are known/constructed to be orthogonal. Orthogonal matrices are always full rank: the columns are linearly independent. The inverse of an orthogonal matrix is its transpose:

$Q^T Q = Q Q^T = I .$
Orthogonal matrices are a powerful building block for robust numerical algorithms.
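For a concrete check, consider a 2D rotation (a sketch with a hypothetical angle theta):

theta = 0.3
Qrot = [cos(theta) -sin(theta); sin(theta) cos(theta)] # rotation by theta: orthonormal columns
@show norm(Qrot' * Qrot - I)
@show norm(inv(Qrot) - Qrot');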

# Gram-Schmidt orthogonalization¶

Suppose we’re given some vectors and want to find an orthogonal basis for their span.

$\begin{split} \Bigg[ a_1 \Bigg| a_2 \Bigg] = \Bigg[ q_1 \Bigg| q_2 \Bigg] \begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix} \end{split}$
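A minimal worked instance of this two-column picture, with hypothetical vectors a1 and a2:

a1 = [1., 1, 0]
a2 = [1., 0, 1]
r11 = norm(a1); q1 = a1 / r11          # normalize the first column
r12 = q1' * a2                         # component of a2 along q1
v = a2 - q1 * r12                      # remove that component
r22 = norm(v); q2 = v / r22            # normalize what remains
[a1 a2] ≈ [q1 q2] * [r11 r12; 0 r22]   # recovers A = QR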

# A naive algorithm¶

function gram_schmidt_naive(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:,j]
        for k in 1:j-1
            r = Q[:,k]' * v  # component of v in the direction q_k
            v -= Q[:,k] * r  # remove it
            R[k,j] = r
        end
        R[j,j] = norm(v)
        Q[:,j] = v / R[j,j]  # normalize what remains
    end
    Q, R
end

gram_schmidt_naive (generic function with 1 method)

A = vander(x, 4)
Q, R = gram_schmidt_naive(A)
@show norm(Q' * Q - I)
@show norm(Q * R - A);

norm(Q' * Q - I) = 6.609184312080184e-16
norm(Q * R - A) = 2.500206551829073e-16


# What do orthogonal polynomials look like?¶

A = vander(x, 5)
Q, R = gram_schmidt_naive(A)
plot(x, Q, marker=:auto)


What happens if we use more than 50 values of $$x$$? Is there a continuous limit?
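One way to explore that question is to refine the grid and rerun the same construction (a sketch; the column shapes converge up to scaling as the grid is refined):

xfine = LinRange(-1, 1, 500)
Qfine, _ = gram_schmidt_naive(vander(xfine, 5))
plot(xfine, Qfine)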

# Theorem¶

## Every full-rank $$m\times n$$ matrix ($$m \ge n$$) has a unique reduced $$Q R$$ factorization with $$R_{j,j} > 0$$¶

The algorithm we’re using produces exactly this factorization because of the line:

        R[j,j] = norm(v)
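We can compare with the library factorization to see the same factors appear (a sketch; the qr routine in LinearAlgebra may return negative diagonal entries in $$R$$, so we flip signs before comparing):

F = qr(A)                       # Householder QR from LinearAlgebra
S = Diagonal(sign.(diag(F.R)))  # flip signs so the diagonal of R is positive
Qhat, Rhat = Matrix(F.Q) * S, S * F.R
norm(Qhat - Q), norm(Rhat - R)  # should agree with Gram-Schmidt up to rounding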


# Solving equations using $$QR = A$$¶

If $$A x = b$$ then $$Rx = Q^T b$$.

x1 = [-0.9, 0.1, 0.5, 0.8] # points where we know values
y1 = [1, 2.4, -0.2, 1.3]
scatter(x1, y1)
Q, R = gram_schmidt_naive(vander(x1, 3))
p = R \ (Q' * y1)
plot!(x, vander(x, 3) * p)
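As a sanity check, these coefficients should match the least-squares solution that the backslash operator computes directly for the same overdetermined system:

p ≈ vander(x1, 3) \ y1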


# How accurate is it?¶

m = 20
x = LinRange(-1, 1, m)
A = vander(x, m)
Q, R = gram_schmidt_naive(A)
@show norm(Q' * Q - I)
@show norm(Q * R - A)

norm(Q' * Q - I) = 1.073721107832196e-8
norm(Q * R - A) = 8.268821431611631e-16

8.268821431611631e-16
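To see how the naive algorithm's loss of orthogonality grows with the number of columns, we can sweep over a few sizes (a small sketch reusing the functions above):

for m in (5, 10, 15, 20)
    xs = LinRange(-1, 1, m)
    Qs, _ = gram_schmidt_naive(vander(xs, m))
    println("m = $m:  norm(Q'Q - I) = ", norm(Qs' * Qs - I))
end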


# A variant with more parallelism¶

$q_2 q_2^T (q_1 q_1^T v) = q_2 (q_2^T q_1) q_1^T v = 0$

Because the $$q_k$$ are orthonormal, the cross terms vanish, so the projections against all previous columns can be applied together in a single matrix product instead of one column at a time, which exposes more parallelism.

function gram_schmidt_classical(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:,j]
        R[1:j-1,j] = Q[:,1:j-1]' * v  # all inner products in one product
        v -= Q[:,1:j-1] * R[1:j-1,j]  # subtract the whole projection at once
        R[j,j] = norm(v)
        Q[:,j] = v / norm(v)
    end
    Q, R
end

gram_schmidt_classical (generic function with 1 method)

m = 10
x = LinRange(-1, 1, m)
A = vander(x, m)
Q, R = gram_schmidt_classical(A)
@show norm(Q' * Q - I)
@show norm(Q * R - A)

norm(Q' * Q - I) = 6.339875256299394e-11
norm(Q * R - A) = 1.217027619812654e-16

1.217027619812654e-16