2022-02-04 Linear Algebra

Last time

  • Breaking Newton’s method

  • Exploration

  • Multiple roots

  • Conditioning of the rootfinding problem

Today

  • Forward and backward stability

  • Beyond IEEE double precision

  • Algebra of linear transformations

using Plots
default(linewidth=4, legendfontsize=12)

Which is better to model inputs to a rootfinder?

  • A: roots \(a_k\) in

    \[p(x) = \prod_k (x - a_k)\]

  • B: coefficients \(b_k\) in

    \[p(x) = \sum_k b_k x^k\]

Figure from Trefethen and Bau (1999)

Ill-conditioning of roots of Wilkinson's polynomial

Forward vs backward error and stability

Stability

“nearly the right answer to nearly the right question”

\[ \frac{\lvert \tilde f(x) - f(\tilde x) \rvert}{| f(\tilde x) |} \in O(\epsilon_{\text{machine}}) \]
for some \(\tilde x\) that is close to \(x\)

Backward Stability

“exactly the right answer to nearly the right question”

\[ \tilde f(x) = f(\tilde x) \]
for some \(\tilde x\) that is close to \(x\)

  • Every backward stable algorithm is stable.

  • Not every stable algorithm is backward stable.

Map angle to the unit circle

theta = 1 .+ LinRange(0, 3, 100)
scatter(cos.(theta), sin.(theta), legend=:none, aspect_ratio=:equal)
../_images/2022-02-04-linear-algebra_10_0.svg
theta = LinRange(1., 1+2e-5, 100)
mysin(t) = cos(t - (1e10+1)*pi/2)
plot(cos.(theta), sin.(theta), aspect_ratio=:equal)
scatter!(cos.(theta), mysin.(theta))
../_images/2022-02-04-linear-algebra_11_0.svg

Are we observing A=stability, B=backward stability, or C=neither?

  • The numbers \((\widetilde\cos \theta, \widetilde\sin \theta) = (\operatorname{fl}(\cos \theta), \operatorname{fl}(\sin\theta))\) do not lie exactly on the unit circle.

    • There does not exist a \(\tilde\theta\) such that \((\widetilde\cos \theta, \widetilde \sin\theta) = (\cos\tilde\theta, \sin\tilde\theta)\)

Accuracy of backward stable algorithms (Theorem)

A backward stable algorithm for computing \(f(x)\) has relative accuracy

\[ \left\lvert \frac{\tilde f(x) - f(x)}{f(x)} \right\rvert \lesssim \kappa(f) \epsilon_{\text{machine}} . \]
Backward stability is generally the best we can hope for.

In practice, it is rarely possible for an algorithm to be backward stable when the output space is higher dimensional than the input space.
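
A quick sanity check of this bound: \(f(x) = \log x\) has relative condition number \(\kappa = 1/\lvert \log x \rvert\), which is large near \(x = 1\), and the library log is evaluated accurately (hence backward stably).

# Compare the observed relative error of log to the bound κ(f) ε_machine
x = 1.000001
ref = log(big(x))                        # high-precision reference value
relative_error = Float64(abs(log(x) - ref) / abs(ref))
bound = eps() / 2 / abs(log(x))          # κ(f) ε_machine, with ε_machine = eps()/2
@show relative_error bound

The observed error sits far below the bound because log is evaluated to nearly full accuracy; the bound is what backward stability alone guarantees.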

Beyond IEEE-754 double precision

Lower precision IEEE-754 and relatives

The default floating point type in Python and Julia is IEEE-754 double precision, which requires 8 bytes (64 bits) of storage and offers \(\epsilon_{\text{machine}} \approx 10^{-16}\).

  • IEEE-754 single precision (float in C and related languages) is half the size (4 bytes = 32 bits) and has long been popular when adequate.

  • IEEE-754 half precision reduces both range (exponent bits) and precision (mantissa bits).

  • bfloat16 is a relatively new non-IEEE type that is popular in machine learning because it’s easy to convert to/from single precision (just truncate or round the mantissa); a rough bit-level sketch follows this list.
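
A rough bit-level sketch of that conversion (truncation only; a careful implementation would round to nearest): keep the sign bit, the 8 exponent bits, and the top 7 mantissa bits of a Float32 and zero the rest. The helper name is illustrative, not a library function.

# Zero the low 16 bits of a Float32, keeping the sign, 8 exponent bits, and 7 mantissa bits
truncate_to_bfloat16(x::Float32) = reinterpret(Float32, reinterpret(UInt32, x) & 0xffff0000)
@show Float32(π) truncate_to_bfloat16(Float32(π))  # 3.1415927f0 -> 3.140625f0, about 3 digits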

| Type     | \(\epsilon_{\text{machine}}\) | exponent bits | mantissa bits |
|----------|-------------------------------|---------------|---------------|
| double   | 1.11e-16                      | 11            | 52            |
| single   | 5.96e-8                       | 8             | 23            |
| half     | 4.88e-4                       | 5             | 10            |
| bfloat16 | 3.91e-3                       | 8             | 7             |
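
Julia's eps(T) reports the spacing from 1 to the next representable number of type T, which is twice the \(\epsilon_{\text{machine}}\) convention used in the table (bfloat16 is not a built-in type):

# eps(T) is the gap between 1 and the next float of type T, i.e. 2 ε_machine
@show eps(Float64) eps(Float32) eps(Float16)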

Mixed-precision algorithms

Sometimes reducing the precision (from double to single, or from single to half) compromises the quality of results (due to ill-conditioning or poor stability), but one can often recover accurate results by using higher precision in a small fraction of the computation. Such mixed-precision algorithms can be a useful optimization technique.

These techniques can give up to a 2x improvement when memory bandwidth (for floating point data) or pure vectorizable flops are the bottleneck. When going from single to half precision, the benefit can exceed 2x on special hardware, such as GPUs with “tensor cores”.
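
A toy sketch of the idea (qualitative only; sum_accumulate is an illustrative helper, not a library function): store the data in half precision, but carry the running sum in a wider accumulator.

# Accumulate a sum of low-precision data using an accumulator of type T
function sum_accumulate(data, T)
    s = zero(T)
    for d in data
        s += d
    end
    s
end
data = fill(Float16(0.001), 10_000)    # the true sum is ≈ 10
@show sum_accumulate(data, Float16)    # Float16 accumulator stalls well short of 10
@show sum_accumulate(data, Float32)    # Float32 accumulator recovers ≈ 10

The Float16 sum stalls once the terms drop below half the spacing between adjacent representable numbers near the running total.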

Warning: Premature use of mixed-precision techniques can often obscure better algorithms that can provide greater speedups and/or reliability improvements.

Matrices as linear transformations

Linear algebra is the study of linear transformations on vectors, which represent points in a finite dimensional space. The matrix-vector product \(y = A x\) is a linear combination of the columns of \(A\). The familiar definition,

\[ y_i = \sum_j A_{i,j} x_j \]

can also be viewed as

\[\begin{split} y = \Bigg[ A_{:,1} \Bigg| A_{:,2} \Bigg| \dotsm \Bigg] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \end{bmatrix} = \Bigg[ A_{:,1} \Bigg] x_1 + \Bigg[ A_{:,2} \Bigg] x_2 + \dotsb . \end{split}\]

Math and Julia Notation

The notation \(A_{i,j}\) corresponds to the Julia syntax A[i,j] and the colon : means the entire range (row or column). So \(A_{:,j}\) is the \(j\)th column and \(A_{i,:}\) is the \(i\)th row. The corresponding Julia syntax is A[:,j] and A[i,:].
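
A quick illustration of this notation:

B = [1 2 3; 4 5 6]   # a 2×3 matrix
@show B[:, 2]        # second column
@show B[1, :]        # first row (note: returned as a Vector, not a 1×3 matrix)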

Julia has syntax for row vectors, column vectors, and arrays.

[1. 2 3]
1×3 Matrix{Float64}:
 1.0  2.0  3.0
[1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3
[1 0; 0 2; 10 3]
3×2 Matrix{Int64}:
  1  0
  0  2
 10  3
[1; 2; 3]' # transpose
1×3 adjoint(::Vector{Int64}) with eltype Int64:
 1  2  3

Implementing multiplication by row

function matmult1(A, x)
    m, n = size(A)
    y = zeros(m)
    for i in 1:m
        for j in 1:n
            y[i] += A[i,j] * x[j]
        end
    end
    y
end

A = reshape(1.:12, 3, 4) # 3x4 matrix
x = [10., 0, 0, 0]
matmult1(A, x)
3-element Vector{Float64}:
 10.0
 20.0
 30.0
# Dot product
A[2,:]' * x
20.0
function matmult2(A, x)
    m, n = size(A)
    y = zeros(m)
    for i in 1:m
        y[i] = A[i,:]' * x
    end
    y
end

matmult2(A, x)
3-element Vector{Float64}:
 10.0
 20.0
 30.0

Implementing multiplication by column

function matmult3(A, x)
    m, n = size(A)
    y = zeros(m)
    for j in 1:n
        y += A[:, j] * x[j]
    end
    y
end

matmult3(A, x)
3-element Vector{Float64}:
 10.0
 20.0
 30.0
A * x # We'll use this version
3-element Vector{Float64}:
 10.0
 20.0
 30.0

Polynomial evaluation is (continuous) linear algebra

We can evaluate polynomials using matrix-vector multiplication. For example,

\[\begin{split} 5x^3 - 3x = \Bigg[ 1 \Bigg|\, x \Bigg|\, x^2 \,\Bigg|\, x^3 \Bigg] \begin{bmatrix}0 \\ -3 \\ 0 \\ 5 \end{bmatrix} . \end{split}\]
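
A direct check of this matrix-vector view at the single point \(x = 2\) (the cells below use the Polynomials package instead):

x0 = 2.0
row = [1 x0 x0^2 x0^3]      # the monomial basis evaluated at x0, as a 1×4 row
@show row * [0, -3, 0, 5]   # 5*2^3 - 3*2 = 34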

using Polynomials
P(c) = Polynomial(c) # construct a polynomial from its coefficient vector c

p = [0, -3, 0, 5]
q = [1, 2, 3, 4]
f = P(p) + P(q)
@show f
@show P(p+q)
x = [0., 1, 2]
f.(x)
f = Polynomial(1 - x + 3*x^2 + 9*x^3)
P(p + q) = Polynomial(1 - x + 3*x^2 + 9*x^3)
3-element Vector{Float64}:
  1.0
 12.0
 83.0
plot(f, legend=:bottomright)
../_images/2022-02-04-linear-algebra_34_0.svg

Polynomial evaluation is (discrete) linear algebra

V = [one.(x) x x.^2 x.^3]
3×4 Matrix{Float64}:
 1.0  0.0  0.0  0.0
 1.0  1.0  1.0  1.0
 1.0  2.0  4.0  8.0
V * p + V * q
3-element Vector{Float64}:
  1.0
 12.0
 83.0
V * (p + q)
3-element Vector{Float64}:
  1.0
 12.0
 83.0

Vandermonde matrices

A Vandermonde matrix is one whose columns are a set of basis functions (here the monomials \(1, x, x^2, \dotsc\)) evaluated at a set of discrete points.

\[V(x) = \begin{bmatrix} 1 \Bigg| x \Bigg| x^2 \Bigg| x^3 \Bigg| \dotsb \end{bmatrix}\]
function vander(x, k=nothing)
    if isnothing(k)
        k = length(x)
    end
    m = length(x)
    V = ones(m, k)
    for j in 2:k
        V[:, j] = V[:, j-1] .* x
    end
    V
end
vander (generic function with 2 methods)
x = LinRange(-1, 1, 50)
V = vander(x, 4)
plot(x, V)
../_images/2022-02-04-linear-algebra_41_0.svg

Fitting is linear algebra

\[ \underbrace{\begin{bmatrix} 1 \Bigg| x \Bigg| x^2 \Bigg| x^3 \Bigg| \dotsb \end{bmatrix}}_{V(x)} \Big[ p \Big] = \Bigg[ y \Bigg]\]
x1 = [-.9, 0.1, .5, .8]
y1 = [1, 2.4, -.2, 1.3]
scatter(x1, y1, markersize=8)
../_images/2022-02-04-linear-algebra_43_0.svg
V = vander(x1)
p = V \ y1 # write y1 in the polynomial basis
scatter(x1, y1, markersize=8)
plot!(P(p), label="P(p)")
plot!(x, vander(x, 4) * p, label="\$ V(x) p\$", linestyle=:dash)
../_images/2022-02-04-linear-algebra_44_0.svg
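
A quick check that the fitted polynomial interpolates the data:

@show vander(x1) * p - y1   # residual at the data points; ≈ 0 up to rounding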

Some common terminology

  • The range of \(A\) is the space spanned by its columns. This definition coincides with the range of a function \(f(x)\) when \(f(x) = A x\).

  • The (right) nullspace of \(A\) is the space of vectors \(x\) such that \(A x = 0\).

  • The rank of \(A\) is the dimension of its range.

  • A matrix has full rank if the nullspace of either \(A\) or \(A^T\) is trivial (contains only the 0 vector). Equivalently, all the columns of \(A\) (or of \(A^T\)) are linearly independent. (See the short code sketch after this list.)

  • A nonsingular (or invertible) matrix is a square matrix of full rank. We call the inverse \(A^{-1}\) and it satisfies \(A^{-1} A = A A^{-1} = I\).
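
A small sketch of these terms on a rank-deficient matrix, using the LinearAlgebra standard library:

using LinearAlgebra
B = [1. 2 3; 2 4 6; 0 1 1]   # second row is twice the first
@show rank(B)                # 2, so B does not have full rank
@show nullspace(B)           # a basis for {x : B x = 0}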

\(\DeclareMathOperator{\rank}{rank} \DeclareMathOperator{\null}{null} \) If \(A \in \mathbb{R}^{m\times m}\), which of these doesn’t belong?

  1. \(A\) has an inverse \(A^{-1}\)

  2. \(\rank (A) = m\)

  3. \(\null(A) = \{0\}\)

  4. \(A A^T = A^T A\)

  5. \(\det(A) \ne 0\)

  6. \(A x = 0\) implies that \(x = 0\)

A = rand(4,4)
A' * A - A * A'
4×4 Matrix{Float64}:
  0.600554   0.448048  -0.408673   0.619332
  0.448048   0.25941   -0.892723   0.66887
 -0.408673  -0.892723  -2.01712   -0.684047
  0.619332   0.66887   -0.684047   1.15716

What is an inverse?

When we write \(x = A^{-1} y\), we mean that \(x\) is the unique vector such that \(A x = y\). (It is rare that we explicitly compute a matrix \(A^{-1}\), though it’s not as “bad” as people may have told you.) A vector \(y\) is equivalent to \(\sum_i e_i y_i\) where \(e_i\) are columns of the identity. Meanwhile, \(x = A^{-1} y\) means that we are expressing that same vector \(y\) in the basis of the columns of \(A\), i.e., \(\sum_i A_{:,i} x_i\).
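
A quick check of this change-of-basis interpretation on a small example:

A = [1. 2; 3 4]
y = [1., 0]
x = A \ y                                # solve A x = y (no explicit inverse formed)
@show A[:, 1] * x[1] + A[:, 2] * x[2]    # recombines the columns of A to reproduce y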

using LinearAlgebra
A = rand(4, 4)
4×4 Matrix{Float64}:
 0.977026  0.213505  0.119457  0.891853
 0.897504  0.159212  0.758408  0.686009
 0.700864  0.640795  0.403421  0.839417
 0.550796  0.633895  0.6858    0.936514
A \ A
4×4 Matrix{Float64}:
  1.0           0.0   1.51614e-16  -1.65916e-16
 -2.1988e-16    1.0   2.16193e-16  -3.36757e-16
 -1.41362e-16   0.0   1.0           1.00109e-16
  2.05931e-16  -0.0  -2.20364e-16   1.0
inv(A) * A
4×4 Matrix{Float64}:
  1.0          -4.05449e-17   6.58018e-17  -3.52046e-16
 -7.62503e-17   1.0           3.64113e-16   2.84383e-16
 -9.07994e-17  -4.41714e-18   1.0           7.68716e-17
  2.17227e-16  -5.19061e-17  -4.9013e-16    1.0