2023-02-01 Convergence classes#

  • Office hours: Monday 9-10pm, Tuesday 2-3pm, Thursday 2-3pm

  • This week will stay virtual. Plan is to start in-person the following Monday (Jan 31)

Last time#

  • Forward and backward error

  • Computing volume of a polygon

  • Rootfinding examples

  • Use Roots.jl to solve

  • Introduce Bisection

Today#

  • Discuss rootfinding as a modeling tool

  • Limitations of bisection

  • Convergence classes

  • Intro to Newton methods

using Plots
default(linewidth=4)

Stability demo: Volume of a polygon#

X = ([1 0; 2 1; 1 3; 0 1; -1 1.5; -2 -1; .5 -2; 1 0])
R(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)]
Y = X * R(deg2rad(10))' .+ [1e4 1e4]
plot(Y[:,1], Y[:,2], seriestype=:shape, aspect_ratio=:equal)
../_images/2023-02-01-convergence-classes_3_0.svg
using LinearAlgebra
function pvolume(X)
    n = size(X, 1)
    vol = sum(det(X[i:i+1, :]) / 2 for i in 1:n-1)
end

@show pvolume(Y)
[det(Y[i:i+1, :]) for i in 1:size(Y, 1)-1]
A = Y[2:3, :]
sum([A[1,1] * A[2,2], - A[1,2] * A[2,1]])
X
pvolume(Y) = 9.249999989226126
8×2 Matrix{Float64}:
  1.0   0.0
  2.0   1.0
  1.0   3.0
  0.0   1.0
 -1.0   1.5
 -2.0  -1.0
  0.5  -2.0
  1.0   0.0
  • Why don’t we even get the first digit correct? These numbers are only \(10^8\) and \(\epsilon_{\text{machine}} \approx 10^{-16}\) so shouldn’t we get about 8 digits correct?

Rootfinding#

Given \(f(x)\), find \(x\) such that \(f(x) = 0\).

We’ll work with scalars (\(f\) and \(x\) are just numbers) for now, and revisit later when they vector-valued.

Change inputs to outputs#

  • \(f(x; b) = x^2 - b\)

    • \(x(b) = \sqrt{b}\)

  • \(f(x; b) = \tan x - b\)

    • \(x(b) = \arctan b\)

  • \(f(x) = \cos x + x - b\)

    • \(x(b) = ?\)

We aren’t given \(f(x)\), but rather an algorithm f(x) that approximates it.

  • Sometimes we get extra information, like fp(x) that approximates \(f'(x)\)

  • If we have source code for f(x), maybe it can be transformed “automatically”

Nesting matryoshka dolls of rootfinding#

  • I have a function (pressure, temperature) \(\mapsto\) (density, energy)

    • I need (density, energy) \(\mapsto\) (pressure, temperature)

  • I know what condition is satisfied across waves, but not the state on each side.

  • Given a proposed state, I can measure how much (mass, momentum, energy) is not conserved.

    • Find the solution that conserves exactly

  • I can compute lift given speed and angle of attack

    • How much can the plane lift?

    • How much can it lift on a short runway?

Discuss: rootfinding and inverse problems#

  • Inverse kinematics for robotics

    • (statics) how much does each joint an robotic arm need to move to grasp an object

    • (with momentum) fastest way to get there (motors and arms have finite strength)

    • similar for virtual reality

  • Infer a light source (or lenses/mirrors) from an iluminated scene

  • Imaging

    • radiologist seeing an x-ray is mentally inferring 3D geometry from 2D image

    • computed tomography (CT) reconstructs 3D model from multiple 2D x-ray images

    • seismology infers subsurface/deep earth structure from seismograpms

Iterative bisection#

f(x) = cos(x) - x
hasroot(f, a, b) = f(a) * f(b) < 0
function bisect_iter(f, a, b, tol)
    hist = Float64[]
    while abs(b - a) > tol
        mid = (a + b) / 2
        push!(hist, mid)
        if hasroot(f, a, mid)
            b = mid
        else
            a = mid
        end
    end
    hist
end
bisect_iter (generic function with 1 method)
length(bisect_iter(f, -1, 3, 1e-20))
56

Let’s plot the error#

\[ \lvert \texttt{bisect}^k(f, a, b) - r \rvert, \quad k = 1, 2, \dotsc \]

where \(r\) is the true root, \(f(r) = 0\).

hist = bisect_iter(f, -1, 3, 1e-10)
r = hist[end] # What are we trusting?
hist = hist[1:end-1]
scatter( abs.(hist .- r), yscale=:log10)
ks = 1:length(hist)
plot!(ks, 4 * (.5 .^ ks))
../_images/2023-02-01-convergence-classes_17_0.svg

Evidently the error \(e_k = x_k - x_*\) after \(k\) bisections satisfies the bound

\[ |e^k| \le c 2^{-k} . \]

Convergence classes#

A convergent rootfinding algorithm produces a sequence of approximations \(x_k\) such that

\[\lim_{k \to \infty} x_k \to x_*\]
where \(f(x_*) = 0\). For analysis, it is convenient to define the errors \(e_k = x_k - x_*\). We say that an iterative algorithm is \(q\)-linearly convergent if
\[\lim_{k \to \infty} |e_{k+1}| / |e_k| = \rho < 1.\]
(The \(q\) is for “quotient”.) A smaller convergence factor \(\rho\) represents faster convergence. A slightly weaker condition (\(r\)-linear convergence or just linear convergence) is that
\[ |e_k| \le \epsilon_k \]
for all sufficiently large \(k\) when the sequence \(\{\epsilon_k\}\) converges \(q\)-linearly to 0.

ρ = 0.8
errors = [1.]
for i in 1:30
    next_e = errors[end] * ρ
    push!(errors, next_e)
end
plot(errors, yscale=:log10, ylims=(1e-10, 1))
../_images/2023-02-01-convergence-classes_21_0.svg
e = hist .- r
scatter(abs.(errors[2:end] ./ errors[1:end-1]), ylims=(0,1))
../_images/2023-02-01-convergence-classes_22_0.svg

Bisection: A = q-linearly convergent, B = r-linearly convergent, C = neither#

Remarks on bisection#

  • Specifying an interval is often inconvenient

  • An interval in which the function changes sign guarantees convergence (robustness)

  • No derivative information is required

  • If bisection works for \(f(x)\), then it works and gives the same accuracy for \(f(x) \sigma(x)\) where \(\sigma(x) > 0\).

  • Roots of even degree are problematic

  • A bound on the solution error is directly available

  • The convergence rate is modest – one iteration per bit of accuracy

Newton-Raphson Method#

Much of numerical analysis reduces to Taylor series, the approximation

\[ f(x) = f(x_0) + f'(x_0) (x-x_0) + f''(x_0) (x - x_0)^2 / 2 + \underbrace{\dotsb}_{O((x-x_0)^3)} \]
centered on some reference point \(x_0\).

In numerical computation, it is exceedingly rare to look beyond the first-order approximation

\[ \tilde f_{x_0}(x) = f(x_0) + f'(x_0)(x - x_0) . \]
Since \(\tilde f_{x_0}(x)\) is a linear function, we can explicitly compute the unique solution of \(\tilde f_{x_0}(x) = 0\) as
\[ x = x_0 - f(x_0) / f'(x_0) . \]
This is Newton’s Method (aka Newton-Raphson or Newton-Raphson-Simpson) for finding the roots of differentiable functions.

An implementation#

function newton(f, fp, x0; tol=1e-8, verbose=false)
    x = x0
    for k in 1:100 # max number of iterations
        fx = f(x)
        fpx = fp(x)
        if verbose
            println("[$k] x=$x  f(x)=$fx  f'(x)=$fpx")
        end
        if abs(fx) < tol
            return x, fx, k
        end
        x = x - fx / fpx
    end  
end

f(x) = cos(x) - x
fp(x) = -sin(x) - 1
newton(f, fp, 1; tol=1e-15, verbose=true)
[1] x=1  f(x)=-0.45969769413186023  f'(x)=-1.8414709848078965
[2] x=0.7503638678402439  f(x)=-0.018923073822117442  f'(x)=-1.6819049529414878
[3] x=0.7391128909113617  f(x)=-4.6455898990771516e-5  f'(x)=-1.6736325442243012
[4] x=0.739085133385284  f(x)=-2.847205804457076e-10  f'(x)=-1.6736120293089505
[5] x=0.7390851332151607  f(x)=0.0  f'(x)=-1.6736120291832148
(0.7390851332151607, 0.0, 5)