2025-09-05 Rootfinding

2025-09-05 Rootfinding#

Last time#

Forward and backward error
Forward and backward stability
Computing volume of a polygon
Goal-setting and activity
Discussion of errors

Today#

Rootfinding as a modeling tool
Use Roots.jl to solve
Introduce Bisection
Convergence classes
Intro to Newton methods

using Plots
default(linewidth=4)

Rootfinding#

Given \(f(x)\), find \(x\) such that \(f(x) = 0\).

We’ll work with scalars (\(f\) and \(x\) are just numbers) for now, and revisit later when they vector-valued.

Change inputs to outputs#

\(f(x; b) = x^2 - b\)
- \(x(b) = \sqrt{b}\)
\(f(x; b) = \tan x - b\)
- \(x(b) = \arctan b\)
\(f(x) = \cos x + x - b\)
- \(x(b) = ?\)

We aren’t given \(f(x)\), but rather an algorithm f(x) that approximates it.

Sometimes we get extra information, like fp(x) that approximates \(f'(x)\)
If we have source code for f(x), maybe it can be transformed “automatically”

Nesting matryoshka dolls of rootfinding#

I have a function (pressure, temperature) \(\mapsto\) (density, energy)
- I need (density, energy) \(\mapsto\) (pressure, temperature)
I know what condition is satisfied across waves, but not the state on each side.

Toro (2009)

Given a proposed state, I can measure how much (mass, momentum, energy) is not conserved.
- Find the solution that conserves exactly
I can compute lift given speed and angle of attack
- How much can the plane lift?
- How much can it lift on a short runway?

Discuss: rootfinding and inverse problems#

Inverse kinematics for robotics
- (statics) how much does each joint an robotic arm need to move to grasp an object
- (with momentum) fastest way to get there (motors and arms have finite strength)
- similar for virtual reality
Infer a light source (or lenses/mirrors) from an iluminated scene
Imaging
- radiologist seeing an x-ray is mentally inferring 3D geometry from 2D image
- computed tomography (CT) reconstructs 3D model from multiple 2D x-ray images
- seismology infers subsurface/deep earth structure from seismograpms

Example: Queueing#

In a simple queueing model, there is an arrival rate and a departure (serving) rate. While waiting in the queue, there is a probability of “dropping out”. The length of the queue in this model is

\[ \text{length} = \frac{\text{arrival} - \text{drop}}{\text{departure}} \]

One model for the waiting time (where these rates are taken from exponential distributions) is

wait(arrival, departure) = log(arrival / departure) / departure

plot(d -> wait(1, d), xlims=(.1, 1), xlabel="departure", ylabel="wait")

Departure rate given wait#

Easy to measure wait
I have a limited tolerance for waiting

my_wait = 0.8
plot([d -> wait(1, d) - my_wait, d -> 0], xlims=(.1, 1), xlabel="departure", ylabel="wait")

#import Pkg; Pkg.add("Roots")
using Roots
d0 = find_zero(d -> wait(1, d) - my_wait, 1)
plot([d -> wait(1, d) - my_wait, d -> 0], xlims=(.1, 1), xlabel="departure", ylabel="wait")
scatter!([d0], [0], marker=:circle, color=:black, title="d0 = $d0")

Example: Nonlinear Elasticity#

Strain-energy formulation#

\[ \psi = C_1 (I_1 - 3 - 2 \log J) + D_1 (J - 1)^2 \]

where \(I_1 = \lambda_1^2 + \lambda_2^2 + \lambda_3^3\) and \(J = \lambda_1 \lambda_2 \lambda_3\) are invariants defined in terms of the principle stretches \(\lambda_i\).

Uniaxial extension #

In the experiment, we would like to know the stress as a function of the stretch \(\lambda_1\). We don’t know \(J\), and will have to determine it by solving an equation.

https://www.youtube.com/watch?v=9N5SS8f1auI&t=60

How much does the volume change?#

Using symmetries of uniaxial extension, we can write an equation \(f(\lambda, J) = 0\) that must be satisfied. We’ll need to solve a rootfinding problem to compute \(J(\lambda)\).

function f(lambda, J)
    C_1 = 1.5e6
    D_1 = 1e8
    D_1 * J^(8/3) - D_1 * J^(5/3) + C_1 / (3*lambda) * J - C_1 * lambda^2/3
end
plot(J -> f(4, J), xlims=(0.1, 3), xlabel="J", ylabel="f(J)")

find_J(lambda) = find_zero(J -> f(lambda, J), 1)
plot(find_J, xlims=(0.1, 5), xlabel="lambda", ylabel="J")

An algorithm: Bisection#

Bisection is a rootfinding technique that starts with an interval \([a,b]\) containing a root and does not require derivatives. Suppose \(f\) is continuous. What is a sufficient condition for \(f\) to have a root on \([a,b]\)?

hasroot(f, a, b) = f(a) * f(b) < 0

f(x) = cos(x) - x
plot(f)

Bisection#

function bisect(f, a, b, tol)
    mid = (a + b) / 2
    if abs(b - a) < tol
        return mid
    elseif hasroot(f, a, mid)
        return bisect(f, a, mid, tol)
    else
        return bisect(f, mid, b, tol)
    end
end

x0 = bisect(f, -1, 3, 1e-5)
x0, f(x0)

(0.7390861511230469, -1.7035832658995886e-6)

How fast does it converge?#

function bisect_hist(f, a, b, tol)
    mid = (a + b) / 2
    if abs(b - a) < tol
        return [mid]
    elseif hasroot(f, a, mid)
        return prepend!(bisect_hist(f, a, mid, tol), [mid])
    else
        return prepend!(bisect_hist(f, mid, b, tol), [mid])
    end
end

bisect_hist (generic function with 1 method)

bisect_hist(f, -1, 3, 1e-4)

17-element Vector{Float64}:
0
0
5
75
625
6875
71875
734375
7421875
73828125
740234375
7392578125
73876953125
739013671875
7391357421875
73907470703125
739105224609375

Iterative bisection#

Data structures often optimized for appending rather than prepending.
Bounds stack space

f(x) = cos(x) - x
hasroot(f, a, b) = f(a) * f(b) < 0
function bisect_iter(f, a, b, tol)
    hist = Float64[]
    while abs(b - a) > tol
        mid = (a + b) / 2
        push!(hist, mid)
        if hasroot(f, a, mid)
            b = mid
        else
            a = mid
        end
    end
    hist
end

bisect_iter (generic function with 1 method)

length(bisect_iter(f, -1, 3, 1e-20))

Let’s plot the error#

\[ \lvert \texttt{bisect}^k(f, a, b) - r \rvert, \quad k = 1, 2, \dotsc \]

where \(r\) is the true root, \(f(r) = 0\).

hist = bisect_iter(f, -1, 3, 1e-10)
r = hist[end] # What are we trusting?
hist = hist[1:end-1]
scatter( abs.(hist .- r), yscale=:log10)
ks = 1:length(hist)
plot!(ks, 4 * (.5 .^ ks))

Evidently the error \(e_k = x_k - x_*\) after \(k\) bisections satisfies the bound

\[ |e^k| \le c 2^{-k} . \]

Convergence classes#

A convergent rootfinding algorithm produces a sequence of approximations \(x_k\) such that

\[\lim_{k \to \infty} x_k \to x_*\]

where \(f(x_*) = 0\). For analysis, it is convenient to define the errors \(e_k = x_k - x_*\). We say that an iterative algorithm is \(q\)-linearly convergent if

\[\lim_{k \to \infty} |e_{k+1}| / |e_k| = \rho < 1.\]

(The \(q\) is for “quotient”.) A smaller convergence factor \(\rho\) represents faster convergence. A slightly weaker condition (\(r\)-linear convergence or just linear convergence) is that

\[ |e_k| \le \epsilon_k \]

for all sufficiently large \(k\) when the sequence \(\{\epsilon_k\}\) converges \(q\)-linearly to 0.

ρ = 0.8
errors = [1.]
for i in 1:30
    next_e = errors[end] * ρ
    push!(errors, next_e)
end
plot(errors, yscale=:log10, ylims=(1e-10, 1))

e = hist .- r
scatter(abs.(errors[2:end] ./ errors[1:end-1]), ylims=(0,1))

Bisection: A = q-linearly convergent, B = r-linearly convergent, C = neither#

Remarks on bisection#

Specifying an interval is often inconvenient
An interval in which the function changes sign guarantees convergence (robustness)
No derivative information is required
If bisection works for \(f(x)\), then it works and gives the same accuracy for \(f(x) \sigma(x)\) where \(\sigma(x) > 0\).
Roots of even degree are problematic
A bound on the solution error is directly available
The convergence rate is modest – one iteration per bit of accuracy

For you#

Share an equation that is useful for one task, but requires rootfinding for another task.
OR: explain a system in which one stakeholder has the natural inputs, but a different stakeholder only knows outputs.

Please collaborate.