2022-02-03 Newton methods#
Last time#
Discuss rootfinding as a modeling tool
Limitations of bisection
Convergence classes
Intro to Newton methods
Today#
Newton’s method via Taylor series
Convergence theory for fixed point methods
Derive Newton’s method via convergence theory
Newton methods in computing culture
Breaking Newton’s method
using Plots
default(linewidth=4, legendfontsize=12)
f(x) = cos(x) - x
hasroot(f, a, b) = f(a) * f(b) < 0
function bisect_iter(f, a, b, tol)
    hist = Float64[]
    while abs(b - a) > tol
        mid = (a + b) / 2
        push!(hist, mid)
        if hasroot(f, a, mid)
            b = mid
        else
            a = mid
        end
    end
    hist
end
bisect_iter (generic function with 1 method)
Convergence classes#
A convergent rootfinding algorithm produces a sequence of approximations \(x_k\) such that
\[\lim_{k\to\infty} x_k = x_*\]
with \(f(x_*) = 0\). For analysis, it is convenient to define the errors \(e_k = x_k - x_*\).
hist = bisect_iter(f, -1, 3, 1e-10)
r = hist[end] # What are we trusting?
hist = hist[1:end-1]
scatter( abs.(hist .- r), yscale=:log10)
ks = 1:length(hist)
ρ = 0.5
plot!(ks, 4 * (ρ .^ ks))
Newton-Raphson Method#
Much of numerical analysis reduces to Taylor series, the approximation
\[ f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2 + \dotsb \]
centered on some reference point \(x_0\).
In numerical computation, it is exceedingly rare to look beyond the first-order approximation
\[ \tilde f_{x_0}(x) = f(x_0) + f'(x_0)(x - x_0). \]
Newton's method iterates this idea: solve the linear model for its root and recenter there. Setting \(\tilde f_{x_k}(x_{k+1}) = 0\) gives
\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \]
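As a quick illustration (a sketch, not part of the original notebook; the reference point \(x_0 = 1\) and the helper name fprime are arbitrary choices), we can plot \(f\) against its tangent-line model and mark the root of the model, which is the next Newton iterate.
# Sketch (not from the notebook): tangent-line model of f at x0 = 1.
# fprime is a local name for f'(x); the root of the model is the next Newton iterate.
fprime(x) = -sin(x) - 1
x0 = 1.0
ftilde(x) = f(x0) + fprime(x0) * (x - x0)
plot([f, ftilde], xlims=(-1, 3), label=["f" "tangent model at x0"])
scatter!([x0, x0 - f(x0) / fprime(x0)], [f(x0), 0.0], label="x0 and model root")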
An implementation#
function newton(f, fp, x0; tol=1e-8, verbose=false)
    x = x0
    for k in 1:100 # max number of iterations
        fx = f(x)
        fpx = fp(x)
        if verbose
            println("[$k] x=$x f(x)=$fx f'(x)=$fpx")
        end
        if abs(fx) < tol
            return x, fx, k
        end
        x = x - fx / fpx
    end
end
f(x) = cos(x) - x
fp(x) = -sin(x) - 1
newton(f, fp, 1; tol=1e-15, verbose=true)
[1] x=1 f(x)=-0.45969769413186023 f'(x)=-1.8414709848078965
[2] x=0.7503638678402439 f(x)=-0.018923073822117442 f'(x)=-1.6819049529414878
[3] x=0.7391128909113617 f(x)=-4.6455898990771516e-5 f'(x)=-1.6736325442243012
[4] x=0.739085133385284 f(x)=-2.847205804457076e-10 f'(x)=-1.6736120293089505
[5] x=0.7390851332151607 f(x)=0.0 f'(x)=-1.6736120291832148
(0.7390851332151607, 0.0, 5)
That’s really fast!#
10 digits of accuracy in 4 iterations.
How is this convergence test different from the one we used for bisection?
How can this break down?
newton(f, fp, -pi/2+.1; verbose=true)
[1] x=-1.4707963267948965 f(x)=1.5706297434417247 f'(x)=-0.0049958347219742905
[2] x=312.9170549232224 f(x)=-312.59435002533314 f'(x)=-0.05350037037604283
[3] x=-5529.927542752894 f(x)=5530.676391917825 f'(x)=-0.33725953180603474
[4] x=10868.945936970244 f(x)=-10868.376227850798 f'(x)=-0.17815359146505727
[5] x=-50136.70732252356 f(x)=50135.70777741902 f'(x)=-1.0301593101044748
[6] x=-1468.7903453577164 f(x)=1468.8859787856973 f'(x)=-1.9954166200403913
[7] x=-732.6603742863299 f(x)=731.8761094362295 f'(x)=-1.6204261800544972
[8] x=-281.0038172368656 f(x)=280.8358913898124 f'(x)=-1.9857996296872256
[9] x=-139.58174866993488 f(x)=139.79912372811944 f'(x)=-0.02391184615360531
[10] x=5706.856156210999 f(x)=-5707.008659764705 f'(x)=-1.9883029222393844
[11] x=2836.5648158265976 f(x)=-2837.5220962814674 f'(x)=-1.2891610809292957
[12] x=635.503879177757 f(x)=-634.8839651181479 f'(x)=-1.784669713127157
[13] x=279.7607629875442 f(x)=-280.7481464325292 f'(x)=-0.8416524942745961
[14] x=-53.80700796583557 f(x)=52.88592082634005 f'(x)=-1.3893564966145653
[15] x=-15.74196061822768 f(x)=14.742538472479499 f'(x)=-1.0339908015219368
[16] x=-1.4840596284124175 f(x)=1.5706876106501757 f'(x)=-0.0037592697076601622
[17] x=416.333158287511 f(x)=-416.4052274406877 f'(x)=-1.99739963763799
[18] x=207.8594910282616 f(x)=-206.9888910802111 f'(x)=-1.4919915959184906
[19] x=69.12620885264326 f(x)=-68.1262712417355 f'(x)=-1.0111702413616044
[20] x=1.7525179991632598 f(x)=-1.933241162917233 f'(x)=-1.9835340045381016
[21] x=0.7778731690296721 f(x)=-0.0654654835715871 f'(x)=-1.7017658368004631
[22] x=0.7394040200105153 f(x)=-0.0005337303513481828 f'(x)=-1.6738476794194503
[23] x=0.7390851556610822 f(x)=-3.756576461011463e-8 f'(x)=-1.6736120457726615
[24] x=0.7390851332151608 f(x)=-2.220446049250313e-16 f'(x)=-1.6736120291832148
(0.7390851332151608, -2.220446049250313e-16, 24)
Convergence of fixed-point (by mean value theorem)#
Consider the iteration
\[ x_{k+1} = g(x_k) \]
where \(g\) is continuously differentiable, and suppose it has a fixed point \(x_* = g(x_*)\). By the mean value theorem, there is some \(c_k\) between \(x_k\) and \(x_*\) such that
\[ x_{k+1} - x_* = g(x_k) - g(x_*) = g'(c_k)\,(x_k - x_*). \]
Taking absolute values,
\[ |e_{k+1}| = |g'(c_k)|\,|e_k|, \]
which converges to zero if \(|g'(c_k)| \le \rho < 1\) for all \(k\).
Convergence of fixed-point (by Taylor series)#
Consider the iteration
\[ x_{k+1} = g(x_k) \]
with fixed point \(x_* = g(x_*)\), where \(g\) is smooth.
In terms of the error \(e_k = x_k - x_*\), Taylor expansion about \(x_*\) gives
\[ e_{k+1} = x_{k+1} - x_* = g(x_* + e_k) - g(x_*) = g'(x_*)\,e_k + O(e_k^2), \]
so asymptotically the error is multiplied by \(g'(x_*)\) on each iteration.
Poll: Is this convergence A=q-linear, B=r-linear, C=neither?#
Recall the definition of q-linear convergence:
\[ \lim_{k\to\infty} \frac{|e_{k+1}|}{|e_k|} = \rho < 1. \]
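As a quick check (a sketch reusing hist and r from the bisection cells above), we can look at the successive error ratios for bisection; they do not settle down to a fixed \(\rho < 1\), even though the envelope \(4 \cdot 0.5^k\) decays.
# Error ratios |e_{k+1}| / |e_k| for the bisection history computed earlier.
# q-linear convergence requires these ratios to approach some ρ < 1;
# for bisection they jump around (some exceed 1), hence the poll.
es_bisect = abs.(hist .- r)
es_bisect[2:end] ./ es_bisect[1:end-1]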
Aside: Big \(O\) (“big oh”) notation#
Limit \(n\to\infty\)#
We’d say an algorithm costs \(O(n^2)\) if its running time on input of size \(n\) is less than \(c n^2\) for some constant \(c\) and sufficiently large \(n\).
Sometimes we write \(\operatorname{cost}(\texttt{algorithm}, n) = O(n^2)\) or (preferably) \(\operatorname{cost}(\texttt{algorithm}) \in O(n^2)\).
Note that \(O(\log n) \subset O(n) \subset O(n\log n) \subset O(n^2) \subset \dotsb\) so it’s correct to say “binary search is in \(O(n^2)\)”, even though a sharper statement is also true.
We say the algorithm is in \(\Theta(n^2)\) (“big theta”) if
\[ c_1 n^2 < \operatorname{cost}(\texttt{algorithm}, n) < c_2 n^2 \]
for some positive constants \(c_1, c_2\) and sufficiently large \(n\).
Limit \(h \to 0\)#
In numerical analysis, we often have a small real number, and now the definitions take the limit as the small number goes to zero. So we say a term in an expression is in \(O(h^2)\) if
\[ \lim_{h\to 0} \frac{|\text{term}(h)|}{h^2} < \infty. \]
Big \(O\) terms can be manipulated as
\[ h\, O(h^k) = O(h^{k+1}), \qquad O(h^k)/h = O(h^{k-1}), \qquad c\, O(h^k) = O(h^k), \qquad O(h^k) - O(h^k) = O(h^k). \]
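For a small concrete check (a sketch, not from the notes), the first-order Taylor remainder of \(\cos\) behaves like \(O(h^2)\): dividing by \(h^2\) approaches the finite constant \(-\cos(x_0)/2\).
# Sketch: the remainder cos(x0 + h) - [cos(x0) - sin(x0) h] is in O(h^2),
# so remainder / h^2 tends to cos''(x0)/2 = -cos(x0)/2 ≈ -0.439 here.
x0 = 0.5
for h in [1e-1, 1e-2, 1e-3, 1e-4]
    remainder = cos(x0 + h) - (cos(x0) - sin(x0) * h)
    println("h = $h   remainder / h^2 = $(remainder / h^2)")
end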
Example of a fixed point iteration#
We wanted to solve \(\cos x - x = 0\), which holds exactly when \(x\) is a fixed point of \(g(x) = \cos x\).
xstar, _ = newton(f, fp, 1.)
g(x) = cos(x)
gp(x) = -sin(x)
@show xstar
@show gp(xstar)
plot([x->x, g], xlims=(-2, 3))
scatter!([xstar], [xstar],
label="\$x_*\$")
xstar = 0.739085133385284
gp(xstar) = -0.6736120293089505
function fixed_point(g, x, n)
    xs = [x]
    for k in 1:n
        x = g(x)
        append!(xs, x)
    end
    xs
end
xs = fixed_point(g, 2., 15)
plot!(xs, g.(xs), seriestype=:path, marker=:auto)
Verifying fixed point convergence theory#
@show gp(xstar)
es = xs .- xstar
es[2:end] ./ es[1:end-1]
gp(xstar) = -0.6736120293089505
15-element Vector{Float64}:
-0.9161855415615605
-0.15197657010596488
-0.734870205299266
-0.624132525531327
-0.7026257933893496
-0.6523498121376077
-0.6870971782336925
-0.664168570025122
-0.6798044680427148
-0.6693659427636027
-0.6764378047956165
-0.6716930541785153
-0.6748976495459512
-0.6727427617641084
-0.6741962236114177
scatter(abs.(es), yscale=:log10, label="fixed point")
plot!(k -> abs(gp(xstar))^k, label="\$|g'|^k\$")
Plotting Newton convergence#
function newton_hist(f, fp, x0; tol=1e-12)
    x = x0
    hist = []
    for k in 1:100 # max number of iterations
        fx = f(x)
        fpx = fp(x)
        push!(hist, [x fx fpx])
        if abs(fx) < tol
            return vcat(hist...)
        end
        x = x - fx / fpx
    end
end
newton_hist (generic function with 1 method)
xs = newton_hist(f, fp, 1.97)
@show x_star = xs[end,1]
plot(xs[1:end-1,1] .- x_star, yscale=:log10, marker=:auto)
x_star = xs[end, 1] = 0.7390851332151607
Poll: Is this convergence A=q-linear, B=r-linear, C=neither?#
Formulations are not unique (constants)#
If \(x = g(x)\), then
\[ x = x + c\,\bigl(g(x) - x\bigr) =: g_2(x) \]
for any constant \(c \ne 0\). Can we choose \(c\) so that \(|g_2'(x_*)|\) is small?
c = .5
g2(x) = x + c * (cos(x) - x)
g2p(x) = 1 + c * (-sin(x) - 1)
@show g2p(xstar)
plot([x->x, g, g2], ylims=(-5, 5), label=["x" "g" "g2"])
g2p(xstar) = 0.16319398534552476
xs = fixed_point(g2, 1., 15)
xs .- xstar
16-element Vector{Float64}:
0.26091486661471597
0.03106601954878585
0.004893162344945079
0.0007941171212053622
0.00012947850276123773
2.112687301181193e-5
3.4475537732392425e-6
5.62475483634195e-7
9.16501970982253e-8
1.4814399151852342e-8
2.2752605355336186e-9
2.2894852680366284e-10
-1.0499723313017739e-10
-1.594951948291623e-10
-1.683889694348295e-10
-1.6984036399492197e-10
Formulations are not unique (functions)#
If \(x = g(x)\), then
\[ x = x + h(x)\,\bigl(g(x) - x\bigr) =: g_3(x) \]
for any smooth \(h(x)\) bounded away from zero. Can we choose \(h(x)\) so that \(g_3'(x_*) = 0\)?
h(x) = -1 / (gp(x) - 1)
g3(x) = x + h(x) * (g(x) - x)
plot([x-> x, cos, g2, g3], ylims=(-5, 5))
We don’t know \(g'(x_*)\) in advance because we don’t know \(x_*\) yet.
This method converges very fast (see the quick check below).
We actually just derived Newton’s method.
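As a quick check (a sketch reusing fixed_point, g3, and xstar from the cells above), the errors of the g3 iteration collapse after only a few steps, unlike the roughly 0.67-rate errors of the plain iteration on g shown earlier.
# Errors of the h(x)-accelerated iteration g3; compare with the linearly
# shrinking errors of the plain fixed-point iteration on g above.
xs3 = fixed_point(g3, 1.0, 5)
xs3 .- xstar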
A fresh derivation of Newton’s method#
A rootfinding problem \(f(x) = 0\) can be converted to a fixed point problem
\[ x = x + f(x) =: g(x), \]
but there is no guarantee that \(g'(x_*) = 1 + f'(x_*)\) will have magnitude less than 1.
Problem-specific algebraic manipulation can be used to make \(|g'(x_*)|\) small.
\(x = x + h(x) f(x)\) is also a valid formulation for any \(h(x)\) bounded away from \(0\).
Can we choose \(h(x)\) such that
\[ g'(x) = 1 + h'(x) f(x) + h(x) f'(x) = 0 \]
when \(f(x) = 0\)?
In other words, we need \(h(x_*) = \frac{-1}{f'(x_*)}\); choosing \(h(x) = -1/f'(x)\) gives the iteration
\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \]
Quadratic convergence!#
What does it mean that \(g'(x_*) = 0\)?
It turns out that Newton’s method has locally quadratic convergence to simple roots,
\[ \lim_{k \to \infty} \frac{|e_{k+1}|}{|e_k|^2} < \infty. \]
“The number of correct digits doubles each iteration.” (We verify this ratio numerically below.)
Now that we know how to make a good guess accurate, the effort lies in getting a good guess.
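A numerical check (a sketch reusing newton_hist, f, fp, and x_star from the cells above): the ratios \(|e_{k+1}|/|e_k|^2\) stay bounded, consistent with quadratic convergence.
# Recompute a Newton history and form the quadratic-convergence ratios;
# they stay bounded (roughly |f''(x_*)| / (2 |f'(x_*)|) ≈ 0.2 here).
hist_newton = newton_hist(f, fp, 1.97)
e = abs.(hist_newton[1:end-1, 1] .- x_star)
e[2:end] ./ e[1:end-1] .^ 2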
Culture: fast inverse square root#
The following code appeared literally (including comments) in the Quake III Arena source code (late 1990s).
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
We now have vector instructions for approximate inverse square root. More at https://en.wikipedia.org/wiki/Fast_inverse_square_root
How does it work?#
Let’s look at the last line
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
We want a function \(f(y)\) such that \(f(1/\sqrt{x}) = 0\). One such function is
\[ f(y) = \frac{1}{y^2} - x, \qquad f'(y) = \frac{-2}{y^3}. \]
There are others, e.g.,
\[ f_1(y) = y^2 - \frac{1}{x}, \]
which has the same positive root but leads to an update that needs a division.
Newton's method is
\[ y_{k+1} = y_k - \frac{f(y_k)}{f'(y_k)} = y_k + \frac{y_k^3}{2}\Bigl(\frac{1}{y_k^2} - x\Bigr) = y_k\Bigl(\frac{3}{2} - \frac{x}{2} y_k^2\Bigr), \]
which is exactly the update above: y * ( threehalfs - ( x2 * y * y ) ), with no divisions at all.
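Here is a plain-Julia sketch of that update (the helper name rsqrt_newton and the starting guess are mine, not from Quake; the real code gets its starting guess from the bit-level trick above).
# Newton iteration y <- y * (3/2 - x/2 * y^2) for f(y) = 1/y^2 - x.
# Note there are no divisions, which is the point of this choice of f.
function rsqrt_newton(x, y; iters=5)
    for _ in 1:iters
        y = y * (1.5 - 0.5 * x * y^2)
    end
    y
end
rsqrt_newton(0.01, 5.0), 1 / sqrt(0.01)  # converges toward 10 from a rough guess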
Rootfinding outlook#
Newton methods are immensely successful
Convergence theory is local; we need good initial guesses (activity)
Computing the derivative \(f'(x)\) is intrusive
Avoided by secant methods (approximate the derivative; activity; see the sketch after this list)
Algorithmic or numerical differentiation (future topics)
Bisection is robust when conditions are met
Line search (activity)
When does Newton diverge?
More topics
Find all the roots
Use Newton-type methods with bounds
Times when Newton converges slowly
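For reference, here is a minimal secant-method sketch (a preview of the activity, not code from these notes): the derivative is replaced by the slope through the two most recent iterates.
# Secant update: approximate f'(x) by (f(x1) - f(x0)) / (x1 - x0).
function secant(f, x0, x1; tol=1e-12)
    for k in 1:100
        fx0, fx1 = f(x0), f(x1)
        abs(fx1) < tol && return x1, fx1, k
        slope = (fx1 - fx0) / (x1 - x0)
        x0, x1 = x1, x1 - fx1 / slope
    end
    x1, f(x1), 100
end
secant(f, 0.5, 1.0)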