Higher order Newton-like methods for root finding

There are variations on Newton’s root finding method that use higher derivatives and converge faster. Alston Householder developed a sequence of such methods of the form

H_n(x) = x + (n-1) \frac{\left( \frac{1}{f(x)}\right)^{(n-2)}}{\left( \frac{1}{f(x)}\right)^{(n-1)}}

where the superscripts in parentheses indicate derivatives. When n = 2, Householder’s method reduces to Newton’s method.

Once Newton’s method is close enough to a root, the error is squared at each iteration. Said another way, the number of correct decimal places roughly doubles. With the Householder method H_n, the number of correct decimal places roughly increases by a factor of n at each iteration.

The Householder method H_n can be written as a rational function of f and its derivatives up to order n − 1, though the formulas are complicated.

Maybe it’s time for Householder’s methods to be more widely used: with automatic differentiation [1], the complexity of computing higher order derivatives is not the drawback it was when these computations were done solely by hand.
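
Since the derivatives can be computed symbolically, here is a minimal SymPy sketch of the general formula (along the lines of the update at the end of this post; the function name householder is mine):

from sympy import symbols, diff, simplify

x, a, b = symbols('x a b')

def householder(f, n):
    # H_n(x) = x + (n-1) * (1/f)^(n-2) / (1/f)^(n-1),
    # derivatives taken with respect to x
    g = 1/f
    return simplify(x + (n - 1)*diff(g, x, n - 2)/diff(g, x, n - 1))

print(householder(x**5 + a*x + b, 2))  # Newton's method for the quintic used below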

Demonstration

We’ll illustrate the convergence of the Householder methods by finding roots of the quintic polynomial

f(x) = x⁵ + ax + b.

I used Mathematica to calculate expressions for the Householder methods and export the results as code. Mathematica does not have a function to export Python code, but it does have a function CForm to export expressions as C code. I started to use this, converting exponents from C to Python form with search and replace. Then I tried using FortranForm instead, and that was much easier because Fortran and Python both represent powers with a ** operator.

def f(x, a, b):
    return x**5 + a*x + b

def H2(x, a, b):
    return x - (b + a*x + x**5)/(a + 5*x**4)
    
def H3(x, a, b):
    num = (a + 5*x**4)*(b + a*x + x**5)
    den = a**2 - 10*b*x**3 + 15*x**8
    return x - num/den

def H4(x, a, b):
    num = (b + a*x + x**5)*(-a**2 + 10*b*x**3 - 15*x**8)
    den = a**3 + 10*b**2*x**2 + 5*a**2*x**4 - 80*b*x**7 - 25*a*x**8 + 35*x**12
    return x + num/den

To set values of a and b, I arbitrarily chose a = 2 and solved for b so that π is a root. That way we have a known value of the root for comparison. I used x = 4 as my starting guess at the root.
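
In code, that choice amounts to the following (a small addition for completeness, not in the original post):

from math import pi

a = 2
b = -(pi**5 + a*pi)  # chosen so that f(pi, a, b) == 0, up to floating point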

I first used -log10|x – π| to measure the number of correct decimals. That didn’t quite work because the method would converge exactly (to the limits of machine precision) which led to taking the log of zero. So I modified my accuracy function to the following.

from math import log10, pi

def decimals(x):
    # number of correct decimal places in x as an approximation to pi
    d = abs(x - pi)
    if d != 0:
        return round(-log10(d), 2)
    else:
        return "inf"

Then I tested each of the Householder methods with the following.

epsilon = 1e-14
for H in [H2, H3, H4]:
    x = 4
    for i in range(1, 10):
        if abs(f(x,a,b)) > epsilon:
            x = H(x,a,b)
            print(i, x, decimals(x))

Here are the results in tabular form; the three blocks correspond to H2, H3, and H4 respectively. I replaced all the decimals after the first incorrect decimal with underscores to make it easier to see the convergence.

|---+-------------------+--------|
| i |                 x | digits |
|---+-------------------+--------|
| 1 | 3.4______________ |   0.53 |
| 2 | 3.18_____________ |   1.33 |
| 3 | 3.142____________ |   2.87 |
| 4 | 3.141593_________ |   5.93 |
| 5 | 3.141592653590___ |  12.07 |
| 6 | 3.141592653589793 |    inf |
|---+-------------------+--------|
| 1 | 3.2______________ |   1.11 |
| 2 | 3.1416___________ |   4.03 |
| 3 | 3.1415926535899__ |  12.79 |
| 4 | 3.141592653589793 |    inf |
|---+-------------------+--------|
| 1 | 3.15_____________ |   1.84 |
| 2 | 3.141592654______ |   8.85 |
| 3 | 3.141592653589793 |    inf |
|---+-------------------+--------|

Note that, as promised, the number of correct decimal places for H_n increases by roughly a factor of n with each iteration.

Update: See this post where the symbolic calculations are done in SymPy rather than Mathematica. Everything stays in Python, and there’s no need to edit the code for the Householder methods.


[1] Note that in this post I used Mathematica to do symbolic differentiation, not automatic differentiation. Automatic differentiation differs from both symbolic and numerical differentiation. See an explanation here.

Drawing with Unicode block characters

My previous post contained the image below.

The point of that post was that the numbers which came up in yet another post make the fractal image above when you write them in binary. Here’s the trick I used to make that image.

To make the pattern of 0’s and 1’s easier to see, I wanted to make a graphic with black and white squares instead of the characters 0 and 1. I thought about writing a program to create an image using strings of 0’s and 1’s as input, but then I thought of something more direct.

The Unicode character U+2588 (FULL BLOCK) is just a solid rectangle. I replaced the 1’s with U+2588 and the 0’s with spaces, made a text file that looked like the image above, and took a screen shot of it. The only problem was that the image was rectangular. I thought the pattern would be easier to see if it were square, so I changed the aspect ratio of the image.
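
Here’s a minimal sketch of that substitution (the helper name blocks is mine):

def blocks(bits):
    # replace 1's with the full block character and 0's with spaces
    return bits.replace("1", "\u2588").replace("0", " ")

print(blocks("0110100110010110"))  # render one row of 0's and 1's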

Here’s another example using Unicode block elements, this time to make a little bar graph, sorta like Edward Tufte’s idea of a sparkline. The characters U+2581 through U+2588 produce bars of increasing height.

bar graph of English letter frequencies

To create images like this, your font needs to include glyphs for the Unicode block elements. The font needs to be monospaced as well so the letters will line up under the blocks. The example above was created using the APL385 font. It also looks good in Hack and Input Mono. In some fonts the block characters don’t align consistently on the baseline.

Here’s the Python code that produced the above graph of English letter frequencies.

# Letter frequencies via
# http://www.norvig.com/mayzner.html
freq = [ 
  8.04, 1.48, 3.34, 3.82, 12.49,
  2.40, 1.87, 5.05, 7.57,  0.16,
  0.54, 4.07, 2.51, 7.23,  7.64,
  2.14, 0.12, 6.28, 6.51,  9.28,
  2.73, 1.05, 1.68, 0.23,  1.66,
  0.09,
]
m = max(freq)

for i in range(26):
    # scale each frequency to an integer bar height from 0 to 8
    u = int(8*freq[i]/m)
    # U+2581 through U+2588 are bars of increasing height
    ch = chr(0x2580 + u) if u > 0 else " "
    print(ch, end="")

print()
for i in range(26):
    # label the bars with the letters A through Z
    print(chr(0x41 + i), end="")

Predicted distribution of Mersenne primes

Mersenne primes are prime numbers of the form 2^p − 1. It turns out that if 2^p − 1 is prime, then so is p: if p = ab were composite, then 2^a − 1 would divide 2^p − 1. The requirement that p be prime is a theorem, not part of the definition.

So far 51 Mersenne primes have been discovered [1]. Maybe that’s all there are, but it is conjectured that there are infinitely many Mersenne primes. In fact, it has been conjectured that as x increases, the number of primes p ≤ x such that 2^p − 1 is prime is asymptotically

e^γ log x / log 2

where γ is the Euler-Mascheroni constant. For a heuristic derivation of this conjecture, see Conjecture 3.20 in Not Always Buried Deep.

How does the actual number of Mersenne primes compare to the number predicted by the conjecture? We’ll construct a plot below using Python. Note that the conjecture is asymptotic, so it could make poor predictions for now and still be true for much larger numbers. But it appears to make fairly good predictions over the range where we have discovered Mersenne primes.

Predicted vs actual Mersenne primes

import numpy as np
import matplotlib.pyplot as plt

# p's for which 2^p - 1 is prime.
# See https://oeis.org/A000043
ps = [2, 3, 5, ... , 82589933]

# x has 200 points from 10^1 to 10^8 
# spaced evenly on a logarithmic scale
x = np.logspace(1, 8, 200)

# number of p's less than x such that 2^p - 1 is prime
actual = [np.searchsorted(ps, t) for t in x]

exp_gamma = np.exp(np.euler_gamma)
predicted = [exp_gamma*np.log2(t) for t in x]

plt.plot(x, actual)
plt.plot(x, predicted, "--")

plt.xscale("log")
plt.xlabel("p")
plt.ylabel(r"Mersenne primes $\leq 2^p-1$")
plt.legend(["actual", "predicted"])


[1] Fifty-one Mersenne primes have been verified. But these may not be the smallest Mersenne primes: it has not yet been verified that there are no Mersenne primes remaining to be discovered between the 47th and 51st known ones. The plot in this post assumes the known Mersenne primes are consecutive, and so it is speculative toward the right end.

String interpolation in Python and R

One of the things I liked about Perl was string interpolation. If you use a variable name in a string, the variable will expand to its value. For example, if you have a variable $x which equals 42, then the string

    "The answer is $x."

will expand to “The answer is 42.” Perl requires variables to start with sigils, like the $ in front of scalar variables. Sigils are widely considered to be ugly, but they have their benefits. Here, for example, $x is clearly a variable name, whereas x would not be.

You can do something similar to Perl’s string interpolation in Python with so-called f-strings. If you put an f in front of an opening quotation mark, an expression in braces will be replaced with its value.

    >>> x = 42
    >>> f"The answer is {x}."
    'The answer is 42.'

You could also say

    >>> f"The answer is {6*7}."

for example. The f-string is just a string; it’s only printed because we’re working from the Python REPL.

The glue package for R lets you do something very similar to Python’s f-strings.

    > library(glue)
    > x <- 42
    > glue("The answer is {x}.")
    The answer is 42.
    > glue("The answer is {6*7}.")
    The answer is 42.

As with f-strings, glue returns a string. It doesn’t print the string, though the string is displayed because we’re working from the REPL, the R REPL in this case.

Regular expressions and special characters

Special characters make text processing more complicated because you have to pay close attention to context. If you’re looking at Python code containing a regular expression, you have to think about what you see, what Python sees, and what the regular expression engine sees. A character may be special to Python but not to regular expressions, or vice versa.

This post goes through an example in detail that shows how to manage special characters in several different contexts.

Escaping special TeX characters

I recently needed to write a regular expression [1] to escape TeX special characters. I’m reading in text like ICD9_CODE and need to make that ICD9\_CODE so that TeX will understand the underscore to be a literal underscore, and not a subscript instruction.

Underscore isn’t the only special character in TeX. It has ten special characters:

    \ { } $ & # ^ _ % ~

The two that people most commonly stumble over are probably $ and % because these are fairly common in ordinary prose. Since % begins a comment in TeX, importing a percent sign without escaping it will fail silently. The result is syntactically valid. It just effectively cuts off the remainder of the line.

So whenever my script sees a TeX special character that isn’t already escaped, I’d like it to escape it.

Raw strings

First I need to tell Python what the special characters are for TeX:

    special = r"\\{}$&#^_%~"

There’s something interesting going on here. Most of the characters that are special to TeX are not special to Python. But backslash is special to both. Backslash is also special to regular expressions. The r prefix in front of the quotes tells Python this is a “raw” string and that it should not interpret backslashes as special. It’s saying “I literally want a string that begins with two backslashes.”
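
A quick REPL check (not in the original post) confirms that the raw string really begins with two separate backslash characters:

    >>> len(r"\\")
    2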

Why two backslashes? Wouldn’t one do? We’re about to use this string inside a regular expression, and backslashes are special there too. More on that shortly.

Lookbehind

Here’s my regular expression:

    re.sub(r"(?<!\\)([" + special + "])", r"\\\1", line)

I want special characters that have not already been escaped, so I’m using a negative lookbehind pattern. Negative lookbehind expressions begin with (?<! and end with ). So if, for example, I wanted to look for the string “ball” but only if it’s not preceded by “charity” I could use the regular expression

    (?<!charity )ball

This expression would match “foot ball” or “foosball” but not “charity ball”.

Our lookbehind expression is complicated by the fact that the thing we’re looking back for is a special character. We’re looking for a backslash, which is a special character for regular expressions [2].

After looking behind for a backslash and making sure there isn’t one, we look for our special characters. The reason we used two backslashes in defining the variable special is so the regular expression engine would see two backslashes and interpret that as one literal backslash.

Captures

The second argument to re.sub tells it what to replace its match with. We put parentheses around the character class listing TeX special characters because we want to capture it to refer to later. Captures are referred to by position, so the first capture is \1, the second is \2, etc.

We want to tell re.sub to put a backslash in front of the first capture. Since backslashes are special to the regular expression engine, we send it \\ to represent a literal backslash. When we follow this with \1 for the first capture, the result is \\\1 as above.

Testing

We can test our code above with the following.

    line = r"a_b $200 {x} %5 x\y"

and get

    a\_b \$200 \{x\} \%5 x\\y

which would cause TeX to produce output that looks like

a_b $200 {x} %5 x\y.

Note that we used a raw string for our test case. That was only necessary for the backslash near the end of the string. Without that we could have dropped the r in front of the opening quote.
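
Putting the pieces together, the whole test is short enough to run as one sketch:

    import re

    special = r"\\{}$&#^_%~"
    line = r"a_b $200 {x} %5 x\y"
    print(re.sub(r"(?<!\\)([" + special + "])", r"\\\1", line))
    # prints a\_b \$200 \{x\} \%5 x\\y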

P.S. on raw strings

Note that you don’t have to use raw strings. You could just escape your special characters with backslashes. But we’ve already got a lot of backslashes here. Without raw strings we’d need even more. Without raw strings we’d have to say

    special = "\\\\{}$&#^_%~"

starting with four backslashes to send Python two to send the regular expression engine one.


[1] Whenever I write about using regular expressions someone will complain that my solution isn’t completely general and that they can create input that will break my code. I understand that, but it works for me in my circumstances. I’m just writing scripts to get my work done, not claiming to have written hardened production software for anyone else to use.

[2] Keep context in mind. We have three languages in play: TeX, Python, and regular expressions. One of the keys to understanding regular expressions is to see them as a small language embedded inside other languages like Python. So whenever you hear a character is special, ask yourself “Special to whom?”. It’s especially confusing here because backslash is special to all three languages.

Angles in the spiral of Theodorus

The previous post looked at how to plot the spiral of Theodorus shown below.

Spiral of Theodorus

We stopped the construction where we did because the next triangle to be added would overlap the first triangle, which would clutter the image. But we could certainly have kept going.

If we do keep going, then the set of hypotenuse angles will be dense in the circle, with no repeats.

The nth triangle has sides of length 1 and √n, and so the tangent of nth triangle’s acute angle is 1/√n. The angle formed by the nth hypotenuse is thus

arctan(1) + arctan(1/√2) + arctan(1/√3) + … + arctan(1/√n).

Here’s a plot of the first 99 hypotenuse angles.

Angles formed by hypotenuses in spiral of Theodorus

Here’s the code that produced the plot.

    from numpy import *
    import matplotlib.pyplot as plt

    plt.axes().set_aspect(1)

    N = 100
    theta = cumsum(arctan(arange(1,N)**-0.5))
    plt.scatter(cos(theta), sin(theta))
    plt.savefig("theodorus_angles.svg")

If we change N to 500 we get a solid ring because the angles are closer together than the default thickness of dots in a scatterplot.

Distribution of quadratic residues

Let p be an odd prime number. If the equation

x² = n mod p

has a solution then n is a square mod p, or in classical terminology, n is a quadratic residue mod p. Half of the numbers between 0 and p are quadratic residues and half are not. The residues are distributed somewhat randomly. For large p, quadratic residues are distributed enough like random numbers to be useful in cryptographic applications. For example, see this post. But the distribution isn’t entirely random as we’ll demonstrate here.

First let’s look at a couple illustrations, using p = 101 and p = 103. We’ll use the following Python code to plot the residues.

    import matplotlib.pyplot as plt

    for p in [101, 103]:
        s = {x**2 % p for x in range(1, p)}
        for q in s:
            plt.vlines(q, 0, 1)
        plt.title("Quadratic residues mod {}".format(p))
        plt.savefig("residues_mod_{}.svg".format(p))
        plt.close()

Here are the plots.

quadratic residues mod 101

quadratic residues mod 103

Symmetry and anti-symmetry

If you look closely at the first image, you’ll notice it appears to be symmetric about the middle. And if you look closely at the second image, you’ll notice it appears to be anti-symmetric, i.e. the left side has bars where the right side has gaps and vice versa.

The following code shows that these observations are correct.

    p = 101
    s = {x**2 % p for x in range(1, p)}
    for q in s:
        assert(p - q in s)
    
    p = 103
    s = {x**2 % p for x in range(1, p)}
    for q in s:
        assert(p - q not in s)

Both of these observations hold more generally. If p = 1 mod 4, the residues are symmetric: n is a quadratic residue if and only if p − n is a quadratic residue. And if p = 3 mod 4, the residues are anti-symmetric: n is a quadratic residue if and only if p − n is not a quadratic residue. Both of these statements are consequences of the quadratic reciprocity theorem.
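
Here’s a quick sketch that checks both claims for all odd primes below 200, assuming primerange from SymPy:

    from sympy import primerange

    for p in primerange(3, 200):
        s = {x*x % p for x in range(1, p)}
        if p % 4 == 1:
            assert all(p - q in s for q in s)
        else:
            assert all(p - q not in s for q in s)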

Quadratic excess

If you look again at the plot of residues for p = 103, you’ll notice that not only is it anti-symmetric, it is also darker on the left side. We can verify with the following code

    print(len([q for q in s if q < 51]))
    print(len([q for q in s if q > 51]))

that there are 28 residues on the left and 23 on the right. This is a general pattern called quadratic excess. The quadratic excess of an odd prime p, denoted E(p), is the number of quadratic residues to the left of the middle minus the number to the right of the middle. If p = 1 mod 4, E(p) is zero by symmetry, but if p = 3 mod 4, E(p) is always positive.
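
Here’s a small sketch that computes E(p) directly; the helper name excess is mine:

    def excess(p):
        # quadratic residues below p/2 minus those above p/2
        s = {x*x % p for x in range(1, p)}
        left = sum(1 for q in s if q < p/2)
        right = sum(1 for q in s if q > p/2)
        return left - right

    print(excess(103))  # 5, matching 28 - 23 above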

Here is a plot of quadratic excess for primes less than 3000 and congruent to 3 mod 4.

plot of quadratic excess

Statistical properties

In this post I’ve said a couple things that are in tension. At the top I said that quadratic residues act sufficiently like random bits to be useful in cryptography. Then later on I showed that they have major departures from random behavior. That would be a problem if you looked at the entire sequence of residues, but if p has thousands of digits and you’re only looking at a small sample of the residues, that’s a different situation.

Suppose you pick two large primes, p and q, and keep them secret but tell me their product N = pq. If you then pick a random value n and ask me whether it’s a quadratic residue mod N, the best I can do is say “Maybe, with probability 1/2.” There are cryptosystems that depend on this being a computationally hard problem to solve unless the factors p and q are known.

However, you need to be a little careful. If n is smaller than √N, I could test whether n is a square, working in the integers, not the integers mod N. If it is a square integer, then it’s a square mod N. If it’s not a square integer, it might be a square mod N, but I wouldn’t know.
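
For example, here’s what such a test might look like, using Python’s math.isqrt:

    from math import isqrt

    def is_square(n):
        # test whether n is a perfect square in the integers
        r = isqrt(n)
        return r*r == n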

There’s a related issue I may explore in the future. Suppose instead of just asking whether a particular number is a quadratic residue, you take a range of numbers and return 0 or 1 depending on whether each number is a residue. Can I distinguish the sequence from a random sequence using statistical tests? If I took a sequence of length greater than N/2 then I could tell by looking at symmetry or antisymmetry. But what if the range is small relative to N? Maybe N is a thousand-digit number but you only give me a sequence of a million bits. Would the distribution be distinguishable from uniform random bits? Is there any sort of autocorrelation that might betray the sequence as not random?


Bessel function crossings

The previous post looked at the angles that graphs make when they cross. For example, sin(x) and cos(x) always cross with the same angle. The same holds for sin(kx) and cos(kx) since the k simply rescales the x-axis.

The post ended with wondering about functions analogous to sine and cosine, such as Bessel functions. This post will look at that question in more detail. Specifically we’ll look at the functions Jν and Yν.

Because these two Bessel functions satisfy the same second order linear homogeneous differential equation, the Sturm separation theorem says that their zeros are interlaced: between each pair of consecutive zeros of Jν is exactly one zero of Yν, and between each pair of consecutive zeros of Yν there is exactly one zero of Jν.

Plotting Bessel functions J_3 and Y_3

In the following Python code, we find zeros of Jν, then look in between for places where Jν and Yν cross. Next we find the angle the two curves make at each intersection and plot the angles.

    from scipy.special import jn_zeros, jv, yv
    from scipy.optimize import bisect
    from numpy import empty, linspace, arccos
    import matplotlib.pyplot as plt
    
    n = 3 # bessel function order
    N = 100 # number of zeros
    
    z = jn_zeros(n, N) # Zeros of J_n
    crossings = empty(N-1)
    
    f = lambda x: jv(n,x) - yv(n,x)    
    for i in range(N-1):
        crossings[i] = bisect(f, z[i], z[i+1])
    
    def angle(n, x):
        # Derivatives of J_nu and Y_nu
        dj = 0.5*(jv(n-1,x) - jv(n+1,x))
        dy = 0.5*(yv(n-1,x) - yv(n+1,x))
        
        top = 1 + dj*dy
        bottom = ((1 + dj**2)*(1 + dy**2))**0.5
        return arccos(top/bottom)
        
    y = angle(n, crossings)
    plt.plot(y)
    plt.xlabel("Crossing number")
    plt.ylabel("Angle in radians")
    plt.show()

This shows that the angles steadily decrease, apparently quadratically.

Angles of crossing of J_3 and Y_3

This quadratic behavior is what we should expect from the asymptotics of Jν and Yν: For large arguments they act like shifted and rescaled versions of sin(x)/√x. So if we looked at √xJν and √xYν rather than Jν and Yν we’d expect the angles to reach some positive asymptote, and they do, as shown below.

Angles of crossing of √x J_3 and √xY_3
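
Here’s a sketch of how the angle function above might be modified to produce that plot, assuming the same imports as before; the extra terms in the derivatives come from the product rule.

    def angle_scaled(n, x):
        # derivatives of J_n and Y_n as before
        dj = 0.5*(jv(n-1,x) - jv(n+1,x))
        dy = 0.5*(yv(n-1,x) - yv(n+1,x))
        # derivatives of sqrt(x) J_n and sqrt(x) Y_n by the product rule
        djs = 0.5*jv(n,x)*x**-0.5 + x**0.5*dj
        dys = 0.5*yv(n,x)*x**-0.5 + x**0.5*dy

        top = 1 + djs*dys
        bottom = ((1 + djs**2)*(1 + dys**2))**0.5
        return arccos(top/bottom)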


Squircle perimeter and the isoperimetric problem

If you have a fixed length of rope and you want to enclose the most area inside the rope, make it into a circle. This is the solution to the so-called isoperimetric problem.

Dido’s problem is similar. If one side of your bounded area is given by a straight line, make your rope into a semi-circle.

This post looks at a similar problem for a squircle. Peter Panholzer mentioned the problem of finding the squircle exponent that leads to the largest area in proportion to its arclength. This sounds like the isoperimetric problem, but it’s not.

The isoperimetric problem holds the perimeter constant and lets the shape enclosed vary, maximizing the area. The question here is to hold the axes constant and maximize the ratio of the area to the perimeter. Panholzer reports the maximum occurs at p = 4.39365.

Computing the perimeter

The area of a squircle can be found in closed form, and I’ve mentioned the equation a few times, for example here. The perimeter, however, cannot be found in closed form, as far as I know, except for special exponents.

We can solve for y as a function of x and find the arclength using the formula taught in calculus courses. However, the derivative of y has a singularity at x = 1. By switching to polar coordinates, we can find arclength in terms of an integrand with no singularities. We can also simplify things a little by computing the total arclength as 4 times the arclength in the first quadrant; this avoids having to take absolute values.
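
For reference, this is the standard arc length formula in polar coordinates, restricted to the first quadrant and multiplied by 4:

\text{arclength} = 4\int_0^{\pi/2} \sqrt{r(\theta)^2 + \left(\frac{dr}{d\theta}\right)^2} \, d\theta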

The following Python code computes the perimeter and the ratio of the area to the perimeter.

    # sin, cos, and pi come from numpy here; recent versions of
    # scipy no longer re-export these functions
    from numpy import sin, cos, pi
    from scipy.integrate import quad
    from scipy.special import gamma
    
    def r(t, p):
        return (cos(t)**p + sin(t)**p)**-(1/p)
    
    def drdt(t, p):
        deriv = (cos(t)**p + sin(t)**p)**(-1-1/p)
        deriv *= cos(t)**(p-1)*sin(t) - sin(t)**(p-1)*cos(t)
        return deriv
    
    def integrand(t, p):
        return (drdt(t,p)**2 + r(t,p)**2)**0.5
    
    def arclength(p):
        integral = quad(lambda t: integrand(t,p), 0, pi/2)[0]
        return 4*integral
    
    def area(p):
        return 4*gamma(1 + 1/p)**2 / gamma(1 + 2/p)
    
    def ratio(p):
        return area(p) / arclength(p)

Basic geometry tells us the ratio is 1/2 when p = 2 and we have a circle. The ratio is also 1/2 when p = ∞, i.e. when we have a square. We can use the code above to find that the ratio is larger than 1/2 in between, for example when p = 4, so there is at least one local maximum for values of p between 2 and ∞.
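
For example, using the functions above (the second value is approximate):

    print(ratio(2))  # 0.5, the circle
    print(ratio(4))  # about 0.528, already more than 1/2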

Here’s a plot of the ratio of area to perimeter as a function of p.

ratio of area to perimeter for squircle

The plot is quite flat for large values of p, but if we zoom in we can see that there is a local maximum near 4.4.

close up of previous graph near the maximum

When I find the maximum of the ratio function using Brent’s method (scipy.optimize.brent) I find p = 4.39365679, which agrees with the value Panholzer stated.
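
That calculation might look like the following sketch; the bracketing triple around p = 4.4 is an assumption based on the plots above.

    from scipy.optimize import brent

    # maximize ratio(p) by minimizing its negative
    p_max = brent(lambda p: -ratio(p), brack=(3, 4.4, 6))
    print(p_max)  # about 4.39365679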

Here’s a plot of the squircle corresponding to this value of p.

squircle with largest area to perimeter ratio

Back to the isoperimetric problem

Now why doesn’t this contradict the isoperimetric problem? Area scales quadratically but perimeter scales linearly. If you don’t hold the perimeter constant, you can get a larger ratio of area to perimeter by making the perimeter larger, and that’s what we’ve done. When p = 2, we have a unit circle, with area π and perimeter 2π. When p = 4.39365679 we have area 3.750961567 and perimeter 7.09566295. If we were to take a circle with the same perimeter, it would have area 4.00660097, larger than the squircle we started with.


An infinite product challenge

Gil Kalai wrote a blog post yesterday entitled “Test Your Intuition (or knowledge, or programming skills) 36.” The challenge is to evaluate the infinite product

\prod_{p\,\, \mathrm{prime}} \frac{p^2+1}{p^2 - 1}

I imagine there’s an elegant analytical solution, but since the title suggested that programming might suffice, I decided to try a little Python. I used primerange from SymPy to generate the list of primes up to 200, and cumprod from NumPy to generate the list of partial products.

    from sympy import primerange
    from numpy import cumprod

    cumprod(
        [(p*p+1)/(p*p-1) for p in primerange(1, 200)]
    )

Apparently the product converges to 5/2, and a plot suggests that it converges very quickly.

Plot of partial products

Here’s another plot to look more closely at the rate of convergence. Here we look at the difference between 5/2 and the partial products, on a log scale, for primes less than 2000.

Plot of 2.5 minus partial products, log scale
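
A plot along those lines could be produced with a sketch like the following.

    import matplotlib.pyplot as plt
    from numpy import cumprod
    from sympy import primerange

    partial = cumprod([(p*p + 1)/(p*p - 1) for p in primerange(1, 2000)])
    plt.semilogy(2.5 - partial)  # partial products approach 5/2 from below
    plt.show()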