An interesting recursion

The previous post concerned finding the inverse of a function with respect to Dirichlet convolution. Given a function f defined on the positive integers, we want to find a function g, also defined on positive integers, such that f*g = δ. This means we want to find a function g such that the sum

(f*g)(n) = \sum_{d|n} f(d) \,\,g\left(\frac{n}{d} \right)

evaluates to 1 if n = 1 and 0 otherwise.

We said in that post that the function g can be defined recursively by

g(1) =\frac{1}{f(1)}

and

g(n) = -\frac{1}{f(1)} \sum_{\stackrel{d > 1}{d|n}} f(d)\,\,g\left(\frac{n}{d} \right )

for n > 1.

Python code

We can implement this directly in Python as a recursive function. Starting with any function f, we can define g exactly as above. For simplicity we set f(n) = n in the code below.

    from sympy import divisors

    def f(n):
        return n

    def g(n):
        if n == 1:
            return 1/f(1)
        s = sum(f(d)*g(n//d) for d in divisors(n)[1:])
        return -s/f(1)

The function sympy.divisors returns a list of divisors of a number, sorted in increasing order. We use [1:] to exclude the 0th divisor, i.e. 1, to get the list of divisors greater than 1. The code terminates because d > 1, and so n/d < n.

Note that the code uses integer division in g(n//d). Since we’re dividing n by one of its divisors, n/d is (mathematically) an integer, but the Python code n/d would return a value of type float and cause Python to complain that an argument to g is not an integer.

What’s interesting about this recursion is the set of cases g is defined by. Often a recursive function of n is defined in terms of its value at n-1, as in factorial, or its values at a few previous arguments, as in Fibonacci numbers. But here the value of g at n depends on the values of g at factors of n.

So, for example, if we evaluate g(12), the code will evaluate g at 1, 2, 3, 4, and 6. In the process of computing g(12) we don’t need to know the value of g at 11.
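
By the way, one way to test the code is to convolve f with the computed g and check that we get δ back. Here's a quick check, reusing f, g, and divisors from above:

    def check(n):
        # (f*g)(n) should be 1 for n = 1 and 0 for all larger n
        return sum(f(d)*g(n//d) for d in divisors(n))

    print([check(n) for n in range(1, 13)])
    # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]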

Caching

Note that there’s some redundancy in the execution of our code. When we evaluate g at 4, for example, it evaluates g at 2, which had already been calculated. We could eliminate this redundancy by using memoization to cache computed values. In Python this can be done with the lru_cache decorator from functools.

    import functools

    @functools.lru_cache(maxsize=None)
    def g(n):
        if n == 1:
            return 1/f(1)
        s = sum(f(d)*g(n//d) for d in divisors(n)[1:])
        return -s/f(1)

If we use the revised code to calculate g(12), the call to g(6), for example, will look up the previously computed values of g(1), g(2), and g(3) from a cache.

If we evaluate g at several values, the code will fill in its cache in a scattered sort of way. If our first call is to evaluate g(12), then after that call the values of g at 1, 2, 3, 4, and 6 will be cached, but there will be no value of g at 7, for example, in the cache. If we then evaluate g(7), then this value will be cached, but there’s no value of g at 5 in the cache, and there won’t be until after we evaluate g at a multiple of 5.
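
The lru_cache decorator adds a cache_info method to the decorated function, so you can watch this happen. Starting from an empty cache, computing g(12) looks something like this:

    >>> g(12)
    -0.0
    >>> g.cache_info()
    CacheInfo(hits=7, misses=6, maxsize=None, currsize=6)

The six misses correspond to evaluating g at 1, 2, 3, 4, 6, and 12; the hits are the redundant evaluations that the cache absorbed.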

Run time

Suppose the cache contains g(x) for all x < n and you want to compute g(n). How long would that take?

For large n, the number of divisors of n is on average about log(n), so the algorithm requires O(log n) lookups. For more detail, see the Dirichlet divisor problem. If you wanted to be more detailed in your accounting, you’d need to take into account the time required to produce the list of n’s divisors, and the time required to look up cached values. But as a first approximation, the algorithm runs in O(log n) time.

I think the same is true even if you start with an empty cache. It would take longer in this case, but I think this would only change the size of the constant in front of log n.

Related posts

Guide to the recent flurry of posts

I wrote six blog posts this weekend, and they’re all related. Here’s how.

Friday evening I wrote a blog post about a strange random number generator based on factorials. The next day my electricity went out, and that led me to think how I would have written the factorial RNG post without electricity. That led to two threads: interpolation in tables and calculations with floors.

The interpolation thread has two posts. The first looks at error estimates for polynomial interpolation, and shows that numerical error can easily dominate approximation error in interpolation. The second shows that interpolation for tables of sines and cosines can be better than interpolation in general.

The floor calculation thread first led to this post, which states a useful theorem from Concrete Mathematics and uses a slight generalization of the theorem to justify a numerical calculation. Then a comment on an earlier post led to a new post giving simple and tight bounds on the number of trailing zeros in a factorial.

You can read each of these posts by itself, but I thought some people would appreciate seeing how they fit together.

Filling in gaps in a trig table

The previous post shows how you could use linear interpolation to fill in gaps in a table of logarithms. You could do the same for a table of sines and cosines, but there’s a better way. As before, we’ll assume you’re working by hand with just pencil, paper, and a handbook of tables.

Linear interpolation

Suppose you want to find the sine of 12.3456° and you have a table of sines for angles in increments of 0.1°. In Table 4.10 of A&S we find

sin 12.3° = 0.21303 03862 74977
sin 12.4° = 0.21473 53271 67063

If we were to use linear interpolation, we’d estimate

sin 12.3456° = sin 12.3° + 0.456(sin 12.4° – sin 12.3°) = 0.21380 78393 21768

which is accurate to six decimal places.

Better approach

Another approach would be to use the identity

sin(θ + φ) = sin θ cos φ + cos θ sin φ

rather than linear interpolation, setting θ = 12.3° and φ = 0.0456°. We can look up the sine and cosine of θ in our table, but how do we find the sine and cosine of φ?

The cosine is easy: set it to 1. For a small angle x (in radians) the cosine of x is approximately 1 with an error of less than x²/2. In radians,

φ = 0.0456 π/180 = 0.00079 58701 38909

and so the truncation error in approximating cos φ with 1 is about 3 × 10⁻⁷.

Computing the sine of φ is easy, but it requires converting φ to radians. You could probably find the conversion factor in your handbook, e.g. in Table 1.1 of A&S.

0.0456° = 0.0456 × 0.01745 32925 19943 radians

Once φ is in radians, sin φ ≈ φ with an error of less than φ³/6 (see here).

Putting the pieces together we have

sin(θ + φ) ≈ sin 12.3° × 1 + cos 12.3° × φ

which, using the numbers above, gives us 0.21380785249034476, which is off by about 6 × 10⁻⁸.
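
Here's a quick sketch of this calculation in Python, with the math library standing in for the table lookups:

    import math

    theta = math.radians(12.3)
    phi = 0.0456 * math.pi / 180               # 0.0456° in radians

    sin_theta, cos_theta = math.sin(theta), math.cos(theta)

    approx = sin_theta*1 + cos_theta*phi       # cos φ ≈ 1, sin φ ≈ φ
    exact = math.sin(math.radians(12.3456))
    print(abs(approx - exact))                 # roughly 7e-08, the size of error claimed above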

More accuracy

If we want even more accuracy, we need to find the weakest link in our calculation. The error in approximating sin φ as φ is on the order of φ³ while the error in approximating cos φ as 1 is on the order of φ², so the latter is the largest source of error.

If we approximate cos φ as 1 − φ²/2, the error is on the order of φ⁴, and the weakest link becomes the sine approximation, which has error on the order of φ³, still quite small. The overall error in computing sin 12.3456° would be less than 10⁻¹⁰ if we use this higher order approximation for cos φ.
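
Continuing the sketch above with the higher order cosine approximation:

    approx2 = sin_theta*(1 - phi**2/2) + cos_theta*phi
    print(abs(approx2 - exact))                # now below 1e-10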

Compare and contrast

Let’s go back to approximating the cosine of a small angle by 1 and compare the two approximation approaches above.

Linear interpolation:

sin 12.3456° = sin 12.3° + 0.456(sin 12.4° – sin 12.3°)

Addition formula:

sin 12.3456° = sin 12.3° + 0.0456 (π/180) (cos 12.3°)

The second terms in the two approaches are

0.0456(sin 12.4° – sin 12.3°)/0.1

and

0.0456 (π/180) (cos 12.3°).

The two are similar because

(sin 12.4° – sin 12.3°)/0.1 ≈ (π/180) (cos 12.3°).

The term on the left is the difference quotient for sine at 12.3° with step h = 0.1 and the term on the right is the derivative of sine at 12.3°.

Wait, isn’t the derivative of sine just cosine? It is when you’re working in radians, which is why calculus almost always uses radians, but when you work in degrees, the derivative of sine is π/180 times cosine.

What this shows is that if you approximate the cosine of a small angle as 1 and its sine as the angle itself, the sum formula reduces to a first order Taylor approximation.

Tables and interpolation

The previous post about hand calculations involved finding the logarithm of a large integer using only tables.

We wanted to know the log base 10 of 8675310 and all we had was a table of logarithms of integers up to 1350. We used

log₁₀ 867 = 2.9380190975
log₁₀ 868 = 2.9385197252

and linear interpolation to estimate

log₁₀ 867.5310 = 2.9382849308.

How accurate is this estimate? How might we get more accuracy?

Nobody looks up logarithms in a table any more, but interpolation is still commonly used to estimate the value of a function at points you haven’t computed (or measured).

Here’s a theorem that will let us estimate the error in our interpolation. See, for example, [1].

Let f be a function with n+1 continuous derivatives on [a, b] with

|f^{(n+1)}(x)| \leq M

on [a, b]. Let p be the polynomial of degree at most n that interpolates f at n+1 evenly spaced nodes on [a, b]. Then

|f(x) - p(x)| \leq \frac{M}{4(n+1)} \left( \frac{b-a}{n}\right)^{n+1}

In our case the function we’re interpolating is

f(x) = log₁₀ x = log(x)/log(10).

We have a = 867, b = 868, and n = 1. The second derivative of f is given by

f″(x) = −x⁻²/log(10)

and so the upper bound M in our theorem is

1/(867² log(10)) = 5.7776 × 10⁻⁷.

It follows that the theoretical bound on our error is 7.222 × 10⁻⁸. The actual error is 7.186 × 10⁻⁸, close to the theoretical bound.
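
Here's a quick check of these numbers in Python, with math.log10 standing in for the table:

    import math

    a, b, x = 867, 868, 867.5310
    fa, fb = math.log10(a), math.log10(b)

    p = fa + (x - a)*(fb - fa)/(b - a)   # linear interpolation
    actual = abs(math.log10(x) - p)

    M = 1/(a**2 * math.log(10))          # bound on |f''| over [a, b]
    bound = M/8                          # M/(4(n+1)) ((b-a)/n)^(n+1) with n = 1
    print(actual, bound)                 # roughly 7.19e-08 and 7.22e-08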

If we use quadratic interpolation at three points rather than linear interpolation at two points, our theoretical error bound becomes smaller because M is smaller: the third derivative of f involves x⁻³ rather than x⁻². However, we’re unlikely to actually reduce our error much because of numerical error.

Interpolation involves subtracting function values at each of the interpolation points. We know these values to 11 significant figures, but they agree to 4 significant figures, so we can only expect 7 significant figures in their differences. This means that the linear interpolation error is on the same order of magnitude as numerical error. Higher order interpolation is unlikely to improve computational accuracy even though it reduces approximation error because numerical error becomes the limiting factor.

More interpolation posts

[1] Numerical Methods and Computing by Ward Cheney and David Kincaid.

Much ado about NaN

I ran across a GitHub repo today that features an amusing hack using the sign bit of NaNs for unintended purposes. This is an example of how IEEE floating point numbers have a lot of leftover space devoted to NaNs and infinities. However, relative to the enormous number of valid 64-bit floating point numbers, this waste is negligible.
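
For example, here's a peek at the bits of a NaN from Python, using the struct module. A NaN has its own sign bit, and flipping it still leaves you with a NaN:

    import struct

    def to_bits(x):
        return struct.unpack('<Q', struct.pack('<d', x))[0]

    def from_bits(n):
        return struct.unpack('<d', struct.pack('<Q', n))[0]

    nan = float('nan')
    print(hex(to_bits(nan)))          # typically 0x7ff8000000000000

    # Set the sign bit: a different bit pattern, but still a NaN
    neg = from_bits(to_bits(nan) | (1 << 63))
    print(neg != neg)                 # True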

But when you scale down to low-precision floating point numbers, the overhead of the strange corners of IEEE floating point becomes significant. Interest in low-precision floating point comes from wanting to pack more numbers into the same amount of memory when you don’t need much precision. Floating point numbers have long come in 64 bit and 32 bit varieties, but now there are 16 bit and even 8 bit versions.

There are advantages to using completely new floating point formats for low precision numbers rather than scaling down the venerable IEEE format. Posit numbers have only one special number, a point at infinity. Every other bit pattern corresponds to a real number. Posits are also more usefully distributed, as illustrated in the image below, taken from here.


eight bit IEEE distribution

eight bit posit distribution

IEEE float posts

Posit number posts

Planetary code golf

Suppose you’re asked to write a function that takes a number and returns a planet. We’ll number the planets in order from the sun, starting at 1, and for our purposes Pluto is the 9th planet.

Here’s an obvious solution:

    def planet(n):
        planets = [
            "Mercury",
            "Venus",
            "Earth",
            "Mars",
            "Jupiter",
            "Saturn",
            "Uranus",
            "Neptune",
            "Pluto"
        ]
        # subtract 1 to correct for unit offset
        return planets[n-1] 

Now let’s have some fun with this and play a little code golf. Here’s a much shorter program.

    def planet(n): return chr(0x263e+n)

I was deliberately ambiguous at the top of the post, saying the code should return a planet. The first function naturally interprets that to mean the name of a planet, but the second function takes it to mean the symbol of a planet.

The symbol for Mercury has Unicode value U+263F, and Unicode lists the symbols for the planets in order from the sun. So the Unicode character for the nth planet would be the character for Mercury plus n if we numbered the planets starting from 0. But conventionally we call Mercury the 1st planet, not the 0th planet, so we have to subtract one. That’s why the code contains 0x263e rather than 0x263f.

We could make the function just a tiny bit shorter by using a lambda expression and a decimal number for 0x263e.

    planet = lambda n: chr(9790+n)

Display issues

Here’s a brief way to list the symbols from the Python REPL.

    >>> [chr(0x263f+n) for n in range(9)]
    ['☿', '♀', '♁', '♂', '♃', '♄', '♅', '♆', '♇']

You may see the symbols for Venus and Mars above rendered with a colored background, depending on your browser and OS.

Here’s what the lines above looked like at my command line.

screen shot of repl

Here’s what the symbols looked like when I posted them on Twitter.

symbols for planets

For reasons explained here, some software interprets some characters as emoji rather than literal characters. The symbols for Venus and Mars are used as emoji for female and male respectively, based on the use of the symbols in biology. Incidentally, these symbols were used to represent planets long before biologists used them to represent sexes.
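
If you want to nudge software toward the plain glyphs, one option is to append the text variation selector U+FE0E, a general Unicode mechanism that requests the non-emoji rendering of a character. Support varies by platform:

    # U+FE0E asks for the text (non-emoji) presentation of the preceding character
    print('\u2640' + '\ufe0e')   # Venus as plain text, if the renderer cooperates
    print('\u2642' + '\ufe0e')   # Mars as plain text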

Related posts

Beta inequalities with integer parameters

Suppose X is a beta(a, b) random variable and Y is a beta(c, d) random variable. Define the function

g(a, b, c, d) = Prob(X > Y).

At one point I spent a lot of time developing accurate and efficient ways to compute g under various circumstances. I did this because when I worked at MD Anderson Cancer Center we spent thousands of CPU-hours computing this function inside clinical trial simulations.

For this post I want to outline how to compute g in a minimal environment, i.e. a programming language with no math libraries, when the parameters a, b, c, and d are all integers. The emphasis is on portability, not efficiency. (If we had access to math libraries, we could solve the problem for general parameters using numerical integration.)

In this technical report I develop several symmetries and recurrence relationships for the function g. The symmetries can be summarized as

g(a, b, c, d) = g(d, c, b, a) = g(d, b, c, a) = 1 − g(c, d, a, b).

Without loss of generality we assume a is the smallest argument. If not, apply the symmetries above to make the first argument the smallest.

We will develop a recursive algorithm for computing the function g by recursion on a. The algorithm will terminate when a = 1, and so we want a to be the smallest parameter.

In the technical report cited above, I develop the recurrence relation

g(a + 1, b, c, d) = g(a, b, c, d) + h(a, b, c, d)/a

where

h(a, b, c, d) = B(a + c, b + d) / B(a, b) B(c, d)

and B(x, y) is the beta function.

The equation above explains how to compute g(a + 1, b, c, d) after you’ve already computed g(a, b, c, d), which is what I often needed to do in applications. But we want to do the opposite to create a recursive algorithm:

g(a, b, c, d) = g(a – 1, b, c, d) + h(a – 1, b, c, d)/(a – 1)

for a > 1.

For the base case, when a = 1, we have

g(1, b, c, d) = B(c, b + d) / B(c, d).

All that’s left to develop our code is a way to evaluate the beta function.

Now

B(x, y) = Γ(x) Γ(y) / Γ(x + y)

and

Γ(x) = (x – 1)!.

So all we need is a way to compute the factorial, and that’s such a common task that it has become a cliché.
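
Putting the pieces together, here's a sketch of the whole algorithm in Python, using only the symmetries, recurrence, and base case above. The case where b is the smallest parameter uses a composition of the stated symmetries: combining the first two gives g(a, b, c, d) = g(a, c, b, d), and applying the complement symmetry to that gives g(a, b, c, d) = 1 − g(b, d, a, c).

    # Sketch: integer parameters, no math libraries

    def fact(n):
        p = 1
        for i in range(2, n + 1):
            p *= i
        return p

    def beta(x, y):
        # B(x, y) = Γ(x) Γ(y) / Γ(x + y), with Γ(n) = (n-1)! for integer n
        return fact(x - 1) * fact(y - 1) / fact(x + y - 1)

    def h(a, b, c, d):
        return beta(a + c, b + d) / (beta(a, b) * beta(c, d))

    def g(a, b, c, d):
        m = min(a, b, c, d)
        if a == m:
            if a == 1:
                return beta(c, b + d) / beta(c, d)
            return g(a - 1, b, c, d) + h(a - 1, b, c, d) / (a - 1)
        if m == d:
            return g(d, c, b, a)
        if m == c:
            return 1 - g(c, d, a, b)
        # m == b: composed symmetry g(a, b, c, d) = 1 - g(b, d, a, c)
        return 1 - g(b, d, a, c)

For example, g(2, 1, 1, 1) returns 2/3, the probability that a beta(2, 1) random variable exceeds a uniform random variable.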

The symmetries and recurrence relation above hold for non-integer parameters, but you need a math library to compute the gamma function if the parameters are not integers. If you do have a math library, the algorithm here won’t be the most efficient one unless you need to evaluate g for a sequence of parameters that differ by small integers.

***

For more on random inequalities, here’s the first of a nine-part sequence of blog posts.

Unix via etymology

There are similarities across Unix tools that I’ve seldom seen explicitly pointed out. For example, the dollar sign $ refers to the end of things in multiple contexts. In regular expressions, it marks the end of a string. In sed, it refers to the last line of a file. In vi it is the command to move to the end of a line.

Think of various Unix utilities lined up in columns: a column for grep, a column for less, a column for awk, etc. I’m thinking about patterns that cut horizontally across the tools.

Tools are usually presented vertically, one tool at a time, and for obvious reasons. If you need to use grep, for example, then you want to learn grep, not study a dozen tools simultaneously, accumulating horizontal cross sections like plastic laid down by a 3D printer, until you finally know what you need to get your job done.

On the other hand, I think it would be useful to point out some of the similarities across tools that experienced users pick up eventually but seldom verbalize. At a minimum, this would make an interesting blog post. (That’s not what this post is; I’m just throwing out an idea.) There could be enough material to make a book or a course.

I used the word etymology in the title, but that’s not exactly what I mean. Maybe I should have said commonality. I’m more interested in pedagogy than history. There are books on Unix history, but that’s not quite what I have in mind. I have in mind a reader who would appreciate help mentally organizing a big pile of syntax, not someone who is necessarily a history buff.

If the actual etymology isn’t too complicated, then history and pedagogy can coincide. For example, it’s helpful to know that sed and vi have common syntax that they inherited from ed. But pseudo-history, a sort of historical fiction, might be more useful than factually correct history with all its rabbit trails and dead ends. This would be a narrative that tells a story of how things could have developed, even while acknowledging that the actual course of events was more complicated.

Related post: Command option patterns

Functions in bc

The previous post discussed how you would plan an attempt to set a record in computing ζ(3), also known as Apéry’s constant. Specifically that post looked at how to choose your algorithm and how to anticipate the number of terms to use.

Now suppose you wanted to actually do the calculation. This post will carry out a calculation using the Unix utility bc. I often use bc for extended precision calculation, but mostly for simple things [1].

Calculating Apéry’s constant will require a function to compute binomial coefficients, something not built into bc, and so this post will illustrate how to write custom functions in bc.

(If you wanted to make a serious attempt at setting a record computing Apéry’s constant, or anything else, you would probably use an extended precision library written in C, such as MPFR, or write something even lower level; you would not use bc.)

Apéry’s series

The series we want to evaluate is

\zeta(3) = \frac{5}{2} \sum_{n=1}^\infty (-1)^{n-1} \frac{1}{\dbinom{2n}{n} n^3}

To compute this we need to compute the binomial coefficients in the denominator, and to do that we need to compute factorials.

From calculations in the previous post we estimate that summing n terms of this series will give us 2n bits of precision, or 2n/3 decimal places. So if we carry out our calculations to n decimal places, that gives us more precision than we need: truncation error is much greater than numerical error.

bc code

Here’s the code, which I saved in the file zeta3.b. I went for simplicity over efficiency. See [2] for a way to make the code much more efficient.

    # inefficient but simple factorial
    define fact(x) {
        if (x <= 1)
            return (1);
        return (x*fact(x-1));
    }

    # binomial coefficient n choose k
    define binom(n, k) {
        return (fact(n) / (fact(k)*fact(n-k)));
    }

    define term(n) {
        return ((-1)^(n-1)/(n^3 * binom(2*n, n)))
    }

    define zeta3(n) {
        scale = n
        sum = 0
        for (i = 1; i <= n; ++i)
            sum += term(i);
        return (2.5*sum);
    }

Now say we want 100 decimal places of ζ(3). Then we should need to sum about 150 terms of the series above. Let’s sum 160 terms just to be safe. I run the code above as follows [3].

    $ bc -lq zeta3.b
    zeta3(160)

This returns

    1.202056903159594285399738161511449990764986292340498881792271555341\
83820578631309018645587360933525814493279797483589388315482364276014\
64239236562218818483914927

How well did we do?

I tested this by computing ζ(3) to 120 decimals in Mathematica with

    N[RiemannZeta[3], 120]

and subtracting the value returned by bc. This shows that the error in our calculation above is approximately 10⁻¹⁰². We wanted at least 100 decimal places of precision and we got 102.
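
If you don't have Mathematica handy, the Python library mpmath could serve as the independent check instead:

    from mpmath import mp, zeta

    mp.dps = 120          # 120 decimal places of working precision
    print(zeta(3))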

Notes

[1] I like bc because it’s simple. It’s a little too simple, but given that almost all software errs on the side of being too complicated, I’m OK with bc being a little too simple. See this post where I used (coined?) the phrase controversially simple.

[2] Not only is the recursive implementation of factorial inefficient; computing factorials from scratch each time, even by a more efficient algorithm, is not optimal. The more efficient thing to do is to compute each new binomial coefficient starting from the previous one. For example, once we’ve already computed the binomial coefficient (200, 100), we can multiply by 202*201/101² in order to get the binomial coefficient (202, 101).

Along the same lines, computing (-1)^(n-1) is wasteful. When bootstrapping each binomial coefficient from the previous one, multiply by -1 as well.
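
Here's a sketch of that bootstrapping in Python, with exact rational arithmetic standing in for bc's arbitrary precision:

    from fractions import Fraction

    def zeta3(N):
        total = Fraction(0)
        binom = 2                          # binomial(2, 1)
        sign = 1                           # (-1)^(n-1), updated by negation
        for n in range(1, N + 1):
            total += Fraction(sign, binom * n**3)
            # binomial(2n+2, n+1) = binomial(2n, n) * (2n+1)(2n+2)/(n+1)^2
            binom = binom * (2*n + 1) * (2*n + 2) // ((n + 1)**2)
            sign = -sign
        return Fraction(5, 2) * total

    print(float(zeta3(40)))                # agrees with ζ(3) to double precision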

[3] Why the options -lq? The -l option does two things: it loads the math library and it sets the precision variable scale to 20. I always use the -l flag because the default scale is zero, which lops off the decimal part of all floating point numbers. Strange default behavior! I also often need the math library. Turns out -l wasn’t needed here because we explicitly set scale in the function zeta3, and we don’t use the math library.

I also use the -q flag out of habit. It starts bc in quiet mode, suppressing the version and copyright announcement.

Planning a world record calculation

Before carrying out a big calculation, you want to have an idea how long various approaches would take. This post will illustrate this by planning an attempt to calculate Apéry’s constant

\zeta(3) = \sum_{n=1}^\infty \frac{1}{n^3}

to enormous precision. This constant has been computed to many decimal places, in part because it’s an open question whether the number has a closed form. Maybe you’d like to set a new record for computing it.

This post was motivated by a question someone asked me in response to this post: What the advantage is in having an approximation for binomial coefficients when we can just calculate them exactly? This post will show these approximations in action.

Apéry’s series

First of all, calculating ζ(3) directly from the definition above is not the way to go. The series converges too slowly for that. If you compute the sum of the first N terms, you get about 2 log₁₀ N decimal places. This post explains why. We can do much better, with the number of decimal places achieved being proportional to N rather than log N.
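
You can see this slow convergence numerically. Summing the first N terms and counting correct decimal places gives roughly 2 log₁₀ N:

    import math

    zeta3 = 1.2020569031595942     # ζ(3), to double precision
    for N in (10, 100, 1000):
        s = sum(1/n**3 for n in range(1, N + 1))
        print(N, -math.log10(abs(zeta3 - s)))
    # prints about 2.3, 4.3, and 6.3 correct decimals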

A couple of days ago I wrote about a sequence of rapidly converging series for ζ(3), starting with Apéry’s series.

\zeta(3) = \frac{5}{2} \sum_{n=1}^\infty (-1)^{n-1} \frac{1}{\dbinom{2n}{n} n^3}

Suppose you want to compute ζ(3) to a million decimal places using this series. How would you do this?

Since this is an alternating series, the truncation error from summing only the first N terms is bounded by the N+1 term. So you could sum terms until you get a term less than

10^{-10^6}.

Great, but how long might that take?

Without some estimate of binomial coefficients, there’s no way to know how long this would take without computing a bunch of binomial coefficients. But with a little theory, you can do a back-of-the-envelope estimate.

Estimating how many terms to use

For large n, the binomial coefficient in the denominator of the nth term is approximately 4ⁿ/√(πn). So when you include the n³ term, you can see that the denominator of the nth term in the series is greater than 4ⁿ = 2²ⁿ. In other words, each term gives you at least 2 bits of precision.

If you want 1,000,000 digits, that’s approximately 3,000,000 bits. So to get a million digits of ζ(3) you’d need to sum about 1,500,000 terms of Apéry’s series. By comparison, summing 1,500,000 terms in the definition of ζ(3) would give you between 12 and 13 digits of precision.
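
Here is that back-of-the-envelope arithmetic carried out with the exact bits-per-digit factor rather than the round number 3:

    import math

    digits = 10**6
    bits = digits * math.log2(10)   # about 3.3 million bits
    terms = bits / 2                # at least 2 bits per term
    print(round(terms))             # 1660964, the same ballpark as 1,500,000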

More efficient series

Apéry’s series is only the first of a sequence of rapidly converging series for ζ(3). The series get more complicated but more efficient as a parameter s increases. Is it worthwhile to use a larger value of s? If you were planning to devote, say, a month of computing resources to try to set a record, it would be a good idea to spend a few hours estimating whether another approach would get more work done in that month.

By combining information here and here you could find out that the series corresponding to s = 2 has terms whose denominator is on the order of 3³ⁿ. To estimate how many terms you’d need for a million digits of precision, you’d need to solve the equation

3^{3n} = 10^{10^6},

and by taking logs you find this is 1,047,951. In other words, about 1/3 fewer terms than using Apéry’s series. That was a win, so maybe you go on to estimate whether you should go up to s = 3. The key information you need in these calculations is the estimate on binomial coefficients given here.

Incidentally, computing ζ(3) to a million decimal places would have given you the world record in 1997. At the moment, the record is on the order of a trillion digits.

Update: The next post shows how to actually carry out (on a smaller scale) the calculations described here using the Unix utility bc.