A bevy of ones

Take any positive integer d that is not a multiple of 2 or 5. Then there is some integer k such that d × k has only 1’s in its decimal representation. For example, take d = 13. We have

13 × 8547 = 111111.

Or if we take d = 27,

27 × 4115226337448559670781893 = 111111111111111111111111111.

Let’s change our perspective and start with the string of 1’s. If d is not a multiple of 2 or 5, then there is some number made up of only 1’s that is divisible by d. And in fact, the number of 1’s is no more than d. (This follows from the pigeonhole principle applied to the remainders of 1, 11, 111, … when divided by d.)

This theorem generalizes to any integer base b > 1. If d is relatively prime to b, then there is a base b number with d or fewer “digits,” all of them 1, which is divisible by d [1].

The following Python code tells us how many 1’s we need to string together to get a number divisible by k in base b, provided k is relatively prime to b. (The divisor is called k in the code rather than d.) If k shares a factor with b, the code returns 0 because no string of 1’s will ever be divisible by k.

    from math import gcd
    
    def all_ones(n, b = 10):
        # The base b number made of n ones: 1 + b + b^2 + ... + b^(n-1)
        return sum(b**i for i in range(n))
    
    def find_multiple(k, b = 10):
        # Return the number of ones in the smallest all-ones multiple of k
        # in base b, or 0 if no such multiple exists.
        if gcd(k, b) > 1:
            return 0
        for n in range(1, k+1):
            if all_ones(n, b) % k == 0:
                return n

The two Python functions above default to base 10 if a base isn’t provided.
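
As a quick sanity check, here is how the examples at the top of the post play out with these functions.

    print(find_multiple(13))  # 6, because 111111 = 13 × 8547
    print(find_multiple(27))  # 27, the repunit with 27 ones is the smallest divisible by 27
    print(find_multiple(10))  # 0, because 10 shares a factor with the base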

We could find a multiple of 5 whose hexadecimal representation is all 1’s by calling

    print(find_multiple(5, 16))

and this tells us that 11111₁₆ is a multiple of 5, and in fact

5 × 369d₁₆ = 11111₁₆.

[1] Elmer K. Hayashi. Factoring Integers Whose Digits Are all Ones. Mathematics Magazine, Vol. 49, No. 1 (Jan., 1976), pp. 19-22

Symbol pronunciation

I was explaining to someone this evening that I’m in the habit of saying “bang” rather than “exclamation point.” Here’s a list of similar nicknames for symbols.

These nicknames could complement the NATO phonetic alphabet if you needed to read symbols out loud, say over the phone. You might, for example, pronounce “HL&P” as “Hotel Lima Pretzel Papa.”

Or you might use them to have one-syllable names for every symbol. This is a different objective than maximizing phonetic distinctiveness. For example, referring to # and $ as “hash” and “cash” is succinct, but could easily be misheard.

You could also optimize for being clever. Along those lines, I like the idea of pronouncing ( and ) as “wane” and “wax”, by analogy with phases of the moon, though this would be a bad choice if you want to be widely and quickly understood.

Even if someone understood the allusion to lunar phases, and they knew which one looks like an opening parenthesis and which one looks like a closing parenthesis, they might get your meaning backward because they don’t live in the same hemisphere you do! I implicitly assumed the perspective of the northern hemisphere in the paragraph above. Someone from the southern hemisphere would understand “wane” to mean ) and “wax” to mean (.

Phases of the moon in northern and southern hemispheres

Is there a zip code that equals its population?

US stamp from 1973 promoting zip codes

I noticed yesterday that the population in a zip code near me is roughly equal to the zip code itself. So I wondered:

Does any zip code equal its population?

Yes, it’s a silly question. A zip code isn’t a quantity. Populations are always changing. Zip code boundaries are always changing. Etc.

The answer, according to the data I had on hand, is almost.

Smallest absolute error: Zip code 00674 has population 672.

Smallest relative error: Zip code 42301 has population 42319.

I’ve had to learn some of the intricacies of zip codes in the course of my work on data privacy. I found out that zip codes are more complicated than I ever would have thought.

For one thing, the US Census doesn’t exactly report data by zip code but by zip code tabulation area (ZCTA) for reasons that make sense but are too complicated to get into here. This is another reason why the question posed here is fuzzy; we don’t know the populations of zip codes unless they coincide with ZCTAs.
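
Here’s a sketch of the kind of search this involves. It assumes a hypothetical CSV file zcta_population.csv with columns zcta and population; the actual code and data will differ.

    import csv

    best_abs, best_rel = None, None
    with open("zcta_population.csv") as f:
        for row in csv.DictReader(f):
            z, p = int(row["zcta"]), int(row["population"])
            if p == 0:
                continue
            if best_abs is None or abs(z - p) < best_abs[0]:
                best_abs = (abs(z - p), row["zcta"], p)
            if best_rel is None or abs(z - p)/p < best_rel[0]:
                best_rel = (abs(z - p)/p, row["zcta"], p)

    print("Smallest absolute error:", best_abs)
    print("Smallest relative error:", best_rel)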

Update: Several people have told me they expected this post to be an existence argument, such as a pigeonhole argument showing that some zip code has to equal its population. That’s not the case, at least for real, populated US zip codes. (Though Andrew Gelman pointed out that 00000 is not populated, so there’s that.)

Suppose a population of size P is spread over n zip codes, and assume every zip code is populated. If P = n, then every zip code has exactly one person, and so if one of the zip codes is numbered 1, that zip code equals its population. But other than trivial cases like that, it’s easy to avoid any equality between a zip code and its population.

A more interesting question goes in the opposite direction: is it possible for every zip code to equal its population? Clearly not if n(n + 1)/2 > P, because the left side is the minimum total population possible with n zip codes, each equal to its nonzero population. But it turns out that n(n + 1)/2 ≤ P is sufficient: put j people in zip code j for j = 1 through n − 1, and let k = P − n(n − 1)/2 be the number of remaining people. Put them all in zip code k. Since k ≥ n, k is a zip code that hasn’t been used before. For example, if P = 20 and n = 4, zip codes 1, 2, and 3 get 1, 2, and 3 people, and the remaining k = 20 − 6 = 14 people go in zip code 14.

More on zip codes

This post started out as a Twitter thread.

The image above is a US stamp from 1973 promoting the initial use of zip codes.

Fibonacci numbers and ingrown bark

The previous post looked at the images of concentric circles under functions defined by power series. The terms of these series have the form

\frac{z^{\theta(n)}}{\theta(n)}

where θ(n) is a rapidly increasing function of n. These series are thin (technically, lacunary) because all the terms between values of θ(n) are missing.

The previous post used factorials and powers of 2 as the thinning function. This post uses Fibonacci numbers. It differs from the previous post by using more circles and by making all the circles the same color rather than using matplotlib’s default colors.

Without further ado, here’s the plot.

It looks sorta like a tree with ingrown bark.

The code to produce the plot was the same as the code in the previous post with the following changes.

    def fib(n):
        phi = (1 + 5**0.5)/2
        return int(phi**n/5**0.5 + 0.5)

    for r in linspace(0, 1, 60):
        z = f(fib, r*exp(1j*theta))
        plt.plot(z.real, z.imag, 'b', alpha=0.6)

The function fib above uses Binet’s formula to compute Fibonacci numbers.
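
As a sanity check, the first several values returned by fib match the Fibonacci sequence.

    print([fib(n) for n in range(1, 11)])
    # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]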

Concentric circle images go wild

HAKMEM Item 123 gives two plots, both images of concentric circles under functions given by power series with rapidly thinning terms [1]. The first is the function

f(z) = \sum_{n=1}^\infty \frac{z^{n!}}{n!}

and the second is

 g(z) = \sum_{n=1}^\infty \frac{z^{2^n}}{2^n}

The lower limits of summation are not given in the original. I assumed at first the sums began at n = 0, but my plots didn’t match the original until I set the lower index to begin at n = 1.

Here are the plots. First, the image of the circles of radius 1/2, 3/4, 7/8, and 1 under f.

Next, the images of the circles of radius 1/8, 2/8, …, 8/8 under g.

Note that the images of the concentric circles are smooth, concentric, and non-self-intersecting for small radii. As the radii get larger, the images of the circles start to intersect. And when the radius is 1, the images are rough and reminiscent of the Mandelbrot set.

Here’s the Python code that produced the plots.

    from math import factorial
    from numpy import linspace, exp, pi
    import matplotlib.pyplot as plt
    
    def f(thin, z):
        return sum(z**thin(n)/thin(n)
                   for n in range(1, 12))
    
    theta = linspace(0, 2*pi, 1000)
    
    for r in [0.5, 0.75, 0.875, 1]:
        z = f(factorial, r*exp(1j*theta))
        plt.plot(z.real, z.imag)
    plt.gca().set_aspect(1)
    plt.title("f(z)")
    plt.show()
    plt.close()
    
    for n in range(8):
        r = (n+1)/8
        z = f(lambda n: 2**n, r*exp(1j*theta))
        plt.plot(z.real, z.imag)
    plt.title("g(z)")
    plt.gca().set_aspect(1)
    plt.show()

Update 1: Out of curiosity, I unrolled the plots to look at the real and imaginary components separately. Here’s what I got.

Update 2: Here’s a plot of the arclength of the image of a circle of radius r under the map g(z).

The length of the image curve is nearly proportional to the circumference of the input circle until r is around 0.8, after which the length increases quickly. Apparently the arc length goes to infinity as r goes to 1, though I haven’t proved this.
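
Here’s one way the arc length can be estimated numerically, reusing the function f defined in the code above. This is a sketch, not necessarily the code behind the plot.

    import numpy as np

    def arc_length(r, samples=20000):
        # Approximate the arc length of the image of the circle of radius r under g
        # by summing the distances between consecutive points on the image curve.
        t = np.linspace(0, 2*np.pi, samples)
        z = f(lambda n: 2**n, r*np.exp(1j*t))
        return np.sum(np.abs(np.diff(z)))

    for r in [0.5, 0.8, 0.9, 0.99]:
        print(r, arc_length(r))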

[1] See the footnote on lacunary series here.

Accelerating an alternating series

This post looks at an algorithm by Cohen et al [1] to accelerate the convergence of an alternating series. This method is much more efficient than the classical Euler–Van Wijngaarden method.

For our example, we’ll look at the series

\sum_{k=1}^\infty \frac{(-1)^k}{k^2}

which converges slowly to -π²/12.

The first algorithm in [1] for approximating

\sum_{k = 1}^\infty (-1)^k a_k

using up to the nth term is given by the following Python code.

    def cohen_sum(a, n):
        # Algorithm 1 of Cohen, Rodriguez Villegas, and Zagier [1] for
        # approximating the alternating sum of the terms in the array a.
        d = (3 + 8**0.5)**n
        d = (d + 1/d)/2
        b = -1
        c = -d
        s = 0
        for k in range(n):
            c = b - c
            s += c*a[k]
            b = (k+n)*(k-n)*b/((k + 0.5)*(k+1))
        return s/d

Two notes: First, the algorithm assumes the index of summation starts at zero and so we’ll shift our sum to start at zero. We could just define the first term of the sum to be zero and leave the rest of the series alone, but this would produce worse results; it leads to an initial jump in the series that makes the extrapolation in the method less effective. Second, the alternating term (-1)^k is not part of the array passed into the function.
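
For example, here’s how to apply cohen_sum to our series, with the index shifted to start at zero and the overall sign handled outside the function.

    from math import pi

    n = 10
    a = [1/(k+1)**2 for k in range(n)]  # terms after shifting the index to start at 0
    approx = -cohen_sum(a, n)           # negate to recover the original sum starting at k = 1
    print(approx, -pi**2/12, approx + pi**2/12)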

Two graphs, especially the second, illustrate how well the method of Cohen et al performs compared to the direct method of simply taking partial sums. First, here are the sequences of approximations to the final sum from each method, plotted on a linear scale.

And here are the errors in the two methods, plotted on a log scale.

The error from using the direct method of computing partial sums is on the order of 1/n² while the error from using the accelerated method is roughly 1/20.7ⁿ. In this example, it would take around 30,000 terms for the direct method to achieve the accuracy that the accelerated method achieves with 10 terms.

Note that accelerating convergence can be a delicate affair. Different acceleration methods are appropriate for different sums, and applying the wrong method to the wrong series can actually slow convergence as illustrated here.

More on series acceleration

[1] Henri Cohen, Fernando Rodriguez Villegas, and Don Zagier. Convergence Acceleration of Alternating Series. Experimental Mathematics, Vol. 9, No. 1 (2000), pp. 3-12

Compact form of the Lagrange inversion formula

The Lagrange inversion formula can be used to find the power series for the inverse of a function. I wrote about a different approach to this problem a couple of years ago, that time using Bell polynomials. This time I’ll give a formula that is more direct and easier to remember.

Suppose we have a function A(x) and can compute its derivatives. We want to find a power series for B(x) where B(A(x)) = x. We assume A(0) = 0 and A′(0) ≠ 0.

The kth coefficient in the power series for B(x) is given by

b_k = \frac{1}{k} [k-1] \left(\frac{x}{A(x)}\right)^k

where [k – 1] in front of a function means to take the (k-1)st coefficient in its power series.

Let’s apply this to get the first few terms of the series for tangent. Since inverse tangent has a simpler power series than tangent, we’ll set A(x) = arctan(x) so that B(x) is tangent, i.e. the inverse of the inverse tangent.

Of course we could just find the power series for tangent directly, and this is just a demonstration. Power series inversion is more useful when you can’t simply find the series for the inverse function directly.

We will compute the coefficients b₁, b₃, and b₅ to get a 5th order series for tangent. Why don’t we need to compute b₂ and b₄? Because tangent is an odd function, we know that its power series coefficients with even indices are zero.

(You can see that this happens in general by looking at the equation above. If A(x) is an odd function, then x / A(x) is even, and so are its kth powers. The coefficients with odd index in the power series for an even function are zero.)

\begin{align*} \left(\frac{x}{\arctan(x)}\right)^1 &= 1 + \frac{1}{3}x^2 - \frac{4}{45}x^4 + \cdots \\ \left(\frac{x}{\arctan(x)}\right)^3 &= 1 + \phantom{\frac{1}{3}}x^2 + \frac{1}{15}x^4 + \cdots \\ \left(\frac{x}{\arctan(x)}\right)^5 &= 1 + \frac{5}{3}x^2 + \frac{2}{3}x^4 + \cdots \end{align*}

So b₁ equals the 0th coefficient in the power series for x/arctan(x), which is 1.

Next, b₃ equals 1/3 times the 2nd coefficient in the power series for (x/arctan(x))³, and so b₃ = 1/3.

Finally, b₅ equals 1/5 times the 4th coefficient in the power series for (x/arctan(x))⁵, and so b₅ = 2/15.

This tells us the power series for tangent is given by

\tan(x) = x + \frac{1}{3} x^3 + \frac{2}{15}x^5 + \cdots

and we could check by computing the power series directly that these terms are correct.
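
Here’s a quick check using SymPy, computing each coefficient directly from the formula at the top of the post.

    from sympy import symbols, atan, series

    x = symbols('x')

    def lagrange_coeff(k):
        # b_k = (1/k) [x^(k-1)] (x/arctan(x))^k
        s = series((x/atan(x))**k, x, 0, k).removeO()
        return s.coeff(x, k - 1) / k

    print([lagrange_coeff(k) for k in range(1, 6)])
    # [1, 0, 1/3, 0, 2/15]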

Incidentally, we can extend the formula at the top of this post to include powers of the inverse function. That is, the coefficients in the power series for B(x)ⁿ are given by

[k]\, B(x)^n = \frac{n}{k} [k-n] \left(\frac{x}{A(x)}\right)^k

which reduces to the formula up top when n = 1.

Continued fractions of square roots

Let’s look at the continued fraction representation for √14.

\sqrt{14} = 3 + \cfrac{1}{1+ \cfrac{1}{2+ \cfrac{1}{1+ \cfrac{1}{6+ \cfrac{1}{1+ \cfrac{1}{2+ \cfrac{1}{1+ \cfrac{1}{6+ \ddots}}}}}}}}

If we were to take more terms, the sequence of denominators would repeat:

1, 2, 1, 6, 1, 2, 1, 6, 1, 2, 1, 6, …

We could confirm this with Mathematica:

    
    In:  ContinuedFraction[Sqrt[14], 13]
    Out: {3, 1, 2, 1, 6, 1, 2, 1, 6, 1, 2, 1, 6}

For any positive integer d that is not a perfect square, the continued fraction for √d has the kind of pattern we see above.

  1. The coefficients after the first one are periodic.
  2. The cycle consists of a palindrome followed by a single number.
  3. The last number in the cycle is twice the leading coefficient.

In the example above, the periodic part is {1, 2, 1, 6}. The palindrome is {1, 2, 1} and 6 is twice the initial coefficient 3.

Another way to state the third point above is to say that the leading coefficient is the integer part of the square root of d, i.e. ⌊√d⌋, and the last coefficient in each period is 2⌊√d⌋.

Annotated version of continued fraction for sqrt(14)

Sometimes the palindrome is empty:

    In:  ContinuedFraction[Sqrt[5], 10]
    Out: {2, 4, 4, 4, 4, 4, 4, 4, 4, 4}

In the continued fraction for √5, the palindrome part is empty and we simply repeat 4, twice the initial coefficient 2.

For √3, the palindrome is simply {1} and the final number is 2.

    In:  ContinuedFraction[Sqrt[3], 13]
    Out: {1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2}

One last example:

    In:  ContinuedFraction[Sqrt[71], 17]
    Out: {8, 2, 2, 1, 7, 1, 2, 2, 16, 2, 2, 1, 7, 1, 2, 2, 16}
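
If you don’t have Mathematica handy, here’s a short Python sketch of the standard algorithm for computing the continued fraction of √d when d is not a perfect square.

    from math import isqrt

    def sqrt_cf(d, terms):
        # Standard recurrence for the continued fraction of sqrt(d),
        # tracking the surd (sqrt(d) + m)/q with integer arithmetic.
        a0 = isqrt(d)
        m, q, a = 0, 1, a0
        cf = [a0]
        for _ in range(terms - 1):
            m = a*q - m
            q = (d - m*m) // q
            a = (a0 + m) // q
            cf.append(a)
        return cf

    print(sqrt_cf(14, 13))  # [3, 1, 2, 1, 6, 1, 2, 1, 6, 1, 2, 1, 6]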

More continued fractions posts

Conceptual vs Numerical

One of the things that makes numerical computation interesting is that it often reverses the usual conceptual order of things, using advanced math to compute things that are introduced earlier.

Here’s an example I recently ran across [1]. The hyperbolic functions are defined in terms of the exponential function:

\begin{align*} \sinh(x) &= \frac{\exp(x) - \exp(-x)}{2} \\ \cosh(x) &= \frac{\exp(x) + \exp(-x)}{2} \\ \tanh(x) &= \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)} \\ \end{align*}

But it may be more efficient to compute the exponential function in terms of hyperbolic functions. Specifically,

\exp(x) = \sinh(x) + \sqrt{\sinh(x)^2 + 1}

This follows from the identity cosh²(x) − sinh²(x) = 1 together with exp(x) = cosh(x) + sinh(x). Why would you want to compute the simple thing on the left in terms of the more complicated thing on the right? Because hyperbolic sine is an odd function, which means that half its power series coefficients are zero: an odd function only has odd powers in its power series.

Suppose you need to compute the exponential function in an environment where you only have a limited number of registers to store constants. You can get more accuracy out of the same number of registers by using them to store coefficients in the power series for hyperbolic sine than for exp.

If we store n coefficients for sinh, we can include powers of x up to 2n − 1. And since the coefficient of x²ⁿ is zero, the error in our Taylor approximation is O(x²ⁿ⁺¹). See this post for more on this trick.

If we stored n coefficients for exp, we could include powers of x up to n − 1, and our error would be O(xⁿ).

To make things concrete, suppose n = 3 and x = 0.01. The error in the direct Taylor approximation of exp would be on the order of 10⁻⁶, while the error in the approximation via sinh would be on the order of 10⁻¹⁴. That is, we could compute exp to almost single precision directly, and to almost double precision by going through sinh.

(I’m glossing over a detail here. Would you really need to store, for example, the 1 coefficient in front of x in either series? For simplicity in the argument above I’ve implicitly said yes. Whether you’d actually need to store it depends on the details of your implementation.)

The error estimate above uses big-oh arguments. Let’s do an actual calculation to see what we get.

    from math import exp, sinh, sqrt
    
    def exp_taylor(x):
        return 1 + x + 0.5*x**2
    
    def sinh_taylor(x):
        return x + x**3/6 + x**5/120
    
    def exp_via_sinh(x):
        s = sinh_taylor(x)
        return s + sqrt(s*s + 1)
    
    def print_error(approx, exact):
        print(f"Computed: {approx}")
        print(f"Exact:    {exact}")
        print(f"Error:    {approx - exact}")
    
    x = 0.01
    approx1 = exp_taylor(x)
    approx2 = exp_via_sinh(x)
    exact   = exp(x)
    
    print("Computing exp directly:\n")
    print_error(approx1, exact)
    
    print()
    
    print("Computing exp via sinh:\n")
    print_error(approx2, exact)

This produces

    Computing exp directly:

    Computed: 1.0100500000000001
    Exact:    1.010050167084168
    Error:    -1.6708416783473012e-07

    Computing exp via sinh:

    Computed: 1.0100501670841682
    Exact:    1.010050167084168
    Error:    2.220446049250313e-16

Our errors are roughly what we expected from our big-oh arguments.

More numerical analysis posts

[1] I saw this identity in Matters Computational: Ideas, Algorithms, Source Code by Jörg Arndt.

Time spent on the moon

Lunar module and lunar rover, photo via NASA

This post will illustrate two things: the amount of time astronauts have spent on the moon, and how to process dates and times in Python.

I was curious how long each Apollo mission spent on the lunar surface, so I looked up the timelines for each mission from NASA. Here’s the timeline for Apollo 11, and you can find the timelines for the other missions by making the obvious change to the URL.

Here are the data on when each Apollo lunar module touched down and when it ascended.

    data = [
        ("Apollo 11", "1969-07-20 20:17:39", "1969-07-21 17:54:00"),
        ("Apollo 12", "1969-11-19 06:54:36", "1969-11-20 14:25:47"),
        ("Apollo 14", "1971-02-05 09:18:13", "1971-02-06 18:48:42"),
        ("Apollo 15", "1971-07-30 22:16:31", "1971-08-02 17:11:23"),
        ("Apollo 16", "1972-04-21 02:23:35", "1972-04-24 01:25:47"),
        ("Apollo 17", "1972-12-11 19:54:58", "1972-12-14 22:54:37"),
    ]

Here’s a first pass at a program to parse the dates and times above and report their differences.

    from datetime import datetime, timedelta

    def str_to_datetime(string):
        return datetime.strptime(string, "%Y-%m-%d %H:%M:%S")

    def diff(str1, str2):
        return str_to_datetime(str1) - str_to_datetime(str2)

    for (mission, touchdown, liftoff) in data:
        print(f"{mission} {diff(liftoff, touchdown)}")

This works, but the formatting is unsatisfying.

    Apollo 11 21:36:21
    Apollo 12 1 day, 7:31:11
    Apollo 14 1 day, 9:30:29
    Apollo 15 2 days, 18:54:52
    Apollo 16 2 days, 23:02:12
    Apollo 17 3 days, 2:59:39

It would be easier to scan the output if it were all in hours. So we rewrite our diff function as follows.

    def diff(str1, str2):
        delta = str_to_datetime(str1) - str_to_datetime(str2)
        hours = delta.total_seconds() / 3600
        return round(hours, 2)

Now the output is easier to read.

    Apollo 11 21.61
    Apollo 12 31.52
    Apollo 14 33.51
    Apollo 15 66.91
    Apollo 16 71.04
    Apollo 17 74.99

These durations fall into three clusters, corresponding to the Apollo mission types G, H, and J. Apollo 11 was the only G-type mission. Apollo 12, 13, and 14 were H-type, intended to demonstrate a precise landing and explore the lunar surface. (Apollo 13 had to loop around the moon without landing.) The J-type missions were more extensive scientific missions. These missions included a lunar rover (“moon buggy”) to let the astronauts travel further from the landing site. There were no I-type missions; the objectives of the original I-type missions were merged into the J-type missions.

Incidentally, UNIX systems store times as seconds since 1970-01-01 00:00:00. That means the first two lunar landings were at negative times and the last four were at positive times. More on UNIX time here.
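
For example, assuming the touchdown times above are UTC, Python reports a negative UNIX timestamp for the Apollo 11 landing.

    from datetime import datetime, timezone

    touchdown = datetime(1969, 7, 20, 20, 17, 39, tzinfo=timezone.utc)
    print(touchdown.timestamp())  # -14182941.0, about 164 days before the epoch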

Related posts