Mandelbrot and Fat Tails

The Mandelbrot set is the set of complex numbers c such that iterations of

f(z) = z² + c

remain bounded. But how do you know an iteration will remain bounded? You know when it becomes unbounded—if |z| > 2 then the point isn’t coming back—but how do you know whether an iteration will never become unbounded? You don’t.

So in practice, software to draw Mandelbrot sets will iterate some maximum number of times, say 5000 times, and count a point as part of the set if it hasn’t diverged yet. And for the purposes of creating pretty graphics, that’s good enough. If you want to compute the area of the Mandelbrot set, that’s not good enough.

A reasonable thing to do is to look at the distribution of escape times. Then maybe you can say that if an iteration hasn’t escaped after N steps, there’s a probability less than ε that it ever will, and choose N so that ε is small enough for the accuracy your calculation requires.

The distribution of escape times drops off very quickly at first, as demonstrated here. Unfortunately, after this initial rapid drop-off, it settles into a slow decline typical of a fat-tailed distribution. This pattern is common: a fat-tailed distribution may initially drop off faster than a thin-tailed one, just as a Cauchy density initially drops off faster than a normal density.

The plot above groups escape times into buckets of 1,000. It starts after the initial rapid decline and shows the fat-tailed region. This was based on 100,000,000 randomly generated values of c.
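
Here’s a sketch of how such counts might be computed. This is my code, not the code behind the plot, and it uses far fewer samples and a much smaller iteration cap so that it runs in a reasonable time.

import numpy as np

rng = np.random.default_rng(20250801)
n, max_iter = 100_000, 10_000

# sample c uniformly from the square [-2, 2] x [-2, 2]
c = rng.uniform(-2, 2, n) + 1j * rng.uniform(-2, 2, n)
z = np.zeros_like(c)
escape_time = np.zeros(n, dtype=int)  # 0 means "hasn't escaped yet"

for k in range(1, max_iter + 1):
    active = escape_time == 0
    z[active] = z[active]**2 + c[active]
    escape_time[active & (np.abs(z) > 2)] = k

# group escape times of escaped points into buckets of width 1,000
times = escape_time[escape_time > 0]
counts, _ = np.histogram(times, bins=np.arange(0, max_iter + 1000, 1000))
print(counts)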

To estimate the probability of an iteration escaping after having remained bounded for N steps, you need to know the probability mass in the entire tail. So, dear reader, what is the sum of the areas in the bars not shown above and not calculated? It’s not like a normal-ish distribution, where you can say with reasonable confidence that the mass beyond a certain point is negligible.

This matters because the maximum number of iterations is in the inner loop of any program to plot the Mandelbrot set or estimate its area. If 5,000 isn’t enough, then use 50,000 as the maximum, making the program run 10x longer. But what if 50,000 isn’t enough? OK, make it 100,000, doubling the runtime again, but is that enough?

Maybe there has been work on fitting a distribution to escape times, which would allow you to estimate the amount of probability mass in the tail. But I’m just naively hacking away at this for fun, unaware of the literature in the area; I’ve looked for relevant papers but haven’t found any.

Bech32 encoding

Bech32 is an algorithm for encoding binary data, specifically Bitcoin addresses, in a human-friendly way using a 32-character alphabet. The Bech32 alphabet consists of lowercase letters and digits, with the digit 1 and the letters b, i, and o removed.

The Bech32 alphabet design is similar to that for other coding schemes in that it seeks to remove letters that are visually similar. However, the order of the alphabet is unusual:

    qpzry9x8gf2tvdw0s3jn54khce6mua7l

I’ll explain the motivation for the ordering shortly. Here’s the same alphabet in conventional order, not the order used by Bech32:

    023456789acdefghjklmnpqrstuvwxyz

The Bech32 algorithm does more than represent 5-bit words as alphanumeric characters. The full algorithm is documented here. The algorithm is applied to the output of a RIPEMD-160 hash, whose 160 bits divide evenly into 5-bit words.
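
Here’s a sketch of just the character-mapping step, ignoring the checksum and the human-readable part of an address; the function name is mine.

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def words_to_chars(words):
    # map each 5-bit value (0 through 31) to its Bech32 character
    return "".join(CHARSET[w] for w in words)

print(words_to_chars([0, 1, 2, 31]))  # prints 'qpzl'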

The Bech32 algorithm includes a BCH (Bose–Chaudhuri–Hocquenghem) checksum, and takes its name from this checksum.

Note that the BCH checksum is not a cryptographic hash; it’s an error correction code. Its purpose is to catch mistakes, not to thwart malice. Contrast this with Base58Check, which uses part of a SHA-256 hash. Base58Check can detect errors, but it cannot correct them.

I’ve had a surprisingly hard time finding the specifics of the BCH(n, d) code that Bech32 uses. Every source I’ve found either does not give values of n and d or gives values that differ from other sources. In any case, the order of the Bech32 alphabet was chosen so that the mistakes humans are most likely to make are also mistakes the BCH code can detect and correct.

Inferring sample size from confidence interval

The previous post reported that a study found a 95% confidence interval for the area of the Mandelbrot set to be 1.506484 ± 0.000004. What sample size was used to come to that conclusion?

A 95% confidence interval for a proportion is given by

\hat{p} \pm 1.96 \sqrt{\frac{\hat{p} \,(1 - \hat{p})}{n}}

and so if a confidence interval of width w is centered at the proportion estimate p hat, then we can solve

\frac{w}{2} = 1.96 \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}

to find

n = 15.37 \frac{\hat{p}(1 - \hat{p})}{w^2}

Now in the example at the top of the post, we’re randomly sampling points from the square [−2, 2] × [−2, 2] and counting how many land inside the Mandelbrot set. The square has area 16, so p hat equals 1.506484 / 16. The width of the confidence interval for the area is 8 × 10⁻⁶. This means the width of the confidence interval for the proportion is 5 × 10⁻⁷, and so n = 5.244 × 10¹².
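
The arithmetic is easy to check in a few lines of Python:

p_hat = 1.506484 / 16  # proportion estimate
w = 8e-6 / 16          # width of the CI for the proportion
n = 15.37 * p_hat * (1 - p_hat) / w**2
print(f"{n:.4g}")      # 5.244e+12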

Doubts regarding Mandelbrot area

The reasoning above is correct for inferring sample size from a confidence interval, but the specific application to the Mandelbrot set is suspect. Did someone really evaluate over five trillion samples? Apparently not.

As far as I can tell, this was the source of the estimate above. It’s not based on random samples but rather on regular grids of samples, treated as if they were random samples, and then extrapolated to get the estimate above. The result may be correct, but it’s not a simple Monte Carlo estimate. I haven’t looked at the page carefully, but I suspect the reported confidence interval is too small; error bars tend to flare out under extrapolation.

Mandelbrot area and escape times

The two latest posts have been about the Mandelbrot set, the set of complex numbers c such that iterations of

f(z) = z² + c

remain bounded. It’s easy to see that the sequence of iterates will go off to infinity if at any step |z| > 2.

For each c, we can look at the escape time, the number of iterations it takes before an iterate has magnitude greater than 2. The Mandelbrot set is the set of points with infinite escape time.

In the plot below, I sampled 1,000,000 points at random and colored each according to the corresponding escape time. The lighter blue color corresponds to earlier escape.

The white points in the interior have an escape time of more than 10 (and possibly infinity).

When plotting the Mandelbrot set, how do you know whether a point will never escape? You don’t, not with certainty. But by looking at the distribution of escape times you can see that most points that are going to escape do so quickly. For example, after sampling 1,000,000 points and iterating each up to 2,000 times, 99.9% of the points that escape do so within 248 iterations.

If we were to plot a histogram of the escape times it would simply look like a spike on the left end. Using a logarithmic scale for the vertical axis lets us see more of what’s going on. Note that the escape times decrease rapidly, even on a log scale.

By counting what portion of points do not escape, we can calculate a Monte Carlo estimate of the area of the Mandelbrot set, and in fact this is the best way to calculate the area. There is an exact expression for the area in terms of an infinite sum, but the sum converges so slowly that it is unusable; Monte Carlo sampling is faster.

Using 1,000,000 random values of c, a total of 905,444 points from the square [−2, 2] × [−2, 2] escape within 2,000 iterations. Since the square has area 16, this gives us an estimate

A = 16 × (1,000,000 − 905,444)/1,000,000 = 1.5129.

A 95% confidence interval for the area is

A ± 16 × 1.96 × √(p(1 − p)/1,000,000) = (1.5082, 1.5176)

where p = 905444/1000000.
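
Here’s a minimal sketch of this kind of Monte Carlo estimate. It’s my code, using a smaller sample size so that it runs quickly; the confidence interval will be correspondingly wider.

import numpy as np

rng = np.random.default_rng(20250801)
n, max_iter = 100_000, 2_000

# sample c uniformly from the square [-2, 2] x [-2, 2]
c = rng.uniform(-2, 2, n) + 1j * rng.uniform(-2, 2, n)
z = np.zeros_like(c)
escaped = np.zeros(n, dtype=bool)

for _ in range(max_iter):
    active = ~escaped
    z[active] = z[active]**2 + c[active]
    escaped |= np.abs(z) > 2

p = escaped.mean()                            # proportion that escaped
area = 16 * (1 - p)                           # the square has area 16
half = 16 * 1.96 * np.sqrt(p * (1 - p) / n)   # 95% CI half-width
print(area, (area - half, area + half))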

A study [1] found the area to be 1.506484 ± 0.000004, which is just outside our confidence interval. Some of the difference is due to chance, and some of it is due to our stopping at 2,000 iterations. If we had done more iterations, more points would escape (though not very many, for reasons explained above).

Update: I redid my experiment with 10,000,000 samples and 32,000 iterations and got an area estimate of 1.507954 with a 95% confidence interval (1.505056, 1.510851), which would include the estimate cited above. However, for reasons I give here I suspect the confidence interval in [1] is not as small as reported.

[1] See OEIS A098403

Mandelbrot points of every period

As mentioned in the previous post, most of the area in the Mandelbrot set comes from two regions. The largest is the blue cardioid region below and the next largest is the orange disk.

The blue cardioid is the set of points c such that iterations of z² + c converge to a fixed point. The orange disk is the set of points that converge to a cycle of period 2.

We can watch the convergence by picking a point and drawing lines between successive iterates with the following Python function.

import matplotlib.pyplot as plt

def plot_iterates(c):
    f = lambda z: z**2 + c
    z = c
    s = [z]  # history of iterates
    for i in range(1, 20):
        z = f(z)
        s.append(z)
        # draw a segment from the previous iterate to the current one
        plt.plot([s[i-1].real, s[i].real], [s[i-1].imag, s[i].imag], color="C0")
    plt.gca().set_aspect("equal")
    plt.show()

First, we start with a point in the cardioid that converges to a fixed point. We could say it converges to a cycle of period 1.

plot_iterates(0.05 + 0.3j)

Next, let’s start at a point in the orange disk with period 2.

plot_iterates(-1 + 0.2j)

We can find points of any period. Here’s one of period 3.

plot_iterates(-0.13 + 0.66j)

And here’s one of period 4.

plot_iterates(0.256 + 0.55j)


The values of c whose iterates converge to a cycle of period q lie in “bulbs” tangent to the cardioid. There are φ(q) such bulbs, where φ is Euler’s totient function, i.e. φ(q) is the number of positive integers less than and relatively prime to q. The orange disk is the first bulb, the only one of its kind because φ(2) = 1. There are two bulbs of period 3 because φ(3) = 2.

For every positive p < q relatively prime to q there is a bulb tangent to the cardioid at w(1 − w) where

w = ½ exp(2πip/q)

and the diameter of the bulb is approximately 1/q.
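
As a quick check of the formula (the function below is mine), p = 1 and q = 2 give −3/4, the point where the period-2 disk touches the cardioid, and p = 1 and q = 3 give a point near the period-3 value c = −0.13 + 0.66i used above.

from cmath import exp, pi

def tangent_point(p, q):
    # point where the p/q bulb is tangent to the main cardioid
    w = 0.5 * exp(2j * pi * p / q)
    return w * (1 - w)

print(tangent_point(1, 2))  # -0.75
print(tangent_point(1, 3))  # approximately -0.125 + 0.6495i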

These bulbs don’t account for everything in the Mandelbrot set. There’s a lot more to explore.

By the way, here is a plot of the Mandelbrot set created by selecting values of c at random and testing whether iterations of z² + c remain bounded.

Minimalist Mandelbrot set

The Mandelbrot set is one of the most famous fractals. It consists of the complex numbers c such that iterations of

f(z) = z² + c

are bounded. The plot of the Mandelbrot set is a complicated image—it’s a fractal, after all—and yet there’s a simple description of a first approximation to the Mandelbrot set.

As shown in [1], the image of the disk

A = {α : |α| < ½}

under the map taking z to z − z² gives the set of all points where iterations of f converge to a fixed point.

Also shown in [1] is that the points

B = {c : |1 + c| < ¼}

are the ones such that iterations of f(f(z)) converge to a fixed point, i.e. the ones where iterations of f converge to a cycle of period 2.

These two parts form the core of the Mandelbrot set. The blue heart-shaped region on the right is the image of A, and the orange disk on the left is B.

The rest of the Mandelbrot set consists of the points where iterations of f remain bounded but have more complicated behavior than the points in A or B.
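
Here’s a short sketch (my code) that draws the boundaries of the two regions from the parametrizations above.

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2*np.pi, 500)
alpha = 0.5 * np.exp(1j * t)        # boundary of the disk A
cardioid = alpha - alpha**2         # its image under z -> z - z^2
disk = -1 + 0.25 * np.exp(1j * t)   # boundary of B: |1 + c| = 1/4

plt.plot(cardioid.real, cardioid.imag)
plt.plot(disk.real, disk.imag)
plt.gca().set_aspect("equal")
plt.show()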

[1] Alan F. Beardon. Iteration of Rational Functions. Springer-Verlag, 1991.

Measuring cryptographic strength in liters of boiling water

I was listening to a podcast with Bill Buchanan recently in which he illustrated the difficulty of various cryptographic tasks by the amount of energy each would use and how much water that energy would boil. Some tasks would require enough energy to boil a teaspoon of water, some a swimming pool, and some all the world’s oceans.

This is a fantastic way to compare the difficulty of various operations. There’s an old saying “you can’t boil the ocean,” and so it’s intuitively clear that encryption that you would need to boil an ocean to break is secure for all practical purposes [1]. Also, using energy rather than time removes the question of how much work is being done in parallel.

Buchanan credits Lenstra et al [2] with the idea of using units of boiling water.

The new approach was inspired by a remark made by the third author during his presentation of the factorization of the 768-bit RSA challenge at Crypto 2010: We estimate that the energy required for the factorization would have sufficed to bring two 20° C Olympic size swimming pools to a boil. This amount of energy was estimated as half a million kWh.

In the paper’s terminology, 745-bit RSA encryption and 65-bit symmetric key encryption both have “pool security” because the energy required to break them would boil an Olympic pool.

Security is typically measured in terms of symmetric encryption, so 65-bit security is “pool security.” Similarly, 114-bit security is “global security,” meaning that breaking it would require an amount of energy that could boil all the water on planet Earth, about 1.4 billion cubic kilometers of water.

World energy production is around 30,000 TWh per year, so one year of energy production could break 91-bit symmetric encryption or boil the water in Lake Geneva.

Because the difficulty in breaking symmetric encryption is an exponential function of the key length n, we can reverse engineer the formula the paper used to convert key lengths to water volumes, i.e. n bits of security requires the energy to boil

6.777 × 10⁻¹⁴ · 2ⁿ

liters of water.
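
In code, with the constant derived above:

def liters_to_boil(n):
    # liters of water the energy to break n-bit encryption would boil
    return 6.777e-14 * 2**n

print(liters_to_boil(65))   # ~2.5 million liters: one Olympic pool
print(liters_to_boil(114))  # ~1.4e21 liters: all the water on Earth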

[1] If all the assumptions that go into your risk model are correct: the software is implemented correctly, there are no unforeseen algorithmic improvements, keys were generated randomly, etc.

[2] Arjen Lenstra, Thorsten Kleinjung, and Emmanuel Thomé. Universal security: from bits and mips to pools, lakes, and beyond.

Impossible rational triangles

A rational triangle is a triangle whose sides have rational length and whose area is rational. Can any two rational numbers be the lengths of two sides of a rational triangle? Surprisingly, no. You can always find a third side of rational length, but it might not be possible to do so while keeping the area rational.

The following theorem comes from [1].

Let q be a prime number not congruent to 1 or −1 mod 8, and let ω be an odd integer such that no prime factor of q^(2ω) − 1 is congruent to 1 mod 4. Then a = q^ω + 1 and b = q^ω − 1 are not the sides of any rational triangle.

To make the theorem explicitly clear, here’s a Python implementation.

from sympy import isprime, primefactors

def test(q, omega):
    # q must be prime and not congruent to 1 or -1 mod 8
    if not isprime(q):
        return False
    if q % 8 == 1 or q % 8 == 7:
        return False
    # omega must be odd
    if omega % 2 == 0:
        return False
    # no prime factor of q^(2 omega) - 1 may be congruent to 1 mod 4
    factors = primefactors(q**(2*omega) - 1)
    for f in factors:
        if f % 4 == 1:
            return False
    return True

We can see that q = 2 and ω = 1 passes the test, so there is no rational triangle with sides a = 2¹ + 1 = 3 and b = 2¹ − 1 = 1.

You could use the code above to search for (a, b) pairs systematically.

from sympy import primerange

# search over primes q < 20 and odd values of omega < 20
for q in primerange(2, 20):
    for omega in range(1, 20, 2):
        if test(q, omega):
            a, b = q**omega + 1, q**omega - 1
            print(a, b)

This outputs the following:

3 1
9 7
33 31
129 127
8193 8191
32769 32767
131073 131071
524289 524287
4 2
6 4
126 124
14 12

Note that one of the outputs is (4, 2). Since multiplying the sides of a rational triangle by a rational number gives another rational triangle, there cannot be a rational triangle in which one side is twice as long as another.

As the author in [1] points out, “There are undoubtedly many pairs not covered by [the theorem cited above]. It would be of interest to characterize all pairs.” The paper was published in 1976. Maybe there has been additional work since then.

[1] N. J. Fine. On Rational Triangles. The American Mathematical Monthly. Vol. 83, No. 7, pp. 517–521.

Trigamma

Guy wearing a ΓΓΓ shirt

The most important mathematical function after the basics is the gamma function. If I could add one function to a calculator that has trig functions, log, and exponential, it would be the gamma function. Or maybe the log of the gamma function; it’s often more useful than the gamma function itself because it doesn’t overflow as easily.

The derivative of the log gamma function is the digamma function, denoted ψ. It comes up often in applications. I just did a quick search and found I’ve written six posts containing the word “digamma.”

\psi(z) = \frac{d}{dz} \log \Gamma(z)

The derivative of the digamma function ψ′ is the trigamma function.

\psi^\prime(z) = \frac{d}{dz} \psi(z) = \frac{d^2}{dz^2} \log \Gamma(z)

The trigamma function, and higher derivatives of the digamma function, appear in applications. I remember, for example, a researcher asking me to add the trigamma function to the mathematical library I wrote for the biostatistics department at MD Anderson.

I was thinking about the trigamma function because I ran across the following series for it [1].

\psi^\prime (z+1) = \frac{1}{(z + 1)} + \frac{1!}{2} \frac{1}{(z+1)^{\bar{2}}} + \frac{2!}{3}\frac{1}{(z+1)^{\bar{3}}} + \cdots

Note the bars on top of the exponents: the denominators are rising powers of z + 1, not ordinary powers. Here the rising power x^{\bar{n}} means x(x + 1)⋯(x + n − 1), so, for example, (z + 1)^{\bar{2}} = (z + 1)(z + 2).

The series converges uniformly for Re(z) > −1 + δ for δ > 0 [2], and it converges quickly for large z.
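
Here’s a minimal sketch (my code) that sums the series and compares it with SciPy’s trigamma, i.e. polygamma of order 1.

from math import factorial
from scipy.special import polygamma

def ruben_trigamma(z, terms=30):
    total, rising = 0.0, 1.0
    for k in range(1, terms + 1):
        rising *= z + k  # rising power (z+1)(z+2)...(z+k)
        total += factorial(k - 1) / (k * rising)
    return total

z = 5.0
print(ruben_trigamma(z), polygamma(1, z + 1))  # the two should agree closely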

When I saw the title of the paper I thought it sounded like a Greek fraternity. There is a Tri-Delta fraternity, but as far as I know there is no Tri-Gamma fraternity.

[1] Harold Ruben. A Note on the Trigamma Function. The American Mathematical Monthly. Vol. 83, No. 8, p. 622.

[2] It may seem unnecessary to say Re(z) > −1 + δ for δ > 0. Couldn’t you just say for Re(z) > −1? Pointwise, yes, but uniform convergence requires the real part of z to be bounded away from −1 by a fixed amount, regardless of the imaginary part of z.

Vanity addresses

Bitcoin addresses are essentially hash values of public keys encoded in Base58. More details here.

The addresses are effectively random strings of characters from the Base58 alphabet: uppercase and lowercase Latin letters and digits, with 0 (zero), I (capital I), O (capital O), and l (lowercase l) removed to prevent errors.

You could create an address that begins with chosen characters by generating private/public key pairs and hashing the latter until you get the initial string you want [1]. People have done this. The vanitygen software gives the example of searching for an address that starts with “Love” (after the mandatory initial 1).

$ ./vanitygen 1Love
...                         
Pattern: 1Love
Address: 1LoveRg5t2NCDLUZh6Q8ixv74M5YGVxXaN
Privkey: 5JLUmjZiirgziDmWmNprPsNx8DYwfecUNk1FQXmDPaoKB36fX1o

Search time

You can’t specify a long prefix. The difficulty of finding an address with n specified initial characters grows exponentially, like 58ⁿ.

Say it takes a minute on average to find an address with the desired first four characters. Each additional character makes the search take 58 times longer, so specifying 5 characters would take about an hour. If you wanted to find an address that begins 1AcmeCorp, it would take 58⁴ minutes, or over 21 years.
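
Here’s that back-of-the-envelope calculation in Python, assuming the one-minute baseline for four characters:

# each additional fixed character multiplies the expected search time by 58
base_chars, base_minutes = 4, 1
for n in range(4, 9):
    minutes = base_minutes * 58**(n - base_chars)
    print(n, f"{minutes} minutes = {minutes / (60 * 24 * 365):.3f} years")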

Disadvantages

There are several reasons you might not want to use a vanity address.

First, it’s now customary to generate new addresses for each transaction. This gives Bitcoin some degree of privacy, though not much. Presumably if you go to the effort of creating a custom address you’d like to hold on to it. Maybe you’d like to use it as an address for someone to send you donations, for example, and you’re OK reusing the address.

Second, if you use software someone else wrote to generate the custom address, that software or site may hold onto your private key or send it somewhere.

Third, you can’t use the address with an HD wallet. Related to the first point, HD wallets generate a sequence of key pairs, and thus addresses, according to a deterministic algorithm, so an HD wallet can’t hold an address it didn’t generate. But there are wallets that will hold such addresses.

Proof of work

Incidentally, the proof-of-work task for Bitcoin is very similar to finding vanity addresses. You take a block and twiddle what you can until it hashes to a value with a target number of leading zero bits. More on that here.