Solving hard problems

We help companies solve hard problems in mathematics, statistics, and computing. Let’s explore how we might work together.

Randomly selecting points inside a triangle

Posted on 11 September 2025 by John

If you have a triangle with vertices A, B, and C, how would you generate random points inside the triangle ABC?

Barycentric coordinates

One idea would be to use barycentric coordinates.

Generate random numbers α, β, and γ from the interval [0, 1].
Normalize the points to have sum 1 by dividing each by their sum.
Return αA + βB + γC.

This generates points inside the triangle, but not uniformly.

Accept-reject

Another idea is to use an accept-reject method. Draw a rectangle around the triangle, generate points in the square, and throw them away if they land outside the triangle.

An advantage to this method is that it obviously works because it doesn’t rely on anything subtle. Three cheers for brute force!

The method is fairly efficient; on average only half the points will be rejected.

A disadvantage to all accept-reject methods is that they have variable runtime, though this only matters in some applications.

Accept-flip

There is a clever variation on the accept-reject method. Create a parallelogram by attaching a flipped copy of the triangle. Now randomly sample from the parallelogram. Every sample point will either land inside the original triangle or in its flipped twin. If it landed in the original triangle, keep it. If it landed in the twin, flip it back over and use that point.

This is like an accept-reject method, except there’s no waste. Every point is kept, possibly after flipping.

You can find code for implementing this method on Stack Overflow.

Barycentric coordinates fixed

Felix left a note in the comments that the barycentric approach would work if you draw the random weights from an exponential distribution rather than a uniform distribution. That goes against my intuition. I thought about using a different distribution on the weights, but I didn’t work out what it would need to be, but I did not expect it to be exponential. Apparently it works. Here’s a proof by plot:

A mental random number generator

Posted on 10 September 2025 by John

Man thinking of random numbers

George Marsaglia was a big name in random number generation. I’ve referred to his work multiple times here, most recently in this article from March on randomly generating points on a sphere. He is best remembered for his DIEHARD battery of tests for RNG quality. See, for example, this post.

I recently learned about a mental RNG that Marsaglia developed. It’s not great, but it’s pretty good for something that’s easy to mentally compute. The output is probably better than if you simply tried to make up random numbers; people are notoriously bad at doing that. Hillel Wayne explores Marsaglia’s method in detail here.

Here’s a summary the generator in Python.

state = 42 # Set the seed to any two digit number

def random_digit():
    tens = lambda n: n // 10
    ones = lambda n: n % 10
    global state
    state = tens(state) + 6*ones(state)
    return ones(state)

New symbols in Unicode 17

Posted on 10 September 2025 by John

Unicode 17.0 was released yesterday. According to the announcement

This version adds 4,803 new characters, including four new scripts, eight new emoji characters, as well as many other characters and symbols, bringing the total of encoded characters to 159,801.

My primary interest in Unicode is for symbols. Here are some of the new symbols I found interesting.

Currency

There’s a new symbol for the Saudi Riyal (SAR): U+20C1. This may be the most important symbol added in version 17. The rest are relatively arcane.

The riyal symbol looks like the Chinese symbol 北 (U+5317) which means “north.” However, according to the Saudi Central Bank the SAR symbol is based on a calligraphic form of the Arabic word riyal (ريال).

Note that the SAR symbol has parallel horizontal-ish lines, as many currency symbols do.

New math symbol

There was only one math-like symbol introduced in Unicode 17, and that is U+2B96, EQUALS SIGN WITH INFINITY ABOVE. I’ve never seen this symbol used anywhere, nor could I find any standard uses in a search.

$\stackrel{\infty}{=}$

I made the image above with \stackrel{\infty}{=} in LaTeX. If you flip the image upside down you get a symbol that was already in Unicode, U+2BF9, which is used in chess notation. I don’t know whether the new symbol is also used in chess notation.

I suppose the new symbol could be used to denote that two things are asymptotically equal. This would be more suggestive than the currently standard notation of using a tilde.

Asteroid symbols

Several asteroids already had Unicode symbols: Ceres, Pallas, Juno, Vesta, and Chiron correspond to U+26B3 through U+26B7. Also, Proserpina, Astraea, Hygiea, Pholus, and Nessus have Unicode symbols U+2BD8 through U+2BDC.

In Unicode 17.0, a new list of asteroids have symbols: Hebe, Iris, Flora, Metis, Parthenope, Victoria, Egeria, Irene, Eunomia, Psyche, Thetis, Melpomene, Fortuna, Proserpina, Bellona, Amphitrite, Leukothea. These correspond to U+1CEC0 through U+1CED0.

Four of these asteroids have alternate symbols add in the new version of Unicode: Vesta, Astraea, Hygiea, Parthenope have alternate forms in U+1F777 through U+1F77A. These alternate forms are listed under alchemical symbols.

Incidentally, not only a Pallas and Vesta the names of asteroids, they’re also the names of elliptic curves I wrote about recently.

Mandelbrot and Fat Tails

Posted on 9 September 2025 by John

The Mandelbrot set is the set of complex numbers c such that iterations of

f(z) = z² + c

remain bounded. But how do you know an iteration will remain bounded? You know when it becomes unbounded—if |z| > 2 then the point isn’t coming back—but how do you know whether an iteration will never become unbounded? You don’t.

So in practice, software to draw Mandelbrot sets will iterate some maximum number of times, say 5000 times, and count a point as part of the set if it hasn’t diverged yet. And for the purposes of creating pretty graphics, that’s good enough. If you want to compute the area of the Mandelbrot set, that’s not good enough.

A reasonable thing to do is to look at the distribution of escape times. Then maybe you can say that if an iteration hasn’t escaped after N steps, there’s a probability less than ε that it will escape in the future, and make ε small enough to satisfy the accuracy of your calculation.

The distribution of escape times drops off really quickly at first, as demonstrated here. Unfortunately, after this initial rapid drop off, it settles into a slow decline typical of a fat-tailed distribution. This is not atypical: a fat-tailed distribution might initially drop off faster than a thin-tailed distribution, such as the comparison between a Cauchy and a normal distribution.

The plot above groups escape times into buckets of 1,000. It starts after the initial rapid decline and shows the fat-tailed region. This was based on 100,000,000 randomly generated values of c.

To estimate the probability of an iteration escaping after having remained bounded for N steps, you need to know the probability mass in the entire tail. So, dear reader, what is the sum of the areas in the bars not shown above and not calculated? It’s not like a normal-ish distribution when you can say with reasonable confidence that the mass after a certain point is negligible.

This matters because the maximum number of iterations is in the inner loop of any program to plot the Mandelbrot set or estimate its area. If 5,000 isn’t enough, then use 50,000 as the maximum, making the program run 10x longer. But what if 50,000 isn’t enough? OK, make it 100,000, doubling the runtime again, but is that enough?

Maybe there has been work on fitting a distribution to escape times, which would allow you to estimate the amount of probability mass in the tail. But I’m just naively hacking away at this for fun, not aware of literature in the area. I’ve looked for relevant papers, but haven’t found anything.

Bech32 encoding

Posted on 9 September 2025 by John

Bech32 is an algorithm for encoding binary data, specifically Bitcoin addresses, in a human-friendly way using a 32-character alphabet. The Bech32 alphabet includes lowercase letters and digits, removing the digit 1, and the letters b, i, and o.

The Bech32 alphabet design is similar to that for other coding schemes in that it seeks to remove letters that are visually similar. However, the order of the alphabet is unusual:

    qpzry9x8gf2tvdw0s3jn54khce6mua7l

I’ll explain the motivation for the ordering shortly. Here’s the same alphabet in conventional order, not the order used by Bech32:

    023456789acdefghjklmnpqrstuvwxyz

The Bech32 algorithm does more than represent 5-bit words as alphanumeric characters. The full algorithm is documented here. The algorithm is applied to the output of a RIPEMD-160 hash, and so the number of bits is a multiple of 5.

The Bech32 algorithm includes a BCH (Bose–Chaudhuri–Hocquenghem) checksum, and takes its name from this checksum.

Note that the BCH checksum is not a cryptographic hash; it’s an error correction code. It’s purpose is to catch mistakes, not to thwart malice. Contrast this with Base58check that uses part of a SHA256 hash. Base58check can detect errors, but it cannot correct errors.

I’ve had a surprisingly hard time finding the specifics of the BCH(n, d) code that Bech32 uses. Every source I’ve found either does not give values of n and d or gives different values. In any case, the order of the Bech32 alphabet was chosen so that mistakes humans are more like to make are more likely mistakes that BCH can detect and correct.

Inferring sample size from confidence interval

Posted on 8 September 2025 by John

The previous post reported that a study found a 95% confidence interval for the the area of the Mandelbrot set to be 1.506484 ± 0.000004. What was the sample size that was used to come to that conclusion?

A 95% confidence interval for a proportion is given by

$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p} \,(1 - \hat{p})}{n}}$

and so if a confidence interval of width w is centered at the proportion estimate p hat, then we can solve

$\frac{w}{2} = 1.96 \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}$

to find

$n = 15.37 \frac{\hat{p}(1 - \hat{p})}{w^2}$

Now in the example at the top of the post, we’re randomly sampling points from the square [−2, 2] × [−2, 2] and counting how many land inside the Mandelbrot set. The square has area 16, so p hat equals 1.506484 / 16. The width of the confidence interval for the area is 8 × 10⁻⁶. This means the width of the confidence interval for the proportion is 5 × 10⁻⁷, and so n = 5.244 × 10¹².

Doubts regarding Mandelbrot area

The reasoning above is correct for inferring sample size from a confidence interval, but the specific application to the Mandelbrot set is suspect. Did someone really do over a trillion iterations? Apparently not.

As far as I can tell, this was the source of the estimate above. It’s not based on random samples but rather on regular grids of samples, treated as if they were random samples, and then extrapolated to get the estimate above. The result may be correct, but it’s not a simple Monte Carlo estimate. I haven’t looked at the page carefully, but I suspect the reported confidence interval is too small; error bars tend to flare out under extrapolation.

Mandelbrot area and escape times

Posted on 8 September 2025 by John

The two latest posts have been about the Mandelbrot set, the set of complex numbers c such that iterations of

f(z) = z² + c

remain bounded. It’s easy to see that the sequence of iterates will go off to infinity if at any step |z| > 2.

For each c, we can look at the escape time, the number of iterations it takes before an value has magnitude greater than 2. The Mandelbrot set is the set of points with infinite escape time.

In the plot below, I sampled 1,000,000 points at random and colored each according to the corresponding escape time. The lighter blue color corresponds to earlier escape.

The white points in the interior have an escape time of more than 10 (and possibly infinity).

When plotting the Mandelbrot set, how do you know whether a point will never escape? You don’t, not with certainty. But by looking at the distribution of escape times you can see that most points that are going to escape do so quickly. For example, after sampling 1,000,000 points, doing 2000 iterations for each point, 99.9% of points that escape do so within 248 iterations.

If we were to plot a histogram of the escape times it would simply look like a spike on the left end. Using a logarithmic scale for the vertical axis lets us see more of what’s going on. Note that the escape times decrease rapidly, even on a log scale.

By counting what portion of points do not escape, we can calculate a Monte Carlo estimate of the area of the Mandelbrot set, and in fact this is the best way to calculate the area. There is an exact expression for the area in terms of an infinite sum, but the sum converges so slowly that it is unusable; Monte Carlo sampling is faster.

Using 1,000,000 random values of c, a total of 905,444 points from the square [−2, 2] × [−2, 2] escape within 2,000 iterations. Since the square has area 16, this gives us an estimate

A = 16 × (1000000 − 905,444)/1000000 = 1.5129.

A 95% confidence interval for the area is

A ± 16 × 1.96 × (p (1 − p)/1000000) = (1.5082, 1.5176)

where p = 905444/1000000.

A study [1] found the area to be 1.506484 ± 0.000004, which is just outside of our confidence interval. Some of the difference is due to chance, and some of it is due to our stopping at 2000 iterations. If we had gone done more iterations, more points would escape (though not very many, for reasons explained above).

Update: I redid my experiment with 10,000,000 samples and 32,000 iterations and got an area estimate of 1.507954 with a 95% confidence interval (1.505056, 1.510851), which would include the estimate cited above. However, for reasons I give here I suspect the confidence interval in [1] is not as small as reported.

[1] See OEIS A098403

Mandelbrot points of every period

Posted on 5 September 2025 by John

As mentioned in the previous post, most of the area in the Mandelbrot set comes from two regions. The largest is the blue cardioid region below and the next largest is the orange disk.

The blue cardioid is the set of points c such that iterations of z² + c converge to a fixed point. The orange disk is the set of points that converge to a cycle of period 2.

We can watch the convergence by picking a point and drawing a line between the iterates with the following Python function.

import matplotlib.pyplot as plt

def plot_iterates(c):
    f = lambda z: z**2 + c
    z = c
    s = [z]
    for i in range(1, 20):
        z = f(z)
        s.append(z)
        plt.plot([s[i-1].real, s[i].real], [s[i-1].imag, s[i].imag], color="C0")
    plt.gca().set_aspect("equal")
    plt.show()

First, we start with a point in the cardioid that converges to a fixed point. We could say it converges to a cycle of period 1.

plot_iterates(0.05 + 0.3j)

Next, lets start at a point in the orange disk with period 2.

plot_iterates(-1 + 0.2j)

We can find points with any period n. Here’s one of period 3.

plot_iterates(-0.13 + 0.66j)

And here’s one of period 4.

plot_iterates(0.256 + 0.55j)

The values of c that converge to a cycle with period q are “bulbs” tangent to the cardioid. There are φ(q) bulbs where φ is Euler’s totient function, i.e. φ(q) is the number of positive integers less than and relatively prime to q. The orange disk is the first bulb, the only one of its kind because φ(2) = 1. There are two bulbs with period 3 because φ(3) = 2.

For every p relatively prime to q there is a bulb tangent to the cardioid at w(1 − w) where

w = ½ exp(2πip/q)

and the diameter of the bulb is approximately 1/q.

These bulbs don’t account for everything in the Mandelbrot set. There’s a lot more to explore.

By the way, here is a plot of the Mandelbrot set created by selecting values of c at random and testing whether iterations of z² + c remain bounded.

Minimalist Mandelbrot set

Posted on 4 September 2025 by John

The Mandelbrot set is one of the most famous fractals. It consists of the complex numbers c such that iterations of

f(z) = z² + c

are bounded. The plot of the Mandelbrot set is a complicated image—it’s a fractal, after all—and yet there’s a simple description of an first approximation to the Mandelbrot set.

As shown in [1], the image of the disk

{α : |α| < ½}

under the map taking z to z − z² gives the set of all points where iterations of f converge to a point. This is the blue cardioid region. Call it A.

Also show in [1] is that the points

B = {c : |1 + c| < ¼}

are the ones such that f(f(z)) converges to a fixed point.

These two parts form the core of the Mandelbrot set. The blue heart-shaped region on the right is A and the orange disk on the left is B.

The rest of the Mandelbrot set are the points where iterations of f remain bounded but have more complicated behavior than the points in A or B.

[1] Alan F. Beardon. Iteration of Rational Functions. Springer-Verlag, 1991.

Measuring cryptographic strength in liters of boiling water

Posted on 2 September 2025 by John

I was listening to a podcast with Bill Buchanan recently in which he demonstrated the difficulty of various cryptographic tasks by the amount of energy they would use and how much water that would boil. Some tasks would require enough energy to boil a teaspoon of water, some a swimming pool, and some all the world’s oceans.

This is a fantastic way to compare the difficulty of various operations. There’s an old saying “you can’t boil the ocean,” and so it’s intuitively clear that encryption that you would need to boil an ocean to break is secure for all practical purposes [1]. Also, using energy rather than time removes the question of how much work is being done in parallel.

Buchanan credits Lenstra et al [2] with the idea of using units of boiling water.

The new approach was inspired by a remark made by the third author during his presentation of the factorization of the 768-bit RSA challenge at Crypto 2010: We estimate that the energy required for the factorization would have sufficed to bring two 20° C Olympic size swimming pools to a boil. This amount of energy was estimated as half a million kWh.

In the paper’s terminology, 745-bit RSA encryption and 65-bit symmetric key encryption both have “pool security” because the energy required to break them would boil an Olympic pool.

Security is typically measured in terms of symmetric encryption, so 65-bit security is “pool security.” Similarly, 114-bit security is “global security,” meaning that breaking it would require an amount of energy that could boil all the water on planet Earth, about 1.4 billion cubic kilometers of water.

World energy production is around 30,000 TWh per year, so one year of energy production could break 91-bit symmetric encryption or boil the water in Lake Geneva.

Because the difficulty in breaking symmetric encryption is an exponential function of the key length n, we can reverse engineer the formula the paper used to convert key lengths to water volumes, i.e. n bits of security requires the energy to boil

6.777 × 10⁻¹⁴ 2ⁿ

liters of water.

[1] If all the assumptions that go into your risk model are correct: the software is implemented correctly, there are no unforeseen algorithmic improvements, keys were generated randomly, etc.

[2] Arjen Lenstra, Thorsten Kleinjung, and Emannuel Thomé. Universal security: from bits to mips to pools, lakes, and beyond.

Solving hard problems

Randomly selecting points inside a triangle

Barycentric coordinates

Accept-reject

Accept-flip

Barycentric coordinates fixed

Related posts

A mental random number generator

Related posts

New symbols in Unicode 17

Currency

New math symbol

Asteroid symbols

Mandelbrot and Fat Tails

Bech32 encoding

Related posts

Inferring sample size from confidence interval

Doubts regarding Mandelbrot area

Mandelbrot area and escape times

Mandelbrot points of every period

Minimalist Mandelbrot set

Measuring cryptographic strength in liters of boiling water