John D. Cook

Legendre polynomials

John — Tue, 08 Jul 2025 00:59:41 +0000

The previous post mentioned Legendre polynomials. This post will give a brief introduction to these polynomials and a couple hints of how they are used in applications.

One way to define the Legendre polynomials is as follows.

P₀(x) = 1
P_k are orthogonal on [−1, 1].
P_k(1) = 1 for all k ≥ 0.

The middle bullet point means

if m ≠ n. The requirement that each P_k is orthogonal to each of its predecessors determines P_k up to a constant, and the condition P_k(1) = 1 determines this constant.

Here’s a plot of the first few Legendre polynomials.

There’s an interesting pattern that appears in the white space of a graph like the one above when you plot a large number of Legendre polynomials. See this post.

The Legendre polynomial P_k(x) satisfies Legendre’s differential equation; that’s what motivated them.

This differential equation comes up in the context of spherical harmonics.

Next I’ll describe a geometric motivation for the Legendre polynomials. Suppose you have a triangle with one side of unit length and two longer sides of length r and y.

You can find y in terms of r by using the law of cosines:

But suppose you want to find 1/y in terms of a series in 1/r. (This may seem like an arbitrary problem, but it comes up in applications.) Then the Legendre polynomials give you the coefficients of the series.

Source: Keith Oldham et al. An Atlas of Functions. 2nd edition.

The post Legendre polynomials first appeared on John D. Cook.

The biggest perturbation of satellite orbits

John — Mon, 07 Jul 2025 11:36:05 +0000

To first approximation, a satellite orbiting the earth moves in an elliptical orbit. That’s what would get from solving the two-body problem: two point masses orbiting their common center of mass, subject to no forces other than their gravitational attraction to each other.

But the earth is not a point mass. Neither is a satellite, though that’s much less important. The fact that the earth is not exactly a sphere but rather an oblate spheroid is the root cause of the J2 effect.

The J2 effect is the largest perturbation of a satellite orbit from a simple elliptical orbit, at least for satellites in low earth orbit (LEO) and medium earth orbit (MEO). The J2 effect is significant for satellites in higher orbits, though third body effects are larger.

Legendre showed that the gravitational potential of an axially symmetric planet is given by

Here (r, φ) are spherical coordinates. There’s no θ term because we assume the planet, and hence its gravitational potential, is axially symmetric, i.e. independent of θ. The term r_eq is the equatorial radius of the planet. The P_k are Legendre polynomials.

For a moderately oblate planet, like the one we live on, the J₂ coefficient is much larger than the others, and neglecting the rest of the coefficients gives a good approximation [1].

Here are the first few coefficients for Earth [2].

Note that J₂ is three orders of magnitude smaller than 1, and so the J2 effect is small. And yet it matters a great deal. The longitude of the point at which a satellite crosses the equatorial plane may vary a few degrees per day. The rate of precession is approximately proportional to J₂.

The value of J₂ for Mars is about twice that of Earth (0.001960454). The largest J₂ in our solar system is Neptune, which is about three times that of Earth (0.003411).

There are many factors left out of the assumptions of the two body problem. The J2 effect doesn’t account for everything that has been left out, but it’s the first refinement.

Transmission Obstacles and Ellipsoids

John — Sat, 05 Jul 2025 18:33:08 +0000

Suppose you have a radio transmitter T and a receiver R with a clear line of sight between them. Some portion the signal received at R will come straight from T. But some portion will have bounced off some obstacle, such as the ground.

The reflected radio waves will take a longer path than the waves that traveled straight from T to R. The worst case for reception is when the waves traveling a longer path arrive half a period later, i.e. 180° out of phase, canceling out part of the signal that was received directly.

We’d like to describe the region of space that needs to be empty in order to eliminate destructive interference, i.e. signals 180° out of phase. Suppose T and R are a distance d apart and the wavelength of your signal is λ. An obstacle at a location P can cause signals to arrive exactly out of phase if the distance from T to P plus the distance from P to R is d + λ/2.

So we’re looking for the set of all points such that the sum of their distances to fixes points is a constant. This is the nails-and-string description of an ellipse, where the nails are a distance d apart and the string has length d + λ/2.

That would be a description of the region if we were limited to a plane, such as a plane perpendicular to the ground and containing the transmitter and receiver. But signals could reflect off an obstacle that’s outside this plane. So now we need to imagine being able to move the string in three dimensions. We still get all the points we’d get if we were restricted to a plane, but we also get their rotation about the axis running between T and R.

The region we’re describing is an ellipsoid, known as a Fresnel ellipsoid or Fresnel zone.

Suppose we choose our coordinates so that our transmitter T is located at (0, 0, h) and our receiver R is located at (d, 0, h). We imagine a string of length d + λ/2 with endpoints attached to T and R. We stretch the string so it consists of two straight segments. The set of all possible corners in the string traces out the Fresnel ellipsoid.

Greater delays

If reflected waves are delayed by exactly one period, they reinforce the portion of the signal that arrived directly. Signals delayed by an even multiple of a half-period cause constructive interference, but signals delayed by odd multiples of a half-period cause destructive interference. The odd multiples matter most because we’re more often looking to avoid destructive interference rather than seeking out opportunities for constructive interference.

If you repeat the exercise above with a string of length length d + λ you have another Fresnel ellipsoid. The foci remain the same, i.e. T and R, but this new ellipsoid is bigger since the string is longer. This ellipsoid represents locations where a signal reflected at that point will arrive one period later than a signal traveling straight. Obstacles on the surface of this ellipsoid cause constructive interference.

We can repeat this exercise for a string of length d + nλ/2, where odd values of n correspond to regions of destructive interference. This gives us a set of confocal ellipsoids known as the Fresnel ellipsoids.

The post Transmission Obstacles and Ellipsoids first appeared on John D. Cook.

Zooming in on a fractalish plot

John — Sun, 29 Jun 2025 11:33:57 +0000

The exponential sum of the day page on my site draws an image every day by plugging the month, day, and year into a formula. Details here.

Today’s image looks almost solid blue in the middle.

The default plotting line width works well for most days. For example, see what the sum of the day will look line on July 1 this year. Making the line width six times thinner reveals more of the detail in the middle.

You can see even more detail in a PDF of the plot.

The post Zooming in on a fractalish plot first appeared on John D. Cook.

Most ints are not floats

John — Fri, 27 Jun 2025 13:26:41 +0000

All integers are real numbers, but most computer representations of integers do not equal computer representations of real numbers.

To make the statement above precise, we have to be more specific about what we mean by computer integers and floating point numbers. I’ll use int32 and int64 to refer to 32-bit and 64-bit signed integers. I’ll use float32 and float64 to refer to IEEE 754 single precision and double precision numbers, what C calls float and double.

Most int32 numbers cannot be represented exactly by a float32. All int32 numbers can be represented approximately as float32 numbers, but usually not exactly. The previous statements remain true if you replace “32” everywhere by “64.”

32 bit details

The int32 data type represents integers −2³¹ through 2³¹ − 1. The float32 data type represents numbers of the form

± 1.f × 2^e

where 1 bit represents the sign, 23 bits represent f, and 8 bits represent e.

The number n = 2²⁴ can be represented by setting the fractional part f to 0 and setting the exponent e to 24. But the number n + 1 cannot be represented as a float32 because its last bit would fall off the end of f:

2²⁴ + 1 = (1 + 2⁻²⁴) 2²⁴ = 1.000000000000000000000001_two × 2²⁴

The bits in f fill up representing the 23 zeros after the decimal place. There’s no 24th bit to store the final 1.

For each value of e, there are 2²³ possible values of f. So for each of e = 24, 25, 26, …, 31 there are 2²³ representable integers, for a total of 8 × 2²³.

So of the 2³¹ non-negative integers that can be represented by an int32 data type, only 9 × 2²³ can also be represented exactly as a float32 data type. This means about 3.5% of 32-bit integers can be represented exactly by a 32-bit float.

64 bit details

The calculations for 64-bit integers and 64-bit floating point numbers are analogous. Numbers represented by float64 data types have the form

± 1.f × 2^e

where now f has 52 bits and e has 11.

Of the 2⁶³ non-negative integers that can be represented by an int64 data type, only 11 × 2⁵² can also be represented exactly as a float64 data type. This means about 0.5% of 64-bit integers can be represented exactly by a 64-bit float.

A note on Python

Python’s integers have unlimited range, while its floating point numbers correspond to float64. So there are two reasons an integer might not be representable as a float: it may be larger than the largest float, or it may require more than 53 bits of precision.

The post Most ints are not floats first appeared on John D. Cook.

Test whether a large integer is a square

John — Fri, 27 Jun 2025 00:58:26 +0000

Years ago I wrote about a fast way to test whether an integer n is a square. The algorithm rules out a few numbers that cannot be squares based on their last (hexadecimal) digit. If the the integer passes through this initial screening, the algorithm takes the square root of n as a floating point number, rounds to an integer, and tests whether the square of the rest equals n.

What’s wrong with the old algorithm

The algorithm described above works fine if n is not too large. However, it implicitly assumes that the integer can be represented exactly as a floating point number. This is two assumptions: (1) it’s not too large to represent, and (2) the representation is exact.

A standard 64-bit floating point number has 1 bit for the sign, 11 bits for the exponent, and 52 bits for the significand. (More on that here.) Numbers will overflow if they run out of exponent bits, and they’ll lose precision if they run out of significand bits. So, for example, 2¹⁰²⁴ will overflow [1]. And although 2⁵³ + 1 will not overflow, it cannot be represented exactly.

These are illustrated in Python below.

    >>> float(2**1024)
    OverflowError: int too large to convert to float
    >>> n = 2**53
    >>> float(n) == float(n + 1)
    True

A better algorithm

The following algorithm [2] uses only integer operations, and so it will work exactly for arbitrarily large integers in Python. It’s a discrete analog of Newton’s square root algorithm.

    def intsqrt(N):
        n = N
        while n**2 > N:
            n = (n + N//n) // 2
        return n

    def issquare(N):
        n = intsqrt(N)
        return n**2 == N

The function issquare will test whether N is a square. The function intsqrt is broken out as a separate function because it is independently useful since it returns ⌊√N⌋.

The code above correctly handles examples that the original algorithm could not.

    >>> issquare(2**1024)
    True
    >>> issquare(2**54)
    True
    >>> issquare(2**54 + 1)
    False

You could speed up issquare by quickly returning False for numbers that cannot be squared on account of their final digits (in some base—maybe you’d like to use a base larger than 16) as in the original post.

[1] The largest allowed exponent is 1023 for reasons explained here.

[2] Bob Johnson. A Route to Roots. The Mathematical Gazette, Vol. 74, No. 469 (Oct., 1990), pp. 284–285

The post Test whether a large integer is a square first appeared on John D. Cook.

Legendre and Ethereum

John — Thu, 26 Jun 2025 13:38:09 +0000

What does an eighteenth century French mathematician have to do with the Ethereum cryptocurrency?

A pseudorandom number generator based on Legendre symbols, known as Legendre PRF, has been proposed as part of a zero knowledge proof mechanism to demonstrate proof of custody in Ethereum 2.0. I’ll explain what each of these terms mean and include some Python code.

The equation x² = a mod p is solvable for some values of a and not for others. The Legendre symbol

is defined to be 1 if a has a square root mod p, and −1 if it does not. Here a is a positive integer and p is a (large) prime [1]. Note that this is not a fraction, though it looks like one.

As a varies, the Legendre symbols vary randomly. Not literally randomly, of course, but the act random enough to be useful as a pseudorandom number generator.

Legendre PRF

Generating bits by computing Legendre symbols is a lot more work than generating bits using a typical PRNG, so what makes the Legendre PRF of interest to Ethereum?

Legendre symbols can be computed fairly efficiently. You wouldn’t want to use the Legendre PRF to generate billions of random numbers for some numerical integration computation, but for a zero knowledge proof you only need to generate a few dozen bits.

To prove that you know a key k, you can generate a string of pseudorandom bits that depend on the key. If you can do this for a few dozen bits, it’s far more likely that you know the key than that you have guessed the bits correctly. Given k, the Legendre PRF computes

for n consecutive values of x [2].

One reason this function is of interest is that it naturally lends itself to multiparty computation (MPC). The various parties could divide up the range of x values that each will compute.

The Legendre PRF has not been accepted to be part of Ethereum 2.o. It’s only been proposed, and there is active research into whether it is secure enough.

Python code

Here’s a little Python scrypt to demonstrate using L_k(x).

    from sympy import legendre_symbol

    def L(x, k, p):
        return (1 - legendre_symbol(x + k, p)) // 2

    k = 20250626
    p = 2**521 - 1
    print([L(x, k, p) for x in range(10)])

This produces the following.

    [1, 1, 1, 1, 0, 0, 1, 0, 0, 1]

[1] There is a third possibility: the Legendre symbol equals 0 if a is a multiple of p. We can safely ignore this possibility for this post.

[2] The Legendre symbol takes on the values ±1 and so we subtract this from 1 to get values {0, 2}, then divide by 2 to get bits {0, 1}.

The post Legendre and Ethereum first appeared on John D. Cook.

Patching functions together

John — Wed, 25 Jun 2025 15:44:58 +0000

The previous post looked at a function formed by patching together the function

f(x) = log(1 + x)

for positive x and

f(x) = x

for negative x. The functions have a mediocre fit, which may be adequate for some applications, such as plotting data points, but not for others, such as visual design. Here’s a plot.

There’s something not quite smooth about the plot. It’s continuous, even differentiable, at 0, but still there’s sort of a kink at 0.

The way to quantify smoothness is to count how many times you can differentiate a function and have a continuous result. We would say the function above is C¹ but not C² because its first derivative is continuous but not differentiable. If two functions have the same number of derivatives, you can distinguish their smoothness by looking at the size of the derivatives. Or if you really want to get sophisticated, you can use Sobolev norms.

As a rule of thumb, your vision can detect a discontinuity in the second derivative, maybe the third derivative in some particularly sensitive contexts. For physical applications, you also want continuous second derivatives (acceleration). For example, you want a roller coaster track to be more than just continuous and once differentiable.

Natural cubic splines are often used in applications because they patch cubic polynomials together so that the result is C².

You can patch together smooth functions somewhat arbitrarily, but not analytic functions. If you wanted to extend the function log(1 + x) analytically, your only choice is log(1 + x). More on that here.

If you patch together a flat line and a circle, as in the corners of a rounded rectangle, the curve is continuous but not C². The second derivative jumps from 0 to 1/r where r is the radius of the circle. A squircle is similar to a rounded rectangle but has smoothly varying curvature.

The post Patching functions together first appeared on John D. Cook.

Log-ish

John — Tue, 24 Jun 2025 14:38:13 +0000

I saw a post online this morning that recommended the transformation

0 \\ x & \mbox{if } x \leq 0 \end{array} \right." width="240" height="48" />

I could see how this could be very handy. Often you want something like a logarithmic scale, not for the exact properties of the logarithm but because it brings big numbers closer in. And for big values of x there’s little difference between log(x) and log(1 + x).

The function above is linear near the origin, literally linear for negative values and approximately linear for small positive values.

I’ve occasionally needed something like a log scale, but one that would handle values that dip below zero. This transformation would be good for that. If data were equally far above and below zero, I’d use something like arctangent instead.

Notice that the function has a noticeable bend around 0. The next post addresses this.

The post Log-ish first appeared on John D. Cook.

Uniformity increases entropy

John — Mon, 23 Jun 2025 20:30:42 +0000

Suppose you have a system with n possible states. The entropy of the system is maximized when all states are equally likely to occur. The entropy is minimized when one outcome is certain to occur.

You can say more. Starting from any set of probabilities, as you move in the direction of more uniformity, you increase entropy. And as you move in the direction of less uniformity, you decrease entropy.

These statements can be quantified and stated more precisely. That’s what the rest of this post will do.

***

Let p_i be the probability of the ith state and let p be the vector of the p_i.

Then the entropy of p is defined as

If one of the p‘s is 1 and the rest of the p‘s are zero, then H(p) = 0. (In the definition of entropy, 0 log₂ 0 is taken to be 0. You could justify this as the limit of x log₂ x as x goes to zero.)

If each of the p_i are equal, p_i = 1/n, then H(p) = log₂ n. The fact that this is the maximum entropy, and that compromises between the two extremes always decrease entropy, comes from the fact that the entropy function H is concave (proof). That is, if p₁ is one list of probabilities and p₂ another, then

When we speak informally of moving from p₁ in the direction of p₂, we mean we increase the parameter λ from 0 to some positive amount no more than 1.

Because entropy is concave, there are no local maxima. As you approach the location of global maximum entropy, i.e. equal state probabilities, from any direction, entropy increases monotonically.

The post Uniformity increases entropy first appeared on John D. Cook.

Sinc function approximation

John — Mon, 23 Jun 2025 12:38:55 +0000

The sinc function

sinc(x) = sin(x) / x

comes up continually in signal processing. If x is moderately small, the approximation

sinc(x) ≈ (2 + cos(x))/3

is remarkably good, with an error on the order of x⁴/180. This could be useful in situations where you’re working with the sinc function and the x in the denominator is awkward to deal with and you’d rather have a pure trig function.

Here’s a plot:

Of course the approximation is only good for small x. For large x the sinc function approaches zero while (2 + cos(x))/3 oscillates with constant amplitude forever.

When the approximation is good, it is very, very good, which reminds me of this nursery rhyme.

There was a little girl,
Who had a little curl,
Right in the middle of her forehead.
When she was good,
She was very, very good,
But when she was bad, she was horrid.

Arithmetic for fun and profit

John — Mon, 23 Jun 2025 00:09:55 +0000

Four years ago I wrote a blog post about simple solutions to client problems. The post opens by recounting a conversation with a friend that ended with my friend saying “So, basically you’re recommending division.”

That conversation came back to me recently when I had a similar conversation during a deposition. I had to explain that the highly sophisticated method I used to arrive at a number in my report was to take the ratio of two other numbers.

I don’t always do division for clients. Sometimes I do multiplication, as when a lawyer asked me to compute the momentum of a car. There I had to multiply mass and velocity. (I actually turned down that project because I had a sense there was a lot more to the problem than I was being told.)

Of course real-world problems do not always have trivial solutions. Sometimes getting to the solution is complicated even though the solution itself is simple. And sometimes the solution is complicated because that’s just how things are. You might need something complicated, like a cepstrum analysis, or you might just need arithmetic.

The post Arithmetic for fun and profit first appeared on John D. Cook.

Why use hash puzzles for proof-of-work?

John — Sun, 22 Jun 2025 17:31:01 +0000

A couple days ago I wrote about the the problem that Bitcoin requires to be solved as proof-of-work. In a nutshell, you need to tweak a block of transactions until the SHA256 double hash of its header is below a target value [1]. Not all cryptocurrencies use proof of work, but those that do mostly use hash-based puzzles.

Other cryptocurrencies use a different hashing problem, but they still use hashes. Litecoin and Dogecoin use the same proof-of-work problem, similar to the one Bitcoin uses, but with the scrypt (pronounced S-crypt) hash function. Several cryptocurrencies use a hashing problem based on Equihash. Monero uses its RandomX algorithm for proof-of-work, and although this algorithm has multiple components, it ultimately solves a hashing problem. [2]

Why hash puzzles?

Why do cryptocurrencies use hashing problems for proof of work? In principle they could use any computational problem that is hard to solve but easy to verify, such as numerous problems in computational number theory.

One reason is that computer scientists are confident that quantum computing would not reduce the difficulty of solving hash puzzles, even though it would reduce the difficulty of factoring-based puzzles. Also, there is general agreement that it’s unlikely a mathematical breakthrough will find a weakness in hashing functions.

Ian Cassels said “Cryptography is a mixture of mathematics and muddle, and without the muddle the mathematics can be used against you.” Hashing is much more muddle than mathematics.

Why not do something useful?

Hash puzzles work well for demonstrating work done, but they’re otherwise useless. They keep the wheels of cryptocurrencies turning, but the solutions themselves are intrinsically worthless.

Wouldn’t it be nice if crypto miners were solving useful problems like protein folding? You could do that. In fact there is a cryptocurrency FoldingCoin that does just that. But FoldingCoin has a market cap seven orders of magnitude smaller than Bitcoin, on the order of $200,000 compared to Bitcoin’s market cap of $2T.

Cryptocurrencies that use proof of useful work have not taken off. This might necessarily be the case. Requiring practical work creates divergent incentives. If you base a currency on the difficulty of protein folding computations, for example, it would cause major disruption if a pharmaceutical company decided to get into mining protein folding cryptocurrency at a loss because it values the results.

Going back to Cassels’ remark about mathematics and muddle, practical real-world problems often have a mathematical structure. Which is a huge blessing, except when you’re designing problems to be hard. Hash-based problems have gradually become easier to solve over time, and cryptocurrencies have adjusted. But a mathematical breakthrough for solving a practical problem would have the potential to disrupt a currency faster than the market could adapt.

[1] You don’t change the transaction amounts, but you may change the order in which the transactions are arranged into a Merkle tree so that you get different hash values. You can also change a 32-bit nonce, and a timestamp, but most of the degrees of freedom you need in order to find an acceptable hash comes from rearranging the tree.

[2] Both scrypt and Equihash were designed to be memory intensive and to thwart the advantage custom ASIC mining hardware. However, people have found a way to use ASIC hardware to solve scrypt and Equihash problems. RandomX requires running a randomly generated problem before hashing the output in an attempt to frustrate efforts to develop specialized mining hardware.

The post Why use hash puzzles for proof-of-work? first appeared on John D. Cook.

Minimize squared relative error

John — Sat, 21 Jun 2025 13:16:55 +0000

Suppose you have a list of positive data points y₁, y₂, …, y_n and you wanted to find a value α that minimizes the squared distances to each of the y‘s.

Then the solution is to take α to be the mean of the y‘s:

This result is well known [1]. The following variation is not well known.

Suppose now that you want to choose α to minimize the squared relative distances to each of the y‘s. That is, you want to minimize the following.

The value of alpha this expression is the contraharmonic mean of the y‘s [2].

[1] Aristotle says in the Nichomachean Ethics “The mean is in a sense an extreme.” This is literally true: the mean minimizes the sum of the squared errors.

[2] E. F. Beckenbach. A Class of Mean Value Functions. The American Mathematical Monthly. Vol. 57, No. 1 (Jan., 1950), pp. 1–6

The post Minimize squared relative error first appeared on John D. Cook.

What is the Bitcoin proof-of-work problem?

John — Fri, 20 Jun 2025 13:26:02 +0000

In order to prevent fraud, anyone wanting to add a block to the Bitcoin blockchain must prove that they’ve put in a certain amount of computational work. This post will focus on what problem must be solved in order produce proof of work.

You’ll see the proof of work function described as finding strings whose SHA256 hash value begins with a specified number of 0s. That’s sort of a zeroth-level approximation of the problem.

The string s you’re trying to find has the form data + nonce where the data comes from the block you’re wanting to add and the nonce is a value you concatenate on the end. You try different values until you get an acceptable hash value.

You’re not computing the SHA256(s) but rather the double hash:

SHA256²(s) = SHA256( (SHA256(s) )

The only way to find such a string s is by brute force [1], and applying the hash function twice doubles the amount of brute force work needed.

And you’re not exactly trying to produce leading zeros; you’re trying to produce a value less than a target T. This is roughly the same thing, but not quite.

To illustrate this, suppose you have a 2FA fob that generates six-digit random numbers. You’ve been asked to wait until your fob generates a number less than 2025. Waiting until you have three leading zeros would be sufficient, but that would be making the task harder than it needs to be. You’d be waiting for a number less than 1000 when you’re only asked to wait for a number less than 2025.

A SHA256 hash value is a 256-bit number. If your target T is a power of 2

T = 2²⁵⁶⁻ⁿ

then finding a value of s such that

SHA256²(s) < T

really is finding an s whose (double) hash begins with n zeros, though T is not required to be a power of 2.

Finding a value of s with

SHA256²(s) < 2²⁵⁶⁻ⁿ

will require, on average, testing 2ⁿ values of s.

The value of T is adjusted over time in order to keep the amount of necessary work roughly constant. As miners have become more efficient, the value of T has gone down and the amount of work has gone up. But the value of T can go up as well. It is currently fluctuating around 2¹⁷⁶, i.e. hashes must have around 80 leading zero bits.

Now here’s where things get a little more complicated. I said at the beginning that the string s has the form

s = data + nonce

where the data comes from the block you’re trying to add and the nonce is a number you twiddle in order to get the desired hash value. But the nonce is a 32-bit integer. If you need to hash on the order of 2⁸⁰ strings in order to find one with 80 leading zeros, you can’t do that just by adjusting a 32-bit nonce.

In practice you’re going to have to twiddle the contents of what I’ve called data. The data contains a Merkle tree of transactions, and you can change the hash values by adjusting the order in which transactions are arranged in the tree, in addition to adjusting the nonce.

[1] Unless someone finds a flaw in SHA256, which cryptographers have been trying to do for years and have not been able to do. And if a significant weakness is found in SHA256, it may not translate into a significant flaw in SHA256².

The post What is the Bitcoin proof-of-work problem? first appeared on John D. Cook.

Deleting vs Replacing Names

John — Thu, 19 Jun 2025 14:37:32 +0000

This post looks at whether you should delete names or replace names when deidentifying personal data. With structured data, generating synthetic names does not increase or decrease privacy. But with unstructured data, replacing real names with randomly generated names increases privacy protection.

Structured data

If you want to deidentify structured data (i.e. data separated into columns in a database) then an obvious first step is to remove a column containing names. This is not sufficient—the reason deidentification is subtle is that it is possible to identify people after the obvious identifiers have been removed—but clearly removing names is necessary.

Instead of removing the names, you could replace them with randomly generated names. This neither hurts nor helps privacy. But this might be useful, say, when creating test data.

Say you’re developing software that handles patient data. You want to use real patient data because you want the test data to be realistic, but that may not be acceptable (or legal) due to privacy risks. You can’t just remove names because you need to test whether software displays names correctly, so you replace real names with randomly generated names. No harm, no foul.

Unstructured data

Now let’s switch contexts and look at unstructured data, such as text transcriptions of doctors’ notes. You still need to remove names, but it’s not as simple as removing the name column. Instead of the database structure telling you what’s a name, what’s a diagnosis, etc. you have to infer these things statistically [1], which means there will always be errors [2].

When working with unstructured text, replacing names with randomly generated names is better for privacy. If you could identify names with 100% accuracy, then it would make no difference whether you removed or replaced names, but you can’t. Software will always miss some names [3].

Suppose your software replaces suspected names with “NAME REDACTED.” When your software fails to identify a name, it’s obvious that it failed. For example,

NAME REDACTED and his wife Margaret came into the clinic today to be tested for …

But if instead your software replaced suspected names with synthetic names, it is not obvious when the software fails. When you see a reference to Margaret, you can’t know whether the patient’s name was Virginia and was replaced with Margaret or whether the software made an error skipped over Margaret’s name.

All else being equal, it’s better to synthesize names than remove names. But that doesn’t mean that just any process for synthesizing names will be adequate. The error rate doesn’t need to be zero, but it can’t be too high either. And the process should not be biased. If the software consistently left Hispanic names slip through, for example, Hispanic patients would not appreciate that.

[1] “But what if I use a large language model?” That’s a particular kind of statistical inference.

[2] Any statistical test, such as testing whether a string of text represents a name, will have false positives (type I error) and false negatives (type II error). Software to remove names will remove some text that isn’t a name, and fail to recognize some text as names.

[3] We’ve tested a lot of deidentification software packages for clients, and the error rate is never zero. But privacy regulations such as HIPAA don’t require the error rate to be zero, only sufficiently small.

The post Deleting vs Replacing Names first appeared on John D. Cook.

Typesetting Sha and Bitcoin

John — Wed, 18 Jun 2025 11:45:35 +0000

I went down a rabbit hole this week using two symbols in LaTeX. The first was the Russian letter Sha (Ш, U+0248), and the second was the currency symbol for Bitcoin (₿, U+20BF).

Sha

I thought there would be a LaTeX package that would include Ш as a symbol rather than as a Russian letter, just as \pi produces π as a symbol rather than as a Greek letter per se, but apparently there isn’t. I was surprised, since Ш is used in math for at least three different things [1].

When I post on @TeXtip how to produce various symbols in LaTeX, I often get a reply telling me I should simply paste in the Unicode character and use XeTeX. That’s what I ended up doing, except one does not simply use XeTeX.

You have to set the font to one that contains a glyph for the character you want, and you have to use a font encoding package. I ended up adding these two lines to my file header:

    \usepackage[T2A]{fontenc}
    \usepackage{eulervm}

That worked, but only when I compiled with pdflatex, not xelatex.

Bitcoin

I ended up using a different but analogous tactic for the Bitcoin symbol. I used fontspec, Liberation Sans, and xelatex rather than fontenc, Euler, and pdflatex. These were the lines I added to the header:

    \usepackage{fontspec}
    \setmainfont{Liberation Sans}

Without these two lines I get the error message

    Missing character: There is no ₿ (U+20BF) in font ...

I didn’t need to use ₿ and Ш in the same document, but the approach in this section works for both. The approach in the previous section will not work for ₿ because the Euler font does not contain a gylph for ₿.

[1] The three mathematical uses of Ш that I’m aware of are the shuffle product, the Dirac comb distribution, and Tate–Shafarevich group.

The post Typesetting Sha and Bitcoin first appeared on John D. Cook.

Golden powers revisited

John — Tue, 10 Jun 2025 14:15:53 +0000

Years ago I wrote a post Golden powers are nearly integers. The post was picked up by Hacker News and got a lot of traffic. The post was commenting on a line from Terry Tao:

The powers φ, φ², φ³, … of the golden ratio lie unexpectedly close to integers: for instance, φ¹¹ = 199.005… is unusually close to 199.

In the process of writing my recent post on base-φ numbers I came across some equations that explain exactly why golden powers are nearly integers.

Let φ be the golden ratio and ψ = −1/φ. That is, φ and ψ are the larger and smaller roots of

x² − x − 1 = 0.

Then powers of φ reduce to an integer and an integer multiple of φ. This is true for negative powers of φ as well, and so powers of ψ also reduce to an integer and an integer multiple of ψ. And in fact, the integers alluded to are Fibonacci numbers.

φⁿ = F_n φ + F_{n − 1}
ψⁿ = F_n ψ + F_{n − 1}

These equations can be found in TAOCP 1.2.8 exercise 11.

Adding the two equations leads to [1]

φⁿ = F_{n + 1} + F_{n − 1} − ψⁿ

So yes, φⁿ is nearly an integer. In fact, it’s nearly the sum of the (n + 1)st and (n − 1)st Fibonacci numbers. The error in this approximation is −ψⁿ, and so the error decreases exponentially with alternating signs.

[1] φⁿ + ψⁿ = F_n (φ + ψ) + 2 F_{n − 1} = F_n + 2 F_{n − 1} = F_n + F_{n − 1} + F_{n − 1}= F_{n + 1} + F_{n − 1}

The post Golden powers revisited first appeared on John D. Cook.

Naively computing sine

John — Mon, 09 Jun 2025 13:46:15 +0000

Suppose you need to write software to compute the sine function. You were told in a calculus class that this is done using Taylor series—it’s not, but that’s another story—and so you start writing code to implement Taylor series.

How many terms do you need? Uh, …, 20? Let’s go with that.

from math import sin, factorial

def sine(x):
    s = 0
    N = 20
    for n in range(N//2):
        m = 2*n + 1
        s += (-1)**n * x**m / factorial(m)
    return s

You test your code against the built-in sin function and things look good.

    >>> sin(1)
    0.8414709848078965
    >>> sine(1)
    0.8414709848078965

Larger arguments

Next, just for grins, you try x = 100.

    >>> sin(100)
    -0.5063656411097588
    >>> sine(100)
    -7.94697857233433e+20

Hmm. Off by 20 orders of magnitude.

So you do a little pencil and paper calculation and determine that the code should work if you set N = 300 in the code above. You try again, and now you’re off by 25 orders of magnitude. And when you try entering 100.0 rather than 100, the program crashes.

What’s going on? In theory, setting N = 300 should work, according to the remainder term in the Taylor series for sine. But the largest representable floating point number is on the order of 10³⁰⁸ and so x**m terms in the code overflow. The reason the program didn’t crash when you typed in 100 with no decimal point is that Python executed x**m in integer arithmetic.

If you were patient and willing to use a huge number of terms in your Taylor series, you couldn’t include enough terms in your series to get accurate results for even moderately large numbers because overflow would kill you before you could get the series truncation error down. Unless you used extended precision floating point. But this would be using heroic measures to rescue a doomed approach.

Taylor series are good local approximations. They’re only valid within a radius of convergence; outside of that radius they’re worthless even in theory. Inside the radius they may be valid in theory but not in practice, as is the case above.

Range reduction

Taylor series isn’t the best approach for evaluating sine numerically, but you could make it work if you limited yourself to a small range of angles, say 0 to π/4, and used trig identities to reduce the general problem to computing sine in the range [0, π/4]. So how do you do that?

Let’s look at reducing arguments to live in the interval [0, 2π]. In practice you’d want to reduce the range further, but that requires keeping up with what quadrant you’re in etc. We’ll just consider [0, 2π] for simplicity.

The natural thing to do is subtract off the largest multiple of 2π you can.

    >>> from math import pi
    >>> k = int(100/(2*pi))
    >>> x = 100 - k*2*pi
    >>> sine(x)
    -0.5065317636696883
    >>> sin(100)
    -0.5063656411097588

That’s better. Now we don’t overflow, and we get three correct decimal places. We could increase N and get more. However, there’s a limit to how well we could do, as the calculation below shows.

    >>> sin(x)
    -0.5063656411097495
    >>> sin(100)
    -0.5063656411097588

When we reduce 100 mod 2π and call sin(x), we get a result that differs from sin(100) in the last three decimal places. That’s not terrible, depending on the application, but the loss of precision increases with the size of x and would be a complete loss for large enough values of x.

Let’s look back at where we divided 100 by 2π:

    >>> 100/(2*pi)
    15.915494309189533

When we throw away the integer part (15) and keep the rest (.915494309189533) we lose two figures of precision. The larger the integer part, the more precision we lose.

Our naive attempt at range reduction doesn’t work so well, but there are clever ways to make range reduction work. It is possible, for example, to compute the sine of a googol (i.e. sin 10¹⁰⁰).

At first it seems simple to reduce the range. Then when you become aware of the precision problems and think about it a while, it seems precise range reduction is impossible. But if you keep thinking about it long enough you might find a way to make it work. A lot of effort went into range reduction algorithms years ago, and now they’re part of the computational infrastructure that we take for granted.

The post Naively computing sine first appeared on John D. Cook.

Computing the Euler-Mascheroni Constant

John — Sun, 08 Jun 2025 20:07:29 +0000

The Euler-Mascheroni constant is defined as the limit

So an obvious way to try to calculate γ would be to evaluate the right-hand side above for large n. This turns out to not be a very good approach. Convergence is slow and rounding error accumulates.

A much better approach is to compute

It’s not obvious that you can add the extra factor of ½ in the log term, but you can: the right-hand sides of both equations above converge to γ, but the latter converges faster.

Here’s a comparison of the two methods.

The error in the first method with n = 4000 is comparable to the error in the second method with n = 16.

Even though the second equation above is better for numerical evaluation, there are much more sophisticated approaches. And this brings us to y-cruncher, software that has been used to set numerous world records for computing various constants. A new record for computing the digits of π was set a few weeks ago using y-cruncher. From the y-cruncher documentation:

Of all the things that y-cruncher supports, the Euler-Mascheroni Constant is the slowest to compute and by far the most difficult to implement. …

The Euler-Mascheroni Constant has a special place in y-cruncher. It is one of the first constants to be supported (even before Pi) and it is among the first for which a world record was set using y-cruncher. As such, y-cruncher derives its name from the gamma symbol for the Euler-Mascheroni Constant.

Posts involving γ

The post Computing the Euler-Mascheroni Constant first appeared on John D. Cook.

Golden ratio base numbers

John — Sun, 08 Jun 2025 16:31:11 +0000

It is possible to express every positive integer as a sum of powers of the golden ratio φ using each power at most once. This means it is possible to create a binary-like number system using φ as the base with coefficients of 0 and 1 in front of each power of φ.

This system is sometimes called phinary because of the analogy with binary. I’ll use that term here rather than more formal names such as base-φ or golden base number system.

An interesting feature of phinary is that in general you need to include negative powers of φ to represent positive integers. For example,

and so you could write 2 in this system as 10.01.

To state things more formally, every positive integer n satisfies the following equation where a finite number of coefficients coefficients a_k are equal to 1 and the rest are equal to 0.

The golden ratio satisfies φ² = φ + 1 and so phinary representations are not unique. But if you add the rule that number representations must not have consecutive 1s, then representations are unique, analogous to the Fibonacci number system.

The original paper describing the phinary system [1] is awkwardly written. It has the flavor of “Here are some examples. You can see how this generalizes.” rather than a more typical mathematical style.

The end of the article says “Jr. High School 246 Brooklyn, N.Y.” and so when I got to that point I thought the style was due to the paper having been written by a public school teacher rather than a professional mathematician. I later learned from [2] that the author was not a math teacher but a student: George Bergman was 12 years old when he discovered and published his number system.

Phinary is not as simple to develop as you might expect. Bergman’s discovery was impressive, and not only because he was 12 years old at the time. You can find more sophisticated developments in [2] and in [3], but both require a few preliminaries and are not simple.

***

[1] George Bergman. A Number System with an Irrational Base. Mathematics Magazine. 31 (2): 98–110. 1957.

[2] Cecil Rousseau. The Phi Number System Revisited. Mathematics Magazine, Vol. 68, No. 4 (Oct., 1995), pp. 283-284

[3] Donald Knuth. The Art of Computer Programming, volume 1.

The post Golden ratio base numbers first appeared on John D. Cook.

Scientific papers: innovation … or imitation?

Wayne Joubert — Thu, 05 Jun 2025 17:30:49 +0000

Sometimes a paper comes out that has the seeds of a great idea that could lead to a whole new line of pioneering research. But, instead, nothing much happens, except imitative works that do not push the core idea forward at all.

For example the McCulloch Pitts paper from 1943 showed how neural networks could represent arbitrary logical or Boolean expressions of a certain class. The paper was well-received at the time, brilliantly executed by co-authors with diverse expertise in neuroscience, logic and computing. Had its signficance been fully grasped, this paper might have, at least notionally, formed a unifying conceptual bridge between the two nascent schools of connectionism and symbolic AI (one can at least hope). But instead, the heated conflict in viewpoints in the field has persisted, even to this day.

Another example is George Miller’s 7 +/- 2 paper. This famous result showed humans are able to hold only a small number of pieces of information in mind at the same time while reasoning. This paper was important not just for the specific result, but for the breakthrough in methodology using rigorous experimental noninvasive methods to discover how human thinking works—a topic we know so little about, even today. However, the followup papers by others, for the most part, only extended or expanded on the specific finding in very minor ways. [1] Thankfully, Miller’s approach did eventually gain influence in more subtle ways.

Of course it’s natural from the incentive structures of publishing that many papers would be primarily derivative rather than original. It’s not a bad thing that, when a pioneering paper comes out, others very quickly write rejoinder papers containing evaluations or minor tweaks of the original result. Not bad, but sometimes we miss the larger implications of the original result and get lost in the details.

Another challenge is stovepiping—we get stuck in our narrow swim lanes for our specific fields and camps of research. [2] We don’t see the broader implications, such as connections and commonalities across fields that could lead to fruitful new directions.

Thankfully, at least to some extent current research in AI shows some mix of both innovation and imitation. Inspired in part by the accelerationist mindset, many new papers appear every day, some with significant new findings and others that are more modest riffs on previous papers.

Notes

[1] Following this line of research on human thought processes could be worthwhile for various reasons. For example, some papers in linguistics state that Chomsky‘s vision for a universal grammar is misguided because the common patterns in human language are entirely explainable by the processing limitations of the human mind. But this claim is made with no justification or methodological rigor of any kind. If I claimed a CPU performs vector addition or atomic operations efficiently because of “the capabilities of the processor,” I would need to provide some supporting evidence, for example, documenting that the CPU has vector processing units or specialized hardware for atomics. The assertions about language structure being shaped by the human mental processing faculty is just an empty truism, unless supported by some amount of scientific rigor and free of the common fallacies of statistical reasoning.

[2] I recently read a paper in linguistics with apparent promise, but the paper totally misconstrued the relationship between Shannon entropy and Kolmogorov complexity. Sadly this paper passed review in a linguistic journal, but if it had had a mathematically inclined reviewer, the problem would have been caught and fixed.

The post Scientific papers: innovation … or imitation? first appeared on John D. Cook.

Binomial number system

John — Thu, 05 Jun 2025 14:23:22 +0000

I just stumbled across the binomial number system in Exercise 5.38 of Concrete Mathematics. The exercise asks the reader to show that every non-negative integer n can be written as

and that the representation is unique if you require 0 ≤ a < b < c. The book calls this the binomial number system. I skimmed a paper that said this has some application in signal processing, but I haven’t looked at it closely [1].

You can find a, b, and c much as you would find the representation in many other number systems: first find the largest possible c, then the largest possible b for what’s left, and then the remainder is a.

In order to find c, we start with the observation that the binomial coefficient C(k, 3) is less than k³/6 and so c is less than the cube root of 6n. We can use this as an initial lower bound on c then search incrementally. If we wanted to be more efficient, we could do some sort of binary search.

Here’s Python code to find a, b, and c.

from math import comb, factorial

def lower(n, r):
    "Find largest k such that comb(k, r) <= n."
    k = int( (factorial(r)*n)**(1/r) ) # initial guess
    while comb(k, r) <= n: 
        k += 1 
    return k - 1 

def binomial_rep(n): 
    c = lower(n, 3) 
    cc = comb(c, 3) 
    b = lower(n - comb(c, 3), 2) 
    bb = comb(b, 2) 
    a = n - cc - bb 
    assert(c > b > a >= 0)
    return (a, b, c)

For example, here’s the binomial number system representation of today’s date.

>>> binomial_rep(20250605)
(79, 269, 496)
>>> comb(496, 3) + comb(269, 2) + comb(79, 1)
20250605

You could use any number of binomial terms, not just three.

[1] I looked back at the paper, and it is using a different kind of binomial number system, expressing numbers as sums of fixed binomial coefficients, not varying the binomial coefficient arguments. This representation has some advantages for error correction.

The post Binomial number system first appeared on John D. Cook.

Additive and multiplicative persistence

John — Wed, 04 Jun 2025 15:03:07 +0000

Casting out nines is a well-known way of finding the remainder when a number is divided by 9. You add all the digits of a number n. And if that number is bigger than 9, add all the digits of that number. You keep this up until you get a number less than 9.

This is an example of persistence. The persistence of a number, relative to some operation, is the number of times you have to apply that operation before you reach a fixed point, a point where applying the operation further makes no change.

The additive persistence of a number is the number of times you have to take the digit sum before getting a fixed point (i.e. a number less than 9). For example, the additive persistence of 8675309 is 3 because the digits in 8675309 sum to 38, the digits in 38 sum to 11, and the digits in 11 sum to 2.

The additive persistence of a number in base b is bounded above by its iterated logarithm in base b.

The smallest number with additive persistence n is sequence A006050 in OEIS.

Multiplicative persistence is analogous, except you multiply digits together. Curiously, it seems that multiplicative persistence is bounded. This is true for numbers with less than 30,000 digits, and it is conjectured to be true for all integers.

The smallest number with multiplicative persistence n is sequence A003001 in OEIS.

Here’s Python code to compute multiplicative persistence.

def digit_prod(n):
    s = str(n)
    prod = 1
    for d in s:
        prod *= int(d)
    return prod

def persistence(n):
    c = 0
    while n > 9:
        n = digit_prod(n)
        print(n)
        c += 1
    return c

You could use this to verify that 277777788888899 has persistence 11. It is conjectured that no number has larger persistence larger than 11.

The persistence of a large number is very likely 1. If you pick a number with 1000 digits at random, for example, it’s very likely that at least of these digits will be 0. So it would seem that the probability of a large number having persistence larger than 11 would be incredibly small, but I would not have expected it to be zero.

Here are plots to visualize the fixed points of the numbers less than N for increasing values of N.

It’s not surprising that zeros become more common as N increases. And a moments thought will explain why even numbers are more common than odd numbers. But it’s a little surprising that 4 is much less common than 2, 6, or 8.

Incidentally, we have worked in base 10 so far. In base 2, the maximum persistence is 1: when you multiply a bunch of 0s and 1s together, you either get 0 or 1. It is conjectured that the maximum persistence in base 3 is 3.

The post Additive and multiplicative persistence first appeared on John D. Cook.

Iterated logarithm

John — Wed, 04 Jun 2025 11:55:45 +0000

Logarithms give a way to describe large numbers compactly. But sometimes even the logarithm is a huge number, and it would be convenient to take the log again. And maybe again.

For example, consider

googolplex = 10^googol

where

googol = 10¹⁰⁰.

(The name googol is older than “Google,” and in fact the former inspired the latter name.)

A googolplex is a big number. It’s log base 10 is still a big number. You have to take the log 3 times to get to a small number:

log₁₀( log₁₀( log₁₀ googolplex ) ) = 2.

Log star

The iterated logarithm base b of a positive number x, written log*_b (x) is the number of times you have to take the log base b until you get a result less than or equal to 1. So in the example above

log*₁₀ googolplex = 4.

As usual, a missing base implies b = e. Also just as log base 2 is often denoted lg, the iterated log base 2 is sometimes denoted lg*.

I’ve seen sources that say the base of an iterated logarithm doesn’t matter. That’s not true, because, for example,

log*₂ googolplex = 6.

The function log*(x) appears occasionally in the analysis of algorithms.

Python implementation

It’s easy to implement the iterated log, though this is of limited use because the iterated log is usually applied to numbers too large to represent as floating point numbers.

from math import log, exp

def logstar(x, base=exp(1)):
    c = 0
    while x > 1:
        x = log(x, base)
        c += 1
    return c

lgstar = lambda x: logstar(x, 2)

Mersenne primes

The largest known prime, at the time of writing, is

p = 2^136279841 − 1.

We can use the code above to determine that lg* 136279841 = 5 and so lg* p = 6.

Skewes’ bounds

The prime number theorem says that the number of prime numbers less than x, denoted π(x), is asymptotically li(x) where

You may have seen a different form of the prime number theorem that says π(x) is asymptotically x/log x. That’s true too, because li(x) and x/log x are asymptotically equal. But li(x) gives a better approximation to π(x) for finite x. More on that here.

For every x for which π(x) has been computed, π(x) < li(x). However, Littlewood proved in 1914 that there exists some x for which π(x) > li(x). In 1933 Stanley Skewes gave an upper bound on the smallest x such that π(x) > li(x), assuming the Riemann Hypothesis:

S₁ = 10^(10^(10^34))

In 1955 he proved the upper bound

S₂ = 10^(10^(10^964))

not assuming the Riemann Hypothesis.

You can calculate that

log*₁₀ S₁ = 5

and

log*₁₀ S₂ = 6.

Iterated iterated logarithm

In all the examples above, log*(x) is a small number. But for some numbers, such as Moser’s number and Graham’s number, even log*(x) is unwieldy and you have to do things like take the iterated logarithm of the iterated logarithm

log**(x) = log*( log*(x) )

before the numbers become imaginable.

The post Iterated logarithm first appeared on John D. Cook.

Approximation of Inverse, Inverse of Approximation

John — Tue, 03 Jun 2025 13:47:27 +0000

“The inverse of an approximation is an approximation of the inverse.” This statement is either obvious or really clever. Maybe both.

Logs and exponents

Let’s start with an example. For moderately small x,

That means that near 1 (i.e. near 10 raised to a small number,

The inverse of a good approximation is a good approximation of the inverse. But the measure of goodness might not be the same in both uses of the word “good.” The inverse of a very good approximation might be a mediocre approximation of the inverse, depending on the sensitivity of the problem. See, for example, Wilkinson’s polynomial.

Inverse function theorem

Now let’s look at derivatives and inverse functions. The derivative of a function is the best linear approximation to that function at a point. And it’s true that the best linear approximation to the inverse of a function at a point is the inverse of the best linear approximation to the function. In symbols,

This is a good example of a bell curve meme, or coming full circle. The novice naively treats the symbols above as fractions and says “of course.” The sophomore says “Wait a minute. These aren’t fractions. You can’t just do that.” And they would be correct, except the symbols are limits of fractions, and in this context naive symbol manipulation in fact leads to the correct result.

However, the position on the right side of the meme is nuanced. Treating derivatives as fractions, especially partial derivatives, can get you into trouble.

The more rigorous statement of the inverse function theorem is harder to parse visually, and so the version above is a good mnemonic. If f(x) = y then

The same is true in multiple variables. If f is a smooth function from ℝⁿ to ℝⁿ then the best linear approximation to f at a point is a linear transformation, which can be represented by a matrix M. And the best linear approximation to the inverse function at that point is the linear transformation represented by the inverse of M. In symbols,

There’s fine print, such as saying a function has to be differentiable in a neighborhood of x and the derivative has to be non-singular, but that’s the gist of the inverse function theorem.

Note that unlike in the first section, there’s no discussion of how good the approximations are. The approximations are exact, at a point. But it might still be the case that the linear approximation on one side is good over a much larger neighborhood than the linear approximation on the other side.

The post Approximation of Inverse, Inverse of Approximation first appeared on John D. Cook.

False witnesses

John — Mon, 02 Jun 2025 18:58:43 +0000

Fermat’s primality test

Fermat’s little theorem says that if p is a prime and a is not a multiple of p, then

a^p−1 = 1 (mod p).

The contrapositive of Fermat’s little theorem says if

a^p−1 ≠ 1 (mod p)

then either p is not prime or a is a multiple of p. The contrapositive is used to test whether a number is prime. Pick a number a less than p (and so a is clearly not a multiple of p) and compute a^p−1 mod p. If the result is not congruent to 1 mod p, then p is not a prime.

So Fermat’s little theorem gives us a way to prove that a number is composite: if a^p−1 mod p equals anything other than 1, p is definitely composite. But if a^p−1 mod p does equal 1, the test is inconclusive. Such a result suggests that p is prime, but it might not be.

Examples

For example, suppose we want to test whether 57 is prime. If we apply Fermat’s primality test with a = 2 we find that 57 is not prime, as the following bit of Python shows.

    >>> pow(2, 56, 57)
    4

Now let’s test whether 91 is prime using a = 64. Then Fermat’s test is inconclusive.

    >>> pow(64, 90, 91)
    1

In fact 91 is not prime because it factors into 7 × 13.

If we had used a = 2 as our base we would have had proof that 91 is composite. In practice, Fermat’s test is applied with multiple values of a (if necessary).

Witnesses

If a^p−1 = 1 (mod p), then a is a witness to the primality of p. It’s not proof, but it’s evidence. It’s still possible p is not prime, in which case a is a false witness to the primality of p, or p is a pseudoprime to the base a. In the examples above, 64 is a false witness to the primality of 91.

Until I stumbled on a paper [1] recently, I didn’t realize that on average composite numbers have a lot of false witnesses. For large N, the average number of false witnesses for composite numbers less than N is greater than N^15/23. Although N^15/23 is big, it’s small relative to N when N is huge. And in applications, N is huge, e.g. N = 2²⁰⁴⁸ for RSA encryption.

However, your chances of picking one of these false witnesses, if you were to choose a base a at random, is small, and so probabilistic primality testing based on Fermat’s little theorem is practical, and in fact common.

Note that the result quoted above is a lower bound on the average number of false witnesses. You really want an upper bound if you want to say that you’re unlikely to run into a false witness by accident. The authors in [1] do give upper bounds, but they’re much more complicated expressions than the lower bound.

[1] Paul Erdős and Carl Pomerance. On the Number of False Witnesses for a Composite Number. Mathematics of Computation. Volume 46, Number 173 (January 1986), pp. 259–279.

The post False witnesses first appeared on John D. Cook.

Houston’s long term planning

John — Mon, 02 Jun 2025 14:47:17 +0000

When I hear American cities criticized for lack of long-term planning, my initial thought is that this is a good thing because the future cannot be predicted or directed very far in advance, and cities should be allowed to grow organically. Houston, in particular, is subject to a lot of criticism because of its lack of zoning, which I also think is a good thing. [1]

With that in mind, I was very surprised to see the following thread [2] from Aaron Renn.

Maybe the craziest thing about Houston’s third beltway, Grand Parkway, is that the idea originated in 1961, when the Houston metro area population was only 1.2 million people. [It’s now 7.8 million. — JC]

“In October 1965, plans for the Grand Parkway became public. The Houston City Planning Commission released a draft version of the 1966 Major Thoroughfare and Freeway Plan showing the new third loop on the official planning map.”

“The City Planning Commission approved the route and scheduled a public hearing for February 17, 1966. The Houston Chronicle enthusiastically endorsed the new route, saying, ‘No sensible citizen can doubt that this freeway will be needed eventually.’”

Aaron’s quotations are from Houston Freeways.

To a very crude approximation, a map of Houston looks like three concentric rings [2]. According to the sources cited above, planning for the outer ring began long before the middle ring was completed. The first section of the Grand Parkway opened in 1994, and it’s maybe three quarters finished currently.

Maybe what has made Houston successful is that it planned far ahead on a macro scale, but has not micromanaged as much as most cities.

[1] Along these lines, see James Scott’s book Seeing Like a State for examples of failures in long-range state planning. Or if you don’t have the to read the book, you might want to read Venkatash Rao’s article A Big Little Idea Called Legibility.

[2] It’s not exactly a thread. It’s a series of posts that quote the first sentence.

[3] When the outer ring, the Grand Parkway, is completed, it will be the longest beltway in the US and encompass more area than the state of Rhode Island.

The post Houston’s long term planning first appeared on John D. Cook.

Gardener’s ellipse

John — Sun, 01 Jun 2025 19:00:45 +0000

There are several ways to define an ellipse. If you want to write software draw an ellipse, it’s most convenient to have a parametric form:

This gives an ellipse centered at (h, k), with semi-major axis a, semi-minor axis b, and with the major axis rotated by an angle θ from horizontal.

But if you’re going to physically draw an ellipse, there’s a more convenient definition: an ellipse is the set of points such that the sum of the distances to two foci is a constant s. This method is most often called the gardener’s ellipse. It’s also called the nails-and-string method, which is more descriptive.

To draw an ellipse on a sheet of plywood, drive a nails in the plywood at both foci, and attach an end of a string of length s to each nail. Then put a pencil in the middle of the string and pull it tight. Keep the string tight as you move the pencil. This will draw an ellipse [1].

Presumably the name gardener’s ellipse comes from the same idea on a larger scale, such as a gardener marking out an elliptical flower bed using two stakes in the ground and some rope.

Going back to drawing on a computer screen, there is a practical use for the gardener’s ellipse: to determine whether a point is inside an ellipse, add the distance from the point to each of the foci of the ellipse. If the sum is less than s then the point is inside the ellipse. If the sum is greater than s the point is outside the ellipse.

From parameters to nails and string

If you have the parametric form of an ellipse, how do you find the gardener’s method representation?

To make it easier to describe points, set θ = 0 and h = k = 0 for a moment. Then the foci are at (±c, 0) where c² = a² − b².

Since (a, 0) is a point on the string, you have to be able to draw it and so the string must stretch from the focus at (−c, 0) to the point (a, 0) and back to the focus at (c, 0). Therefore the length of the string is 2a.

The length of the string depends on the major axis and not on the minor axis.

When we shift and rotate our ellipse, letting θ, h, and k take on possibly non-zero values, we apply the same transformation to the foci.

Rotating and translating the ellipse doesn’t change the length of the major axis, so s stays the same.

From nails and string to parameters

Draw a line between the two “nails” (foci). The slope of this line is tan θ and it’s midpoint is (h, k). The parameter c is half the length of the line.

The semimajor axis a is half the string length s.

Once you know a and c you can solve for b = √(a² − c²).

[1] In practice, a carpenter might want to draw an ellipse a different way.

The post Gardener’s ellipse first appeared on John D. Cook.

Fitting a parabola to an ellipse and vice versa

John — Sun, 01 Jun 2025 16:29:21 +0000

The previous post discussed fitting an ellipse and a parabola to the same data. Both fit well, but the ellipse fit a little better. This will often be the case because an ellipse has one more degree of freedom than a parabola.

There is one way to fit a parabola to an ellipse at an extreme point: match the end points and the curvatures. This uses up all the degrees of freedom in the parabola.

When you take the analogous approach to fitting an ellipse to a parabola, you have one degree of freedom left over. The curvature depends on a ratio, and so you can adjust the parameters while maintaining the ratio. You can use this freedom to fit the ellipse better over an interval while still matching the curvature at the vertex.

The rest of the post will illustrate the ideas outlined above.

Fitting a parabola to an ellipse

Suppose you have an ellipse with equation

(x/a)² + (y/b)² = 1.

The curvature at (±a, 0) equals a/b² and the curvature at (0, ±b) equals b/a².

Now if you have a parabola

x = cy² + d

then its curvature at y = 0 is 2|c|.

If you want to match the parabola and the ellipse at (a, 0) then d = a.

To match the curvatures at (a, 0) we set a/b² = 2|c|. So c = −a/2b². (Without the negative sign the curvatures would match, but the parabola would turn away from the ellipse.)

Similarly, at (−a, 0) we have d = −a and c = a/2b². And at (0, ±b) we have d = ±b and c = ∓b/2a².

Here’s an example with a golden ellipse.

Fitting an ellipse to a parabola

Now we fix the parabola, say

y = cx²

and find an ellipse

(x/a)² + ((y − y₀)/b)² = 1

to fit at the vertex (0, 0). For the ellipse to touch the parabola at its vertex we must have

((0 − y₀)/b)² = 1

and so y₀ = b. To match curvature we have

b/2a² = c.

So a and b are not uniquely determined, only the ratio b/a². As long as this ratio stays fixed at 2c, every ellipse will touch at the vertex and match curvature there. But larger values of the parameters will match the parabola more closely over a wider range. In the limit as b → ∞ (keeping b/a² = 2c), the ellipses become a parabola.

The post Fitting a parabola to an ellipse and vice versa first appeared on John D. Cook.

John D. Cook

Legendre polynomials

Related posts

The biggest perturbation of satellite orbits

More orbital mechanics posts

More Legendre posts

Transmission Obstacles and Ellipsoids

Greater delays

Related posts

Zooming in on a fractalish plot

Most ints are not floats

32 bit details

64 bit details

A note on Python

Related posts

Test whether a large integer is a square

What’s wrong with the old algorithm

A better algorithm

Legendre and Ethereum

Legendre PRF

Python code

Related posts

Patching functions together

Related posts

Log-ish

Uniformity increases entropy

Sinc function approximation

More sinc posts

Arithmetic for fun and profit

Why use hash puzzles for proof-of-work?

Why hash puzzles?

Why not do something useful?

Related posts

Minimize squared relative error

Related posts

What is the Bitcoin proof-of-work problem?

Related posts

Deleting vs Replacing Names

Structured data

Unstructured data

Related posts

Typesetting Sha and Bitcoin

Sha

Bitcoin

Related posts

Golden powers revisited

Related posts

Naively computing sine

Larger arguments

Range reduction

Computing the Euler-Mascheroni Constant

Posts involving γ

Golden ratio base numbers

Scientific papers: innovation … or imitation?

Notes

Binomial number system

Additive and multiplicative persistence

Iterated logarithm

Log star

Python implementation

Mersenne primes

Skewes’ bounds

Iterated iterated logarithm

Related posts

Approximation of Inverse, Inverse of Approximation

Logs and exponents

Inverse function theorem

False witnesses

Fermat’s primality test

Examples

Witnesses

Related posts

Houston’s long term planning

Gardener’s ellipse

From parameters to nails and string

From nails and string to parameters

Related posts

Fitting a parabola to an ellipse and vice versa

Fitting a parabola to an ellipse

Fitting an ellipse to a parabola