A topological sort of a directed graph lists source nodes before target nodes. For example, if there is a directed edge from A to B and from C to A, then the nodes would be listed C, A, B. It’s just a way of listing the items in a directed graph so that no item in the list points to an item earlier in the list. All arrows point forward. It’s not exotic at all. It’s something you’ve likely done, maybe by hand.

Where does topology come in? Imagine your directed graph made of beads and strings. You want to pick up the graph by some bead so that all beads are higher than the beads they point to. It’s topological in the sense that you don’t need to preserve the geometry of the graph, only its connectivity.

The Unix utility `tsort` will do a topological sort. The input to the utility is a text file with two items per line, separated by white space, indicating a directed edge from the first item to the second.

Here is a thumbnail image of a graph of relationships between special functions. See this page for a full-sized image and an explanation of what the arrows represent.

I took the GraphViz file used to create the graph and formatted it for `tsort`. Then I randomly shuffled the file with `shuf`.

Gegenbauer_polynomials Legendre_polynomials
Gegenbauer_polynomials Chebyshev_polynomials_Second_kind
Hypergeometric_2F1 Jacobi_polynomials
Error_function Fresnel_S
...
Hypergeometric_1F1 Error_function

The lines are not sorted topologically because, for example, the Gegenbauer polynomials are special cases of the Hypergeometric 2F1 functions, so Hypergeometric 2F1 should be listed before Gegenbauer polynomials.

When I ran the shuffled file through `tsort` I got

Elliptic_F
Hypergeometric_2F1
Elliptic_E
Hypergeometric_1F1
....
Beta

and now in this list more general functions always come before special cases.
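The same sort is easy to reproduce outside the shell. Here’s a sketch in Python (my illustration, not from the post) using the standard library’s `graphlib`, reading the same two-items-per-line edge format that `tsort` expects:

```python
from graphlib import TopologicalSorter

def toposort(text):
    """Topologically sort 'source target' edge lines, listing each
    source before any of its targets, like the Unix tsort utility."""
    preds = {}  # node -> set of nodes that must come before it
    for line in text.splitlines():
        if not line.strip():
            continue
        src, dst = line.split()
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())
    return list(TopologicalSorter(preds).static_order())

# Edges from the example at the top of the post: A -> B and C -> A.
order = toposort("A B\nC A")
```

With these edges, `toposort` lists C before A and A before B, just as described at the top of the post.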

- Insertion sort as a fold
- Quick sort and prime numbers
- Six degrees of Kevin Bacon, Paul Erdos, and Wikipedia

[1] After a postdoc at Vanderbilt, I took a job as a programmer. I got the job because they needed a programmer who knew some DSP. A few years later I got a job at MD Anderson Cancer Center managing a group of programmers. It’s fuzzy whether my time at MDACC should be considered time in Academia. My responsibilities there were sometimes academic—writing journal articles, teaching classes—and sometimes not—developing software and managing software developers.

The post Topological sort first appeared on John D. Cook.

“We will not sell your personal data, but …

- We might get hacked.
- We might give it to a law enforcement or intelligence agency.
- We might share or trade your data without technically selling it.
- We might alter our terms. Pray we do not alter them any further.
- We might be acquired by a new company that alters the terms.
- We might go bankrupt and the data be sold as an asset.”

(This post started as a Twitter thread. Thanks to Michael Madden for his contribution to the thread.)

The post “We won’t sell your personal data, but …” first appeared on John D. Cook.

Here is the constellation using the connections indicated in the IAU star chart.

Here is the constellation using the connections drawn in Rey’s book [2].

Rey’s version adds two stars, highlighted in red, but mostly connects the same stars in a different way. I suppose the herdsman is standing in the IAU version; it’s hard to tell. In Rey’s version, the herdsman is clearly seated and smoking a pipe. This is easier to see if we rotate the image a bit.

Here’s a comparison of the two interpretations side-by-side.

Here is the Python code that produced the two images. It’s a little cleaner than the code in the earlier post, and it draws larger dots to represent brighter stars.

import matplotlib.pyplot as plt

# data from https://en.wikipedia.org/wiki/List_of_stars_in_Bo%C3%B6tes
α = (14 + 16/60, 19 + 11/60, 0.0)
β = (15 + 2/60, 40 + 23/60, 3.5)
γ = (14 + 32/60, 38 + 18/60, 3.0)
δ = (15 + 16/60, 33 + 19/60, 3.5)
ε = (14 + 45/60, 27 + 4/60, 2.3)
ζ = (14 + 41/60, 13 + 44/60, 3.8)
η = (13 + 55/60, 18 + 24/60, 4.5)
θ = (14 + 25/60, 51 + 51/60, 4.0)
κ = (14 + 13/60, 51 + 47/60, 4.5)
λ = (14 + 16/60, 46 + 5/60, 4.2)
μ = (15 + 24/60, 37 + 23/60, 4.3)
υ = (13 + 49/60, 15 + 48/60, 4.0)
τ = (13 + 47/60, 17 + 27/60, 4.5)
ρ = (14 + 32/60, 30 + 22/60, 3.6)

k = -15 # reverse and scale horizontal axis

def plot_star(s, m):
    plt.plot(k*s[0], s[1], m, markersize=14 - 2.2*s[2])

def join(s0, s1, m='ko'):
    plot_star(s0, m)
    plot_star(s1, m)
    plt.plot([k*s0[0], k*s1[0]], [s0[1], s1[1]], 'b-')

def draw_iau():
    join(α, η)
    join(η, τ)
    join(α, ζ)
    join(α, ε)
    join(ε, δ)
    join(δ, β)
    join(β, γ)
    join(γ, λ)
    join(λ, θ)
    join(θ, κ)
    join(κ, λ)
    join(γ, ρ)
    join(ρ, α)

def draw_rey():
    join(α, η)
    join(η, υ)
    join(υ, τ)
    join(α, ζ)
    join(α, ε)
    join(ζ, ε)
    join(ε, δ)
    join(δ, β)
    join(δ, μ)
    join(μ, β)
    join(β, γ)
    join(γ, λ)
    join(λ, θ)
    join(θ, κ)
    join(κ, λ)
    join(γ, ρ)
    join(ρ, ε)
    plot_star(μ, 'r*')
    plot_star(υ, 'r*')

draw_iau()
plt.gca().set_aspect('equal')
plt.axis('off')
plt.savefig("bootes_iau.png")
plt.close()

draw_rey()
plt.gca().set_aspect('equal')
plt.axis('off')
plt.savefig("bootes_rey.png")
plt.close()

***

[1] The diaeresis over the second ‘o’ in Boötes means the two vowels are to be pronounced separately: bo-OH-tes. You may have seen the same pattern in Laocoön or oogenesis. The latter is written without a diaeresis now, but I bet authors used to write it with a diaeresis on the second ‘o’.

[2] H. A. Rey. The Stars: A New Way to See Them, Second Edition.

The post Connecting the dots differently first appeared on John D. Cook.

For this post, I wanted to point out how a couple of famous constants are related to the Gumbel distribution.

The standard Gumbel distribution is most easily described by its cumulative distribution function

*F*(*x*) = exp( −exp(−*x*) ).

You can introduce a location parameter μ and scale parameter β the usual way, replacing *x* with (*x* − μ)/β and dividing by β.

Here’s a plot of the density.

The Euler-Mascheroni constant γ comes up frequently in applications. Here are five posts where γ has come up.

- Numbers worth memorizing
- The coupon collector problem
- Pratt prime proofs
- Distribution of Mersenne primes
- Average fraction round up

The constant γ comes up in the context of the Gumbel distribution two ways. First, the mean of the standard Gumbel distribution is γ. Second, the entropy of a standard Gumbel distribution is γ + 1.
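The mean-equals-γ fact is easy to see numerically by sampling. The standard Gumbel is simple to sample by inverting its CDF: setting *u* = exp(−exp(−*x*)) gives *x* = −log(−log *u*). A quick sketch in Python (my illustration, not from the post):

```python
import math
import random

def sample_gumbel(rng):
    # invert F(x) = exp(-exp(-x)): x = -log(-log u), u uniform on (0, 1)
    return -math.log(-math.log(rng.random()))

rng = random.Random(42)
n = 200_000
mean = sum(sample_gumbel(rng) for _ in range(n)) / n
# mean comes out close to the Euler-Mascheroni constant 0.5772...
```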

The values of the Riemann zeta function ζ(*z*) at positive even integers have closed-form expressions given here, but the values at odd integers do not. The value of ζ(3) is known as Apéry’s constant because Roger Apéry proved in 1978 that ζ(3) is irrational.

Like the Euler-Mascheroni constant, Apéry’s constant has come up here multiple times. Some examples:

The connection of the Gumbel distribution to Apéry’s constant is that the skewness of the distribution is

12√6 ζ(3)/π³.

The post Famous constants and the Gumbel distribution first appeared on John D. Cook.

In [1] the author gives two refinements of Markov’s inequality which he calls Hansel and Gretel.

Hansel says

and Gretel says

[1] Joel E. Cohen. Markov’s Inequality and Chebyshev’s Inequality for Tail Probabilities: A Sharper Image. The American Statistician, Vol. 69, No. 1 (Feb 2015), pp. 5-7

The post Strengthen Markov’s inequality with conditional probability first appeared on John D. Cook.

I was curious how the book turned out and so I borrowed a copy through my local library. I was thumbing through the book and saw that Eric used **woset** as an abbreviation for well-ordered set. If I had seen that before, it made no impression on me. But since that time I have read There’s a Wocket in My Pocket aloud to four children, and so now my Pavlovian response to hearing “woset” is to think “in my closet.”

The context of the line is

“Did you ever have a feeling there’s a WASKET in your BASKET?

… Or a NUREAU in your BUREAU?

… Or a WOSET in your CLOSET?”

If I remember correctly, Eric started out to write a book about partial differential equations and worked his way backward to foundational theorems from analysis and logic. The end of the book discusses analytic semigroups, important in the theory of parabolic PDEs, but the large majority of the book is a repository of abstract analysis.

The post There’s a woset in my reposit first appeared on John D. Cook.

A unit sphere has area 4π. If you’re in a ship far from land, the solid angle of the sky is 2π steradians because it takes up half a sphere.

If the object you’re looking at is a sphere of radius *r* whose center is a distance *d* away, then its apparent size is

Ω = 2π(1 − √(*d*² − *r*²)/*d*)

steradians. This formula assumes *d* > *r*. Otherwise you’re not looking out at the sphere; you’re *inside* the sphere.

If you’re looking at a star, then *d* is much larger than *r*, and we can simplify the equation above. The math is very similar to the math in an earlier post on measuring tapes. If you want to measure the size of a room, and something is blocking you from measuring straight from wall to wall, it doesn’t make much difference if the object is small relative to the room. It all has to do with Taylor series and the Pythagorean theorem.

Think of the expression above as a function of *r* and expand it in a Taylor series around *r* = 0:

Ω = 2π(1 − √(1 − (*r*/*d*)²)) = π(*r*/*d*)² + (π/4)(*r*/*d*)^{4} + …

and so

Ω ≈ π *r*²/*d*²

with an error on the order of (*r*/*d*)^{4}. To put it another way, the error in our approximation for Ω is on the order of Ω². The largest object in the sky is the sun, and it has apparent size less than 10^{-4}, so Ω is always small when looking at astronomical objects, and Ω² is negligible.

So for practical purposes, the apparent size of a celestial object is π times the square of the ratio of its radius to its distance. This works fine for star gazing. The approximation wouldn’t be as accurate for watching a hot air balloon launch up close.
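Here’s a small Python sketch (my own, not from the post) comparing the exact solid angle of a sphere with the small-angle approximation:

```python
from math import pi, sqrt

def solid_angle(r, d):
    """Exact solid angle (steradians) subtended by a sphere of
    radius r whose center is a distance d > r away."""
    return 2*pi*(1 - sqrt(d*d - r*r)/d)

def solid_angle_approx(r, d):
    """Small-angle approximation; error is on the order of (r/d)**4."""
    return pi*(r/d)**2
```

For *r*/*d* = 1/100 the two functions agree to about eight decimal places; for a hot air balloon overhead, with *r*/*d* closer to 1, the gap is noticeable.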

Sometimes solid angles are measured in square degrees, given by π/4 times the square of the apparent diameter in degrees. This implicitly uses the approximation above since the apparent radius is *r*/*d*.

(The area of a square is diameter squared, and a circle takes up π/4 of a square.)

When I typed

3.1416 (radius of sun / distance to sun)^2

into Wolfram Alpha I got 6.85 × 10^{-5}. (When I used “pi” rather than 3.1416 it interpreted this as the radius of a pion particle.)

When I typed

3.1416 (radius of moon / distance to moon)^2

I got 7.184 × 10^{-5}, confirming that the sun and moon are approximately the same apparent size, which makes a solar eclipse possible.

The brightest star in the night sky is Sirius. Asking Wolfram Alpha

3.1416 (radius of Sirius / distance to Sirius)^2

we get 6.73 × 10^{-16}.

The hyperbolic secant distribution has density ½ sech(π*x*/2) and characteristic function sech(*t*). It’s curious that the density and characteristic function are so similar.

The characteristic function is essentially the Fourier transform of the density function, so this says that the hyperbolic secant function, properly scaled, is a fixed point of the Fourier transform. I’ve long known that the normal density is its own Fourier transform, but only recently learned that the same is true of the hyperbolic secant.

The Hermite functions are also fixed points of the Fourier transform, or rather eigenfunctions of the Fourier transform. The eigenvalues are 1, *i*, −1, and −*i*. When the eigenvalue is 1, we have a fixed point.

There are two conventions for defining the Hermite functions, and multiple conventions for defining the Fourier transform, so the truth of the preceding paragraph depends on the conventions used.

For this post, we will define the Fourier transform of a function *f* to be

ℱ*f*(ω) = (2π)^{-1/2} ∫ *f*(*x*) exp(−*i*ω*x*) *dx*,

with the integral taken over the whole real line.

Then the Fourier transform of exp(-*x*²/2) is the same function. Since the Fourier transform is linear, this means the same holds for the density of the standard normal distribution.

We will define the Hermite polynomials by

*H*_{n}(*x*) = (−1)^{n} exp(*x*²) (*d*/*dx*)^{n} exp(−*x*²),

using the so-called physics convention. *H*_{n} is an *n*th degree polynomial.

The Hermite functions ψ_{n}(*x*) are the Hermite polynomials multiplied by exp(-*x*²/2). That is,

ψ_{n}(*x*) = *H*_{n}(*x*) exp(-*x*²/2).

With these definitions, the Fourier transform of ψ_{n}(*x*) equals (-*i*)^{n} ψ_{n}(*x*). So when *n* is a multiple of 4, the Fourier transform of ψ_{n}(*x*) is ψ_{n}(*x*).

[The definition Hermite functions above omits a complicated constant term that depends on *n* but not on *x*. So our Hermite functions are proportional to the standard Hermite functions. But proportionality constants don’t matter when you’re looking for eigenfunctions or fixed points.]

Using the definition of Fourier transform above, the function sech(√(π/2) *x*) is its own Fourier transform.

This is surprising because the Hermite functions form a basis for *L*²(ℝ), and all have tails on the order of exp(-*x*²/2), but the hyperbolic secant has tails like exp(-*x*). Each Hermite function *eventually* decays like exp(-*x*²/2), but this happens later as *n* increases, so an infinite sum of Hermite functions can have thicker tails than any particular Hermite function.
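These fixed-point claims are easy to check numerically. The sketch below (my illustration, not from the post) approximates the Fourier transform defined above with the trapezoid rule; for an even real function the transform is real, so only the cosine part is needed. Both exp(−*x*²/2) and sech(√(π/2) *x*) come back unchanged:

```python
from math import cos, cosh, exp, pi, sqrt

def fourier(f, w, L=20.0, n=4000):
    """Approximate (2*pi)^(-1/2) * integral of f(x) exp(-i w x) dx
    for an even function f, via the trapezoid rule on [-L, L]."""
    h = 2*L/n
    total = 0.0
    for i in range(n + 1):
        x = -L + i*h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(x) * cos(w*x)
    return total * h / sqrt(2*pi)

gaussian = lambda x: exp(-x*x/2)
sech_fixed = lambda x: 1/cosh(sqrt(pi/2)*x)
```

Evaluating `fourier(gaussian, w)` and `fourier(sech_fixed, w)` at a few values of ω returns the original functions to high accuracy.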

Ding’s paper contains a plot comparing the density functions for the hyperbolic secant distribution, the standard normal distribution, and the logistic distribution with scale √3/π. The scale for the logistic was chosen so that all three distributions would have variance 1.

There’s something interesting about comparing logistic distribution and the hyperbolic secant distribution densities: the former is the square of the latter, aside from some scaling, and yet the two functions are similar. You don’t often approximate a function by its square.

Here’s a plot of the two densities.

The hyperbolic secant density, the blue curve, crosses the logistic density around ± 0.56 and around ± 2.33.

The hyperbolic secant distribution has density

*f*(*x*) = ½ sech(π*x*/2)

and the logistic distribution, as scaled above, has density

*g*(*x*) = (π/4√3) sech²(π*x*/2√3)

and so *g*(√3 *x*) = (π/√3) *f*(*x*)².

[1] Peng Ding. Three Occurrences of the Hyperbolic-Secant Distribution. The American Statistician, Vol. 68, No. 1 (Feb 2014), pp. 32–35

The post Hyperbolic secant distribution first appeared on John D. Cook.

I was curious how many of the 676 possible two-letter combinations are used by the abbreviation systems above. About two thirds, not as many as I expected.

There are 798 abbreviations in the lists mentioned, but a lot of overlap. For example, FR represents the country France, the language French, and the chemical element Francium.

There are five abbreviations that are part of all five lists: GA, LA, MT, NE, and PA.

- GA (Gabon, Irish, gallium, Georgia, cartography)
- LA (Lao People’s Democratic Republic, Latin, lanthanum, Louisiana, history of education)
- MT (Malta, Maltese, meitnerium, Montana, music instruction)
- NE (Niger, Nepali, neon, Nebraska, print media)
- PA (Panama, Punjabi, protactinium, Pennsylvania, Greek and Latin language and literature)

A weighted average of two rotation matrices, such as (1 − *t*)*R*_{1} + *t* *R*_{2}, is not in general a rotation matrix.

You can represent rotations with unit quaternions rather than orthogonal matrices (see details here), so a reasonable approach might be to interpolate between the rotations represented by unit quaternions *q*_{1} and *q*_{2} using

*q*(*t*) = (1 − *t*) *q*_{1} + *t* *q*_{2},

but this has a similar problem: the quaternion above is not a *unit* quaternion.

One way to patch this up would be to normalize the expression above, dividing by its norm. That would indeed produce *unit* quaternions, and hence correspond to rotations. However, uniformly varying *t* from 0 to 1 does not produce a uniform rotation.

The solution, first developed by Ken Shoemake [1], is to use **spherical linear interpolation** or **SLERP**.

Let θ be the angle between *q*_{1} and *q*_{2}. Then the spherical linear interpolation between *q*_{1} and *q*_{2} is given by

*q*(*t*) = *q*_{1} sin((1 − *t*)θ)/sin θ + *q*_{2} sin(*t*θ)/sin θ.

Now *q*(*t*) is a unit quaternion, and uniformly increasing *t* from 0 to 1 creates a uniform rotation.
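Here’s a minimal SLERP sketch in Python (my own illustration, not Shoemake’s code), representing quaternions as 4-tuples:

```python
import math

def slerp(q1, q2, t):
    """Spherical linear interpolation between unit quaternions,
    represented as 4-tuples (w, x, y, z)."""
    dot = sum(a*b for a, b in zip(q1, q2))
    theta = math.acos(max(-1.0, min(1.0, dot)))  # angle between q1 and q2
    if theta < 1e-9:
        return q1  # nearly identical quaternions: nothing to interpolate
    s1 = math.sin((1 - t)*theta) / math.sin(theta)
    s2 = math.sin(t*theta) / math.sin(theta)
    return tuple(s1*a + s2*b for a, b in zip(q1, q2))
```

Unlike normalized linear interpolation, the result is automatically a unit quaternion, and the rotation angle grows linearly in *t*.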

[1] Ken Shoemake. “Animating Rotation with Quaternion Curves.” SIGGRAPH 1985.

The post Interpolating rotations with SLERP first appeared on John D. Cook.

The shuffle product of two words, *w*_{1} and *w*_{2}, written

*w*_{1} Ш *w*_{2},

is the set of all words formed by interleaving the letters of *w*_{1} and *w*_{2}, preserving the relative order of the letters within each word. The name comes from the analogy with doing a riffle shuffle of two decks of cards.

For example, *bcd* Ш *ae*, the shuffle product of *bcd* and *ae*, would be all permutations of *abcde* in which the consonants appear in alphabetical order and the vowels are also in alphabetical order. So *abecd* and *baecd* would be included, but *badec* would not be because the *d* and *c* are in the wrong order.

Incidentally, the symbol for shuffle product is the Cyrillic letter sha (Ш, U+0428), the only Cyrillic letter commonly used in mathematics, at least internationally. Presumably Russian mathematicians use other Cyrillic letters, but the only Cyrillic letter an American mathematician, for example, is likely to use is Ш.

The uses of Ш that I’m aware of are the Dirac comb distribution, the Tate–Shafarevich group, and the shuffle product.

What is the shuffle product of words containing duplicate letters? For example, what about the shuffle product of *bread* and *crumb*? Each word contains an *r*. The shuffle product, defined above as a set, doesn’t distinguish between the two *r*s. But another way to define the shuffle product is as a formal sum, with coefficients that count duplicates.

Imagine coloring the letters in *abc* blue and the letters in *cde* red. Then the word *abccde* could arise in two ways: with the blue *c* before the red *c*, or the other way around. This term in the formal sum would be 2*abccde*, the coefficient 2 capturing that there are two ways to arrive at this word.

You could also have duplicate letters within a single word. So in *banana*, for example, you could imagine coloring each *a* a different color and coloring the two *n*s different colors.

This page gives an implementation of the shuffle product in Mathematica.

shuffleW[s1_, s2_] :=
  Module[{p, tp, ord},
    p = Permutations@Join[1 & /@ s1, 0 & /@ s2]\[Transpose];
    tp = BitXor[p, 1];
    ord = Accumulate[p] p + (Accumulate[tp] + Length[s1]) tp;
    Outer[Part, {Join[s1, s2]}, ord, 1][[1]]\[Transpose]]

This code takes two lists of characters and returns a list of lists of characters. You can use this to compute both senses of the shuffle product. For example, let’s compute *abc* Ш *ac*.

The Mathematica command

shuffleW[{a, b, c}, {a, c}]

returns a list of 10 lists:

{{a, b, c, a, c}, {a, b, a, c, c}, {a, b, a, c, c}, {a, a, b, c, c}, {a, a, b, c, c}, {a, a, c, b, c}, {a, a, b, c, c}, {a, a, b, c, c}, {a, a, c, b, c}, {a, c, a, b, c}}

If we ask for the union of the set above with `Union[%]` we get

{{a, a, b, c, c}, {a, a, c, b, c}, {a, b, a, c, c}, {a, b, c, a, c}, {a, c, a, b, c}}

So using the set definition, we could say

*abc* Ш *ac* = {*aabcc*, *aacbc*, *abacc*, *abcac*, *acabc*}.

Using the formal sum definition we could say

*abc* Ш *ac* = 4*aabcc* + 2*aacbc* + 2*abacc* + *abcac* + *acabc*.
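Here is a sketch of the formal-sum shuffle product in Python (my translation, not the Mathematica code above), representing the sum as a Counter mapping each word to its coefficient:

```python
from collections import Counter

def shuffle(u, v):
    """Shuffle product of two words as a formal sum:
    maps each interleaving to the number of ways it arises."""
    if not u:
        return Counter({v: 1})
    if not v:
        return Counter({u: 1})
    result = Counter()
    # The first letter of each interleaving comes from u or from v.
    for word, n in shuffle(u[1:], v).items():
        result[u[0] + word] += n
    for word, n in shuffle(u, v[1:]).items():
        result[v[0] + word] += n
    return result
```

`shuffle("abc", "ac")` reproduces the formal sum above: coefficient 4 on *aabcc*, 2 on *aacbc* and *abacc*, and 1 on *abcac* and *acabc*.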

Photo by Amol Tyagi on Unsplash

The post Shuffle product first appeared on John D. Cook.

If prime numbers were samples from a random variable, it would be natural to look into the mean and variance of that random variable. We can’t just compute the mean of all primes, but we can compute the mean and variance of all primes less than an upper bound *x*.

Let *M*(*x*) be the mean of all primes less than *x* and let *V*(*x*) be the corresponding variance. Then we have the following asymptotic results:

*M*(*x*) ~ *x* / 2

and

*V*(*x*) ~ *x*²/12.

We can investigate how well these limiting results fit for finite *x* with the following Python code.

from sympy import sieve

def stats(x):
    s = 0
    ss = 0
    count = 0
    for p in sieve.primerange(x):
        s += p
        ss += p**2
        count += 1
    mean = s / count
    variance = ss/count - mean**2
    return (mean, variance)

So, for example, when *x* = 1,000 we get a mean of 453.14, a little less than the predicted value of 500. We get a variance of 88389.44, a bit more than the predicted value of 83333.33.

When *x* = 1,000,000 we get closer to values predicted by the limiting formula. We get a mean of 478,361, still less than the prediction of 500,000, but closer. And we get a variance of 85,742,831,604, still larger than the prediction 83,333,333,333, but again closer. (Closer here means the ratios are getting closer to 1; the absolute difference is actually getting larger.)

Taylor’s law is named after ecologist Lionel Taylor (1924–2007) who proposed the law in 1961. Taylor observed that variance and mean are often approximately related by a power law independent of sample size, that is

*V*(*x*) ≈ *a* *M*(*x*)^{b}

independent of *x*.

Taylor’s law is an empirical observation in ecology, but it is a theorem when applied to the distribution of primes. According to the asymptotic results above, we have *a* = 1/3 and *b* = 2 in the limit as *x* goes to infinity. Let’s use the code above to look at the ratio

*V*(*x*) / *a* *M*(*x*)^{b}

for increasing values of *x*.

If we let *x* = 10^{k} for *k* = 1, 2, 3, …, 8 we get ratios

0.612, 1.392, 1.291, 1.207, 1.156, 1.124, 1.102, 1.087

which are slowly converging to 1.
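Here’s a self-contained sketch of that computation (mine, using a simple sieve rather than SymPy):

```python
def primes_below(x):
    """Primes less than x by the sieve of Eratosthenes."""
    is_prime = [True] * x
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(x**0.5) + 1):
        if is_prime[n]:
            for m in range(n*n, x, n):
                is_prime[m] = False
    return [n for n in range(x) if is_prime[n]]

def taylor_ratio(x, a=1/3, b=2):
    """V(x) / (a M(x)^b) for the primes less than x."""
    ps = primes_below(x)
    mean = sum(ps) / len(ps)
    variance = sum(p*p for p in ps) / len(ps) - mean**2
    return variance / (a * mean**b)
```

`taylor_ratio(1000)` gives about 1.291 and `taylor_ratio(100)` about 1.392, matching the entries in the list above.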

- How long does it take to find large primes?
- Refinements to the prime number theorem
- Prime numbers, phone numbers, and the normal distribution

Reference: Joel E. Cohen. Statistics of Primes (and Probably Twin Primes) Satisfy Taylor’s Law from Ecology. The American Statistician, Vol. 70, No. 4 (November 2016), pp. 399–404

The post Prime numbers and Taylor’s law first appeared on John D. Cook.

We can print out the first few digits of π and see that there’s no 0 until the 32nd decimal place.

3.14159265358979323846264338327950

It’s easy to verify that the remaining digits occur before the 0, so the answer is 32.

Now suppose we want to look at *pairs* of digits. How far out do we have to go until we’ve seen all pairs of digits (or base 100 digits if you want to think of it that way)? And what about triples of digits?

We know we’ll need at least 100 pairs, and at least 1000 triples, so this has gotten bigger than we want to do by hand. So here’s a little Python script that will do the work for us.

from mpmath import mp

mp.dps = 30_000
s = str(mp.pi)[2:]

for k in [1, 2, 3]:
    tuples = [s[i:i+k] for i in range(0, len(s), k)]
    d = dict()
    i = 0
    while len(d) < 10**k:
        d[tuples[i]] = 1
        i += 1
    print(i)

The output:

32
396
6076

This confirms that at the 32nd decimal place we will have seen all 10 possible digits. It says we need 396 pairs of digits before we see all 100 possible digit pairs, and we’ll need 6076 triples before we’ve seen all possible triples.

We could have used the asymptotic solution to the “coupon collector problem” to approximately predict the results above.

Suppose you have an urn with *n* uniquely labeled balls. You randomly select one ball at a time, return the ball to the urn, and select randomly again. The coupon collector problem asks how many draws you’ll have to make before you’ve selected each ball at least once.

The expected value for the number of draws is

*n H_{n}*

where *H*_{n} is the *n*th harmonic number. For large *n* this is approximately equal to

*n*(log *n* + γ)

where γ is the Euler-Mascheroni constant. (More on the gamma constant here.)

Now assume the digits of π are random. Of course they’re not random, but random is as random does. We can get useful estimates by making the modeling assumption that the digits behave like a random sequence.

The solution to the coupon collector problem says we’d expect, on average, to sample 28 digits before we see each digit, 518 pairs before we see each pair, and 7485 triples before we see each triple. “On average” doesn’t mean much since there’s only one π, but you could interpret this as saying what you’d expect if you repeatedly chose real numbers at random and looked at their digits, assuming the normal number conjecture.

The variance on the number of draws needed is asymptotically π² *n*²/6, so the number of draws will usually be within an interval of the expected value ± 2*n*.
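A quick sketch of these estimates in Python (my illustration; the constants and formulas are as above):

```python
from math import log, pi, sqrt

EULER_GAMMA = 0.5772156649015329

def expected_draws(n):
    """Exact expected number of draws: n times the nth harmonic number."""
    return n * sum(1/k for k in range(1, n + 1))

def expected_draws_approx(n):
    """Asymptotic approximation n (log n + gamma)."""
    return n * (log(n) + EULER_GAMMA)

def draws_std(n):
    """Asymptotic standard deviation, from variance pi^2 n^2 / 6."""
    return pi * n / sqrt(6)
```

For *n* = 100 the approximation gives about 518 draws, and for *n* = 1000 about 7485, matching the numbers above.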

If you want the details of the coupon collector problem, not just the expected value but the probabilities for different number of draws, see Sampling with replacement until you’ve seen everything.

The post The coupon collector problem and π first appeared on John D. Cook.

The **International Astronomical Union** (IAU) makes beautiful star charts of the constellations, and uses Rey’s conventions, *sorta*.

This post will look at the example of Leo, from the IAU chart and from Rey’s book Find The Constellations.

(I wonder whether the ancients also added stars to what we received as the traditional versions of constellations. Maybe they didn’t consciously notice the other stars. Or maybe they did, but only saw the need to record the brightest stars, something like the way Hebrew only recorded the consonants of words.)

Here is the IAU star chart for Leo, cropped to just show the constellation graph. (The white region is Leo-as-region and the green lines are Leo-as-graph.)

Rey’s version of Leo is a little different. Here is my attempt to reproduce Rey’s version from page 9 of Find the Constellations.

And for comparison, here’s my reproduction of the IAU version.

The solid blue lines are traditional. The dashed green lines were added by Rey and the IAU respectively.

Here is the Python code that produced the two images. Star names and coordinates are explained in the previous post.

# data from https://en.wikipedia.org/wiki/List_of_stars_in_Leo
import matplotlib.pyplot as plt

# star coordinates
δ = (11 + 14/60, 20 + 41/60)
β = (11 + 49/60, 14 + 34/60)
θ = (11 + 14/60, 15 + 26/60)
α = (10 + 8/60, 11 + 58/60)
η = (10 + 7/60, 16 + 46/60)
γ = (10 + 20/60, 19 + 51/60)
ζ = (10 + 17/60, 23 + 25/60)
μ = (9 + 53/60, 26 + 0/60)
ε = (9 + 46/60, 23 + 46/60)
κ = (9 + 25/60, 26 + 11/60)
λ = (9 + 32/60, 22 + 58/60)
ι = (11 + 24/60, 10 + 32/60)
σ = (11 + 21/60, 6 + 2/60)
ο = (9 + 41/60, 9 + 54/60)
ρ = (10 + 33/60, 9 + 18/60)

k = -20 # reverse and scale horizontal axis

def plot_stars(ss):
    for s in ss:
        plt.plot(k*s[0], s[1], 'ko')

def join(s0, s1, style, color):
    plt.plot([k*s0[0], k*s1[0]], [s0[1], s1[1]], style, color=color)

def draw_iau():
    plot_stars([δ, β, θ, α, η, γ, ζ, μ, ε, κ, λ, ι, σ])
    # traditional
    join(δ, β, '-', 'b')
    join(β, θ, '-', 'b')
    join(θ, η, '-', 'b')
    join(η, γ, '-', 'b')
    join(γ, ζ, '-', 'b')
    join(ζ, μ, '-', 'b')
    join(μ, ε, '-', 'b')
    join(δ, θ, '-', 'b')
    # added
    join(θ, ι, '--', 'g')
    join(ι, σ, '--', 'g')
    join(δ, γ, '--', 'g')
    join(ε, η, '--', 'g')
    join(μ, κ, '--', 'g')
    join(κ, λ, '--', 'g')
    join(λ, ε, '--', 'g')
    join(η, α, '--', 'g')

def draw_rey():
    plot_stars([δ, β, θ, α, η, γ, ζ, μ, ε, λ, ι, σ, ρ, ο])
    # traditional
    join(δ, β, '-', 'b')
    # join(β, θ, '-', 'b')
    join(θ, η, '-', 'b')
    join(η, γ, '-', 'b')
    join(γ, ζ, '-', 'b')
    join(ζ, μ, '-', 'b')
    join(μ, ε, '-', 'b')
    join(δ, θ, '-', 'b')
    # added
    join(θ, ι, '--', 'g')
    join(ι, σ, '--', 'g')
    join(δ, γ, '--', 'g')
    join(λ, ε, '--', 'g')
    join(η, α, '--', 'g')
    join(θ, ρ, '--', 'g')
    join(η, ο, '--', 'g')

The post Adding stars to constellations first appeared on John D. Cook.

When you look up data on stars in constellations you run into two meanings of constellation. For example, Leo is a region of the night sky containing an untold number of stars. It is also a pattern of nine particular stars connected by imaginary lines. It’s easier to find data on the former, say sorted by brightness.

Are the nine brightest stars in Leo-the-region the nine stars of Leo-the-stick-figure? Not exactly, but close.

Wikipedia has an article that lists the stars in each constellation region, and there are star charts that draw constellations as stick figures. If the stars on the chart are labeled, you can cross reference them with Wikipedia.

On a particular star chart I have, the stars in Leo are labeled with their Bayer designation. Roughly speaking, the Bayer designation labels the stars within a constellation with Greek letters in descending order of brightness, but there are inconsistencies. The nomenclature goes back to Johann Bayer (1572–1625) and has its flaws.

The stars in Leo, in line-drawing order, are

- δ Leo
- Denebola (β Leo)
- θ Leo
- Regulus (α Leo)
- η Leo
- γ Leo
- ζ Leo
- μ Leo
- ε Leo

You can look up the coordinates of these stars here. Line-drawing order does not correspond to brightness order, so without a labeled star chart you’d have some research to do. My chart labels all the stars in Leo (the stick figure), but not, for example, in Virgo.

γ Leo is actually two stars, and Wikipedia ranks the brightness of the stars a little differently than Bayer did, which is understandable since brightness could not be objectively measured in his day. Wikipedia also inserts a few stars in between the stars listed above.

Here’s a plot of Leo using the data referenced above.

The post Plotting constellations first appeared on John D. Cook.

In 1881, astronomer Simon Newcomb noticed something curious. The first pages in books of logarithms were dirty on the edge, while later pages were progressively cleaner. He inferred from this that people more often looked up the logarithms of numbers with small leading digits than with large leading digits.

Why might this be? One might reasonably expect the numbers that came up in work to be uniformly distributed. But as often the case, it helps to ask “Uniform on what scale?”

Newcomb might have imagined his counterpart on another planet. This alien astronomer might have 12 fingers [1] and count in base 12. Base 10 is not inevitable, even for creatures with 10 fingers: the ancient Sumerians used a base-60 number system.

If Newcomb’s twelve-fingered counterpart had developed logarithms but not digital computers, he might have tables of duodecimal logarithms bound into books, and he too might have noticed that pages with small leading (duo)digits are more frequently referenced. Both astronomers would naturally look up the logarithms of physical constants, physical distances, and so forth, numbers that vary over a practically unlimited range. The unlimited range is important.

On what scale could both astronomers see the leading digits uniformly distributed?

If Newcomb needed to look up the logarithms of numbers over a limited range, say from 1 to 10^{6}, each with equal probability, then the leading digits would be uniformly distributed. But our alien astronomer would have no special interest in the number 10^{6}. He might want to look at numbers between 1 and 12^{6}. The leading digits of numbers over this range would be uniformly distributed when represented in base 12, but not when represented in base 10. The choice of upper limit introduces a bias in one base or another.

Now suppose the numbers that both astronomers used in their work were uniformly distributed on a logarithmic scale. Newcomb conjectured that the numbers that came up in practice were uniformly distributed in their logarithms base 10. Our alien astronomer might conjecture the same thing for logarithms base 12. And both could be right. So would a third astronomer working in base 42. All logarithms are proportional, and so numbers uniformly distributed on a log scale using one base are uniformly distributed on a log scale using any other base.

Benford’s law says that the leading digits of numbers that come up in practice are uniformly distributed on a log scale. This applies to base 10, but also any other base, such as base 100. If you looked at the first two digits and thought of them as single base-100 digits, Benford’s law still applies.

But who is Benford? True to Stigler’s law of eponymy, Newcomb’s observation is named after physicist Frank Benford who independently made the same observation in 1938 and who tested it more extensively.

Let’s look at a set of physical constants and see how well Benford’s law applies. I took a list of physical constants from NIST and made a histogram of the leading digits to compare with what one would expect from Benford’s law.

If one were to write the NIST constants in base 12 and repeat the exercise, the result would look similar.

[1] The image at the top of the post was created by DALL-E. There is a slight hint of an extra finger. DALL-E usually has a hard problem with hands, adding or removing fingers. But my attempts to force it to draw a hand with an extra finger were not successful.

The post Alien astronomers and Benford’s law first appeared on John D. Cook.

I watched Red October this evening, for the first time since around the time it came out in 1990, and was surprised by a detail in one of the scenes. I recognized one of the books: Dutton’s Navigation and Piloting.

I have a copy of that book, the 14th edition. The spine looks exactly the same. The first printing was in 1985, and I have the second printing from 1989. So it is probably the same edition and maybe even the same printing as in the movie. I bought the book last year because it was recommended for something I was working on. Apparently it’s quite a classic, since someone thought that adding a copy in the background would help make a realistic set for a submarine.

My copy has a gold sticker inside, indicating that the book came from Fred L. Woods Nautical Supplies, though I bought my copy used from Alibris.

Here’s a clip from the movie featuring Dutton’s.

Dutton’s has a long history. From the preface:

Since the first edition of *Navigation and Nautical Astronomy* (as it was then titled) was written by Commander Benjamin Dutton, U. S. Navy, and published in 1926, this book has been updated and revised. The title was changed after his death to more accurately reflect its focus …

The 14th edition contains a mixture of classical and electronic navigation, navigating by stars and by satellites. It does not mention GPS; that is included in the latest edition, the 15th edition published in 2003.

An isomorphism is a structure-preserving function from one object to another. In the context of graphs, an isomorphism is a function that maps the vertices of one graph onto the vertices of another, preserving all the edges.

So if *G* and *H* are graphs, and *f* is an isomorphism between *G* and *H*, nodes *x* and *y* are connected in *G* if and only if nodes *f*(*x*) and *f*(*y*) are connected in *H.*

There are 30 basketball teams in the National Basketball Association (NBA) and 30 baseball teams in Major League Baseball (MLB). That means the NBA and MLB are isomorphic as *sets*, but it doesn’t necessarily mean that the hierarchical structures of the two organizations are the same. But in fact the hierarchies are the same.

Both the NBA and MLB have two top-level divisions, each divided into three subdivisions, each containing five teams.

Basketball has an Eastern Conference and a Western Conference, whereas baseball has an American League and a National League. Each basketball conference is divided into three divisions, just like baseball leagues, and each division has five teams, just as in baseball. So the tree structures of the two organizations are the same.

In the earlier post about the MLB tree structure, I showed how you could number baseball teams so that the team number *n* could tell you the league, division, and order within a division by taking the remainders when *n* is divided by 2, 3, and 5. Because the NBA tree structure is isomorphic, the same applies to the NBA.
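The Chinese Remainder Theorem is what makes this numbering work: the triple of remainders determines the team number mod 30. A minimal check (the mapping of remainders to particular leagues and divisions is left abstract here):

```python
# The triple (n % 2, n % 3, n % 5) plays the role of (league, division,
# slot within the division). The CRT makes the triple unique mod 30.
def decode(n):
    return (n % 2, n % 3, n % 5)

# all 30 teams get distinct triples
assert len({decode(n) for n in range(1, 31)}) == 30
```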

Here’s a portion of the graph with numbering. The full version is available here as a PDF.

Here’s the ordering.

- Los Angeles Clippers
- Miami Heat
- Portland Trail Blazers
- Milwaukee Bucks
- Dallas Mavericks
- Brooklyn Nets
- Los Angeles Lakers
- Orlando Magic
- Utah Jazz
- Chicago Bulls
- Houston Rockets
- New York Knicks
- Phoenix Suns
- Washington Wizards
- Denver Nuggets
- Cleveland Cavaliers
- Memphis Grizzlies
- Philadelphia 76ers
- Sacramento Kings
- Atlanta Hawks
- Minnesota Timberwolves
- Detroit Pistons
- New Orleans Pelicans
- Toronto Raptors
- Golden State Warriors
- Charlotte Hornets
- Oklahoma City Thunder
- Indiana Pacers
- San Antonio Spurs
- Boston Celtics

Incidentally, the images at the top of the post were created with DALL-E. They look nice overall, but you’ll see bizarre details if you look too closely.

The post The NBA and MLB trees are isomorphic first appeared on John D. Cook.

Last week I wrote about how to number MLB teams so that

- *n* % 2 tells you the league: American or National
- *n* % 3 tells you the division: East, Central, or West
- *n* % 5 is unique within a league/division combination.

Here *n* % *m* denotes *n* mod *m*, the remainder when *n* is divided by *m*.

This post will do something similar for minor league teams.

There are four minor league teams associated with each major league team. If we wanted to number them analogously, we’d need to do something a little different because we cannot specify *n* % 2 and *n* % 4 independently. We’d need an approach that is a hybrid of what we did for the NFL and MLB.

We could specify the league and the rank within the minor leagues by three bits: one bit for National or American league, and two bits for the rank:

- 00 for A
- 01 for High A
- 10 for AA
- 11 for AAA

It will be convenient later on if we make the ranks the most significant bits and the league the least significant bit.

So to place a minor league team on a list, we could write down the numbers 1 through 120, and for each *n*, calculate *r* = *n* % 8, *d* = *n* % 3, and *k* = *n* % 5.

The latest episode of 99% Invisible is called RoboUmp, a show about automating umpire calls. As part of the story, the show discusses the whimsical names of minor league teams and how the names allude to their location. For example, the El Paso Chihuahuas are located across the border from the Mexican state of Chihuahua and their mascot is a chihuahua dog. (The dog was named after the state.)

The El Paso Chihuahuas are the AAA team associated with the San Diego Padres, a team in the National League West, team #3 in the order listed in the MLB post. The number *n* for the Chihuahuas must equal 7 mod 8, that is 111_{two}: the first two bits for AAA and the last bit for the National League. We also require *n* to be 2 mod 3 because the team is in the West, and *n* = 3 mod 5 because the Padres are #3 in the list of National League West teams in our numbering. It works out that *n* = 23.

How do minor league and major league numbers relate? They have to be congruent mod 30. They have to have the same parity since they represent the same league, and must be congruent mod 3 because they are in the same division. And they must be congruent mod 5 to be in the same place in the list of associated major league teams.

So to calculate a minor league team’s number, start with the corresponding major league number, and add multiples of 30 until you get the right value mod 8.

For example, the Houston Astros are number 20 in the list from the earlier post. The Triple-A team associated with the Astros is the Sugar Land Space Cowboys. The number *n* for the Space Cowboys must be 6 mod 8 because 6 = 110_{two}, and they’re a Triple-A team (11) in the American League (0). So *n* = 110.

The Astros’ Double-A team, the Corpus Christi Hooks, needs to have a number equal to 100_{two} = 4 mod 8, so *n* = 20. The High-A team, the Asheville Tourists, is 50, and the Single-A team, the Fayetteville Woodpeckers, is 80.
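The procedure above — start with the major league number and add multiples of 30 until the value mod 8 matches the rank and league bits — can be sketched in a few lines (the function name is mine):

```python
# Rank bits (most significant): 0b00 = A, 0b01 = High A, 0b10 = AA, 0b11 = AAA.
# The league bit (least significant) matches the parity of the major league number.
def minor_league_number(major_n, rank):
    target = (rank << 1) | (major_n % 2)   # required value mod 8
    n = major_n
    while n % 8 != target:                 # adding 30 preserves n mod 30
        n += 30
    return n

# the Astros (20) and their affiliates, as computed above
assert [minor_league_number(20, r) for r in (0b11, 0b10, 0b01, 0b00)] == [110, 20, 50, 80]
assert minor_league_number(23, 0b11) == 23  # the El Paso Chihuahuas
```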

You can determine what major league team is associated with a minor league team by taking the remainder by 30. For example, the Rocket City Trash Pandas have number 77, so they’re associated with the major league team with number 17, which is the Los Angeles Angels. The remainder when 77 is divided by 8 is 5 = 101_{two}, which tells you they’re a Double-A team since the high order bits are 1 and 0.

Say I make a rule for testing whether a number is divisible by 59. That’s great, if you routinely need to test divisibility by 59. Maybe you work for a company that, for some bizarre reason, ships widgets in boxes of 59 and you frequently have to test whether numbers are multiples of 59.

When you want to factor numbers, you’d like to test divisibility by a **set** of primes at once, using fewer separate algorithms, and taking advantage of work you’ve already done.

John Conway came up with his 150 Method to test for divisibility by a sequence of small primes. This article explains how Conway’s 150 Method and a couple of variations work. The core idea behind Conway’s 150 Method, his 2000 Method, and analogous methods developed by others is this:

- Find a range of integers, near a round number, that contains a lot of distinct prime factors.
- Reduce your number modulo the round number, then test for divisibility sequentially, reusing work.

Conway’s 150 Method starts by taking the quotient and remainder by 150. And you’ll never guess what his 2000 Method does. :)

This post will focus on the pattern behind Conway’s method, and similar methods. For examples and practical tips on carrying out the methods, see the paper linked above and a paper I’ll link to below.

Conway exploited the fact that the numbers 152 through 156 are divisible by a lot of primes: 2, 3, 5, 7, 11, 13, 17, 19, and 31.

He starts his method with 150 rather than 152 because 150 is a round number and easier to work with. We start by taking the quotient and remainder by 150.

Say *n* = 150*q* + *r*. Then *n* – 152*q* = *r* – 2*q*. If *n* has three or four digits, *q* only has one or two digits, and so subtracting *q* is relatively easy.

Since 19 divides 152, we can test whether *n* is divisible by 19 by testing whether *r* – 2*q* is divisible by 19.

The next step is where sequential testing saves effort. Next we want to subtract off a multiple of 153 to test for divisibility by 17, because 17 divides 153. But we don’t have to start over. We can reuse our work from the previous step.

We want *n* – 153*q* = (*n* – 152*q*) – *q*, and we’ve already calculated *n* – 152*q* in the previous step, so we only need to subtract *q*.

The next step is to find *n* – 154*q*, and that equals (*n* – 153*q*) – *q*, so again we subtract *q* from the result of the previous step. We repeat this process, subtracting *q* each time, and testing for divisibility by a new set of primes each time.
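The sequential subtraction can be sketched in a few lines (a sketch of the bookkeeping, not of the mental technique; Python’s `%` keeps remainders nonnegative even when the running value goes negative):

```python
def small_prime_divisors_150(n):
    """Which of 7, 11, 13, 17, 19, 31 divide n, via Conway's 150 Method."""
    q, r = divmod(n, 150)      # n = 150q + r
    t = r - 2 * q              # t = n - 152q, and 152 = 8 * 19
    found = []
    # prime factors (beyond 2, 3, 5) of 152, 153, 154, 155, 156 in turn
    for primes in ([19], [17], [7, 11], [31], [13]):
        found.extend(p for p in primes if t % p == 0)
        t -= q                 # reuse the work: n - mq becomes n - (m+1)q
    return sorted(found)

assert small_prime_divisors_150(7 * 11 * 19) == [7, 11, 19]
assert small_prime_divisors_150(17 * 31) == [17, 31]
```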

Conway’s more extensive method exploited the fact that the numbers 1998 through 2021 are divisible by all primes up to 67. So he would start by taking the quotient and remainder by 2000, which is really easy to do.

Say *n* = 2000*q* + *r*. Then we would add (or subtract) *q* each time.

You could start with *r*, then test *r* for divisibility by the factors of 2000, then test *r* – *q* for divisibility by the factors of 2001, then test *r* – 2*q* for divisibility by the factors of 2002, and so on up to testing *r* – 21*q* for divisibility by the factors of 2021. Then you’d need to go back and test *r* + *q* for divisibility by the factors of 1999 and test *r* + 2*q* for divisibility by the factors of 1998.

In principle that’s how Conway’s 2000 Method works. In practice, he did something more clever.

Most of the prime factors of the numbers 1998 through 2021 are prime factors of 1998 through 2002, so it makes sense to test this smaller range first hoping for early wins. Also, there’s no need to test divisibility by the factors of 1999 because 1999 is prime.

Conway tested *r* – *kq* for *k* = -2 through 21, but not sequentially. He would try out the values of *k* in an order most likely to terminate the factoring process early.

This paper gives a **much** more extensive approach to mental factoring than Conway’s 150 method. The authors, Hilarie Orman and Richard Schroeppel, outline a strategy for factoring any six-digit number. Conway’s rule is more modest, intended for three and four digit numbers.

Orman and Schroeppel suggest a sequence of factoring methods, including more advanced techniques to use after you’ve tried testing for divisibility by small primes. One of the techniques in the paper might be called the 10,000 Method by analogy to Conway’s method, though the authors don’t call it that. They call it “check the *m*‘s” for reasons that make more sense if you read the paper.

The 10,000 Method is much like the 2000 Method. The numbers 10,001 through 10,019 have a lot of prime factors, and the method tests for divisibility by these factors sequentially, taking advantage of previous work at each step, just as Conway’s methods do. The authors do not backtrack the way Conway did; they test numbers in order. However, they do skip over some numbers, like Conway skipped over 1999.

Like the NFL, MLB is organized into a nice tree structure, though the MLB tree is a little more complicated. There are 32 NFL teams organized into a complete binary tree, with a couple of levels collapsed. There are 30 MLB teams, so the tree structure has to be a bit different.

MLB has **leagues** rather than conferences, but the top-level division is into American and National, as with the NFL. So the top division is into the American League and the National League.

And as with football, the next level of the hierarchy is **divisions**. But baseball has three divisions—East, Central, and West—in contrast to four in football.

Each division has five baseball teams, while each football division has four teams.

Here’s the basic tree structure.

Under each division are five teams. Here’s a PDF with the full graph including teams.

How do the division names correspond to actual geography?

Within each league, the Central teams are to the west of the East teams and to the east of the West teams, with one exception: in the National League, the Pittsburgh Pirates are a Central division team, but they are east of the Atlanta Braves and Miami Marlins in the East division. But essentially the East, Central, and West divisions do correspond to geographic east, center, and west, within a league.

We can’t number baseball teams as elegantly as the previous post numbered football teams. We’d need a mixed-base number. The leading digit would be binary, the next digit base 3, and the final digit base 5.

We could number the teams so that you could tell the league and division of the team by looking at the remainders when the number is divided by 2 and 3, and each team is unique mod 5. By the Chinese Remainder Theorem, we can solve the system of congruence equations mod 30 that specify the value of a number mod 2, mod 3, and mod 5.
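Going the other direction, brute force is enough to invert the remainders (a sketch; a real CRT solver would compute this without search):

```python
# Given target remainders mod 2, 3, and 5, search the numbers 1 through 30;
# the Chinese Remainder Theorem guarantees exactly one hit.
def crt_30(a2, a3, a5):
    return next(n for n in range(1, 31)
                if (n % 2, n % 3, n % 5) == (a2, a3, a5))

assert crt_30(1, 2, 3) == 23   # the unique n <= 30 with this remainder pattern
```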

If we number the teams as follows, then odd numbered teams are in the National League and even numbered teams are in the American League. When the numbers are divided by 3, those with remainder 0 are in an Eastern division, those with remainder 1 are in a Central division, and those with remainder 2 are in a Western division. Teams within the same league and division have unique remainders when divided by 5.

- Cincinnati Reds
- Oakland Athletics
- Philadelphia Phillies
- Minnesota Twins
- Arizona Diamondbacks
- Boston Red Sox
- Milwaukee Brewers
- Seattle Mariners
- Washington Nationals
- Chicago White Sox
- Colorado Rockies
- New York Yankees
- Pittsburgh Pirates
- Texas Rangers
- Atlanta Braves
- Cleveland Guardians
- Los Angeles Dodgers
- Tampa Bay Rays
- St. Louis Cardinals
- Houston Astros
- Miami Marlins
- Detroit Tigers
- San Diego Padres
- Toronto Blue Jays
- Chicago Cubs
- Los Angeles Angels
- New York Mets
- Kansas City Royals
- San Francisco Giants
- Baltimore Orioles

The NFL has a very nice tree structure, which isn’t too surprising in light of the need to make tournament brackets. The NFL is divided into two **conferences**, the American Football Conference and the National Football Conference.

Each conference is divided into four **divisions** named after geographical regions. Since this is a mathematical post, I’ve listed the regions counterclockwise starting in the east because that’s how mathematicians do things.

Each division has four teams. Adding each team under its division would make an awkwardly wide graph. I made a graph of the entire tree, rotated so that image is long rather than wide. Here’s a little piece of it.

The full image is available here.

Now you may wonder how well the geographic division names correspond to geography. For example, the Dallas Cowboys are in the NFC East, and it’s a little jarring to hear Texas called “east.”

But within each conference, all the “East” teams are indeed east of all the West teams. And with one exception, all the North teams are indeed north of the South teams. The Indianapolis Colts are the exception. The Colts are in the AFC South, but are located to the north of the Cincinnati Bengals and the Baltimore Ravens in the AFC North.

This geographical sorting only applies within a conference. The Dallas Cowboys, for example, are east of all the West teams within their conference, but they are west of the Kansas City Chiefs in the AFC West.

Here’s where topology comes in: you can make the division names match their geography if you morph the map of the United States pulling Indianapolis south of its geometric location.

The graph structure of the NFL is essentially a full binary tree; you could make it into a binary tree by introducing a sub-conference layer and grouping the teams into pairs.

You could number the NFL teams with five bits: one for the conference, two for the division, and two more for the team. We could make the leading bit 0 for the AFC and 1 for the NFC. For the division bits, we could use 00 for East, 01 for North, 10 for West, and 11 for South. As mentioned above, this follows the mathematical convention of angles increasing counterclockwise starting at the positive *x*-axis.
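The five-bit scheme can be written down directly (a sketch; the ordering of teams within a division is arbitrary here):

```python
# One conference bit, two division bits, two team bits.
DIVISION = {"East": 0b00, "North": 0b01, "West": 0b10, "South": 0b11}

def team_code(conference, division, team):
    return (conference << 4) | (DIVISION[division] << 2) | team

assert team_code(0, "East", 0) == 0     # first AFC East team
assert team_code(1, "South", 3) == 31   # last NFC South team
```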

The table above is an SVG image; here is the same data in plain text.

Smith’s book says Arthur Benjamin squares large numbers using the formula

*n*² = (*n* + *a*)(*n* − *a*) + *a*²

where *a* is chosen to make the multiplication easier, i.e. to make *n* + *a* or *n* – *a* a round number. The method is then applied recursively to compute *a*², and the process terminates when you get to a square you have memorized. There are nuances to using this method in practice, but that’s the core idea.
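The recursion can be sketched as follows. The rule for choosing *a* here (round to the nearest multiple of a power of 10) is my simplification; Benjamin’s actual choices are more artful:

```python
def mental_square(n, memorized_limit=25):
    """n^2 via n^2 = (n + a)(n - a) + a^2, recursing on a^2."""
    if n <= memorized_limit:
        return n * n                       # a square we "know by heart"
    base = 10 ** (len(str(n)) - 1)
    a = min(n % base, base - n % base)     # makes n + a or n - a round
    return (n + a) * (n - a) + mental_square(a, memorized_limit)

assert mental_square(4273) == 4273 ** 2    # the number from Benjamin's example
```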

*The Great Mental Calculators* was written in 1983 when Benjamin was still a student. He is now a mathematics professor, working in combinatorics, and is also well known as a mathemagician.

Smith quotes Benjamin giving an example of how he would square 4273. Along the way he needs to remember 184 as an intermediate result. He says

The way I remember it is by converting 184 to the word ‘dover’ using the phonetic code.

I found this interesting because I had not heard of anyone using the Major system (“the phonetic code”) in real time. This system is commonly used to commit numbers to long-term memory, but you’d need to be very fluent in the system to encode and decode a number in the middle of a calculation.

Maybe a lot of mental calculators use the Major system, or some variation on it, during calculations. Most calculators are not as candid as Benjamin in explaining how they think.

Here’s a plot of the sinc function and its first two derivatives.

Thomas Grönwall proposed a problem to the American Mathematical Monthly in 1913 [1] bounding the derivatives of the sinc function:

|*y*^{(n)}(*x*)| ≤ 1/(*n* + 1)

for all real *x* and all *n* ≥ 0, where *y* = sin(*x*)/*x*.

Seven years later, Dunkel gave an elegant proof. Perhaps Grönwall had a proof and was proposing his inequality as a challenge, or maybe it was a conjecture at the time he published it. In any case, the proof by Dunkel is very nice. He sets

*y* = sin(*x*)/*x*

and repeatedly differentiates both sides of the equation

*xy* = sin(*x*).

See the details in Dunkel’s solution.
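Dunkel’s starting point also gives a way to compute the derivatives numerically: differentiating *xy* = sin(*x*) a total of *n* times gives *x y*^{(n)} + *n y*^{(n−1)} = sin(*x* + *n*π/2), a recurrence in *n*. A sketch, used here to spot-check Grönwall’s bound |*y*^{(n)}(*x*)| ≤ 1/(*n* + 1) (a sanity check, not a proof):

```python
import math

def sinc_derivs(x, N):
    """y, y', ..., y^(N) at x for y = sin(x)/x, from the recurrence
    x*y^(n) + n*y^(n-1) = sin(x + n*pi/2)."""
    ys = [math.sin(x) / x]
    for n in range(1, N + 1):
        ys.append((math.sin(x + n * math.pi / 2) - n * ys[-1]) / x)
    return ys

# spot-check the bound |y^(n)(x)| <= 1/(n+1) at a few points
for x in (0.5, 1.0, 2.0, 5.0):
    for n, yn in enumerate(sinc_derivs(x, 5)):
        assert abs(yn) <= 1 / (n + 1) + 1e-9
```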

I don’t know of an application of Grönwall’s inequality offhand. But the sinc function is common in signal processing, and so maybe his inequality has applications there.

(By “Grönwall’s inequality” I mean the inequality above. There is another theorem also known as Grönwall’s inequality that is commonly applied in differential equations.)

[1] Gronwall, T. H. (1913). Problem 339. Amer. Math. Monthly. 20: 196.

[2] Dunkel, O., (1920). Solution to Problem 339. Amer. Math. Monthly. 27: 81–85.

The post Bounding derivatives of the sinc function first appeared on John D. Cook.

Napoleon’s theorem says that if you start with any triangle and attach equilateral triangles to each side, the centroids of these new triangles are the vertices of an equilateral triangle.

So if you attach squares to the sides of a quadrilateral, are their centroids the vertices of a square? In general no.

But you can attach squares to two sides, and special rectangles to the other two sides, and the centroids will form the corners of a square. Specifically, we have the following theorem by Stephan Berendonk [1].

If you erect squares on the two “nonparallel” sides of a trapezoid and a rectangle on each of the two parallel sides, such that its “height” is equal to the length of the opposite side of the trapezoid, then the centers of the four erected quadrangles will form the vertices of a square.

Here’s an illustration. Berendonk’s theorem asserts that the red quadrilateral is a square.

[1] Stephan Berendonk. A Napoleonic Theorem for Trapezoids. The American Mathematical Monthly, April 2019, Vol. 126, No. 4, pp. 367–369

The post Another Napoleon-like theorem first appeared on John D. Cook.

The Playfair cipher was used (and broken) during the First World War. I vaguely remember reading somewhere that the cipher took about an hour to break using pencil and paper. It was secure in the sense that it could be used for messages that only needed to stay secret for less time than it took to break the method. It was more secure than simple substitution, and easy to encrypt and decrypt manually.

True to Stigler’s law of eponymy, the Playfair cipher was not named after its inventor, Charles Wheatstone of Wheatstone bridge fame, but after Lyon Playfair who popularized the method. Playfair acknowledged Wheatstone, but his name stuck to the method nevertheless.

The Playfair cipher uses a 5 × 5 grid of letters, so some letter of the Roman alphabet has to go. A common choice was to use the same letter for I and J. (A variation on the method using a 6 × 6 grid of letters and digits would not have to leave out any letters.)

For reasons that will soon be apparent, double letters had to be broken up, say with an X. So “FOOTBALL” would become “FOXOTBALXL.” Amusingly, “MISSISSIPPI” would become “MISXSISXSIPXPI.”

After eliminating Js and splitting double letters, the message is divided into pairs. So FOXOTBALXL becomes FO XO TB AL XL.

The key for the encryption method is the arrangement of the letters in a square. In practice, the key would be some word or phrase that was used to permute the alphabet, and then that permutation was filled into the grid.

Here’s a grid I constructed by asking Python for a random permutation of the alphabet.

Given a pair of letters, the two letters either lie on the same row, the same column, or are in different rows and columns. (This is why you break up double letters.)

If the two letters lie in the same row, advance each letter one position, wrapping around if necessary. For example, IT would be encrypted as FV, and TX would be encrypted as VI.

If two letters lie in the same column, proceed analogously, moving each letter down. So TH would be encrypted as GB and OI would be encrypted as IP.

Finally, if the two letters are in different rows and columns, they form the diagonal corners of a rectangle. Replace the two letters with the letters on the remaining corners. For example, IH becomes TR, HE becomes RB, GW becomes DM, etc.
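The three rules can be sketched in a few lines. The grid below is just the alphabet in order with J merged into I — not the randomly permuted grid pictured above — so its outputs differ from the examples in the text:

```python
GRID = "ABCDEFGHIKLMNOPQRSTUVWXYZ"   # row-major 5 x 5, no J

def encrypt_pair(a, b):
    ra, ca = divmod(GRID.index(a), 5)
    rb, cb = divmod(GRID.index(b), 5)
    if ra == rb:                                  # same row: step right
        return GRID[5*ra + (ca+1) % 5] + GRID[5*rb + (cb+1) % 5]
    if ca == cb:                                  # same column: step down
        return GRID[5*((ra+1) % 5) + ca] + GRID[5*((rb+1) % 5) + cb]
    return GRID[5*ra + cb] + GRID[5*rb + ca]      # rectangle: swap columns

assert encrypt_pair("A", "B") == "BC"   # same row
assert encrypt_pair("A", "F") == "FL"   # same column
assert encrypt_pair("A", "H") == "CF"   # rectangle corners
```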

Just as you can attack a simple substitution cipher by looking at letter frequencies, you can attack a Playfair cipher by looking at bigram frequencies. You can find these frequencies for English text on Peter Norvig’s site. TH sticks out in bigram frequencies similarly to how E sticks out in letter frequencies. However, bigram frequencies are more evenly distributed than letter frequencies.

As I pointed out in the previous post, making a mapping between 676 pairs of letters to a randomly generated list of 676 other pairs of letters will not create a secure cipher. But Playfair is much weaker than such a random assignment. There is a lot of structure to the Playfair cipher. This makes it more convenient to use, and easier to break.

Suppose pairs of letters were mapped to random pairs of letters and you learn that GB is the encrypted form of TH. What have you learned about decrypting any other pair? Nothing, except that you’ve eliminated 1 out of 676 possibilities.

But if you learn that a Playfair cipher sends TH to GB, you learn that either (1) T, H, G, and B all lie in the same row or column, or (2) T and B are in the same column, G and H are in the same column, T and G are in the same row, and H and B are in the same row.

If we rotate the rows or columns in our encryption matrix, nothing changes. This is easy to see in the case when two letters are in the same row or in the same column. It’s a little harder to see but still true when the letters are in different rows and columns.

For example, consider the following encryption matrix, formed by rotating the columns two positions and the rows one position.

If you work through all the examples above, you’ll see that they remain the same. IT still goes to FV etc.

The reason rotating columns or rows doesn’t make a difference is that in matrix notation, the encryption algorithm does not depend on the subscripts per se but the *difference* in subscripts mod 5.

It almost doesn’t matter if you transpose the encryption matrix. If you transpose a matrix, elements that were in the same row are now in the same column and vice versa. When two letters are not in the same row or column, transposing the encryption matrix transposes the encrypted pair. In the example above HE goes to RB. If we transpose the encryption matrix, HE goes to BR.

We said above that the key to a Playfair cipher is a permutation of the alphabet. But many keys correspond to the same encryption mapping. The analyst doesn’t need to recover the original encryption matrix but only some rearrangement of it.

These ciphers are famously easy to break, so easy that they’re common in puzzle books. Here’s one I made [1] for this post in case you’d like to try it.

X RF SXIIXKW XK IYZ UXINYZK HT IYZ CXIICZ YHJSZ RI FZGTXZCG, HJQ SZNHKG TRQF BYXNY XS NJI HTT EV IYZ QXGWZ RKG R MJRQIZQ-FXCZ RNQHSS IYZ TXZCGS TQHF HJQ YHFZ LCRNZ, BYZQZ VHJ RQZ. X RF BQXIXKW R EHHU. XK XI X RF SLZRUXKW IH VHJ. EJI X RF RCSH SLZRUXKW IH IYZ BHQCG. IH EHIY X HBZ RK RNNHJKIXKW.

As is common in puzzle books, I kept the spaces and punctuation.

When you learn that simple substitution is breakable, you might reasonably think that the problem is the small alphabet size. What if you replaced *pairs* of letters with *pairs* of letters, effectively working over an alphabet of size 26² = 676. That’s an improvement, but it’s still not secure. It could be broken manually in a few hours, depending on the length of the text, and of course could be broken quickly using a computer.

If we want a cipher to be secure against computer-aided cryptanalysis, we’re going to need a much bigger alphabet.

The Roman alphabet has 26 letters, which can be expressed in 5 bits. Pairs of Roman letters would require 10 bits. What if we used a 32-bit alphabet, substituting 32-bit sequences with other 32-bit sequences? This is working over an alphabet of over 4 billion symbols. Surely that’s secure? Nope.

What if we use blocks of 128 bits? This is working over an alphabet of size

2^{128} = 340,282,366,920,938,463,463,374,607,431,768,211,456.

Nope. Still not good enough. Because you can see the penguin.

The image above is a famous example of a downfall of simple substitution, albeit over a gargantuan alphabet. The image was created by taking a graphic of the Linux mascot and encrypting the bits using 128-bit encryption. Each block of 128 bits goes to a unique, essentially random replacement. Each block is well encrypted. But there are repetitive blocks in the original that become repetitive blocks in the encrypted version.

The AES (Rijndael) encryption algorithm is a good algorithm, but in the example above it was used poorly. It was used in **electronic code book mode** (ECB), something that nobody would do in practice.

In practice, you might do something like **cipher block chaining** (CBC), where you XOR each plaintext block with the previous block’s ciphertext before encrypting it. You could still think of this as a clever way of using a simple substitution over an enormous alphabet: you XOR a block’s bits with the previously encrypted block, then look up the substitution of the result. Now repetitive input does not produce repetitive output. You cannot see the penguin. The penguin image becomes random-looking static.
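Here is a toy byte-level illustration (not real cryptography) of why chaining hides repetition where a bare substitution does not:

```python
import random

# A random substitution on single bytes stands in for the block cipher.
rng = random.Random(0)
sub = list(range(256))
rng.shuffle(sub)

plaintext = bytes([7, 7, 7, 7])

# ECB style: repeated input -> repeated output
ecb = [sub[b] for b in plaintext]

# CBC style: XOR with the previous ciphertext byte, then substitute
prev = 0x42                            # initialization vector
cbc = []
for b in plaintext:
    prev = sub[b ^ prev]
    cbc.append(prev)

assert len(set(ecb)) == 1              # the "penguin" shows through
assert cbc[0] != ecb[0]                # chaining changes even the first block
```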

[1] I produced the cryptogram using

cat myfile | tr [a-z] [A-Z] | tr [A-Z] ...

where “…” is a permutation of the 26 upper case letters.

The post Simple substitution ciphers over a gargantuan alphabet first appeared on John D. Cook.

I’ll use “log” when the base of the logarithm doesn’t matter, and add a subscript when it’s necessary to specify the base. Bidder was only concerned with logarithms base 10.

If you wanted to calculate logarithms, a fairly obvious strategy would be to memorize the logarithms of small prime numbers. Then you could calculate the logarithm of a large integer by adding the logarithms of its factors. This was indeed Bidder’s approach. And since he was calculating logarithms base 10, he could turn any number into an integer by shifting the decimal point, taking the log of the integer, and then subtracting the number of places he had shifted.

Numbers with only small prime factors are called “smooth.” The meaning of “small” depends on context, but we’ll say numbers are smooth if all the prime divisors are less than 100. Bidder knew the logs of all numbers less than 100 and the logs of some larger numbers.

But what to do with numbers that have a large prime factor? In this case he used the equation

log(*a* + *b*) = log(*a*(1 + *b*/*a*)) = log(*a*) + log(1 + *b*/*a*).

So if a number *n* doesn’t factor into small primes, but it is close to a number *a* that does factor into small primes, you’re left with the problem of finding log(1 + *b*/*a*) where *b* = *n* – *a* and the fraction *b*/*a* is small.

At this point you may be thinking that now you could use the fact that

log(1 + *x*) ≈ *x*

for small *x*. However, Bidder was interested in logarithms base 10, and the equation above is only valid for natural logarithms, logarithms base *e*. It is true that

log_{e}(1 + *x*) ≈ *x*

but

log_{10}(1 + *x*) ≈ log_{10}(*e*) *x* = 0.43429448 *x*.

Bidder could have used 0.43429448 *x* as his approximation, but instead he apparently [1] used a similar approximation, namely

log(1 + *b*/*a*) ≈ log(1 + 10^{–m}) 10^{m} *b*/*a*

where *b*/*a* is between 10^{–m-1} and 10^{–m}. This approximation is valid for logarithms with any base, though Bidder was interested in logs base 10 and had memorized log_{10}1.01, log_{10}1.001, log_{10}1.0001, etc. [2]

To get eight significant figures, the fraction *b*/*a* must be on the order of 0.001 or less. But not every number *n* whose log Bidder might want to calculate is so close to a smooth number. In that case Bidder might multiply *n* by a constant *k* to find a number so that *kn* is close to a smooth number, take the log of *kn*, then subtract log *k*.

Smith says in [1] “For example, to obtain log 877, he would multiply 877 by 13, which gives 11,401, or 600 × 19 + 1.” Then he could calculate

log(877) = log(13*877) – log(13) = log(11400) + log(1 + 1/11400) – log(13)

and use his algorithm for approximating log(1 + 1/11400).

I can’t imagine thinking “Hmm. 877 isn’t close enough to a smooth number, but if I multiply it by 13 it will be.” But apparently Bidder thought that way.
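Smith’s example can be checked numerically. The sketch below assumes only the “memorized” logs of small primes and of 1 + 10^{−m}; the variable names are mine:

```python
import math

# "Memorized" logs of the small primes involved.
log10 = {p: math.log10(p) for p in (2, 3, 5, 13, 19)}

# 877 * 13 = 11401 = 11400 + 1, and 11400 = 2^3 * 3 * 5^2 * 19 is smooth.
log_11400 = 3*log10[2] + log10[3] + 2*log10[5] + log10[19]

# b/a = 1/11400 lies between 10^-5 and 10^-4, so m = 4 in the approximation
# log(1 + b/a) ~ log(1 + 10^-m) * 10^m * b/a.
m = 4
correction = math.log10(1 + 10**-m) * 10**m / 11400

estimate = log_11400 + correction - log10[13]
assert abs(estimate - math.log10(877)) < 1e-8   # about eight figures
```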

Here is a simple way to approximate logarithms to a couple significant figures without having to memorize anything.

See also this post on mentally calculating other functions to similar accuracy.

[1] This equation comes from The Great Mental Calculators by Steven B. Smith. Bidder’s methods are not entirely known, and Smith prefaces the approximation above by saying “it appears that Bidder’s approximation was …”.

[2]

|-------+-------------|
|   x   | log_10(1+x) |
|-------+-------------|
| 10^-2 |  0.00432137 |
| 10^-3 |  0.00043408 |
| 10^-4 |  0.00004342 |
| 10^-5 |  0.00000434 |
| 10^-6 |  0.00000043 |
| 10^-7 |  0.00000004 |
|-------+-------------|

The post How Mr. Bidder calculated logarithms first appeared on John D. Cook.

Here’s a plot of the sines of 0, 1, 2, …, 99.

The points are not literally random, but they’re what Marsaglia would call higgledy-piggledy [1].

Since the points are higgledy-piggledy, it doesn’t seem possible that we could find a lower bound on |sin(*n*)|, but there is one. For all positive integers *n*,

|sin(*n*)| > 2^{–n}.

This result may be due to Christopher Stuart, but in any case he proved [2] a stronger version of this theorem, showing that the 2 on the right hand side can be replaced with any number

α > (sin(3))^{-1/3} = 1.9207….
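A numerical spot-check of both bounds for the first fifty integers, using α = 1.93 for the sharper version (allowed, since 1.93 > 1.9207); a sanity check, not a proof:

```python
import math

for n in range(1, 51):
    assert abs(math.sin(n)) > 2.0 ** (-n)     # the inequality above
    assert abs(math.sin(n)) > 1.93 ** (-n)    # Stuart's sharper bound
```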

[1] “If the numbers are not random, they are at least higgledy-piggledy.” — RNG researcher George Marsaglia

[2] Christopher Stuart. An Inequality Involving sin(n). The American Mathematical Monthly , Feb 2018, Vol. 125, No. 2, pp. 173–174

The post Sine of integers first appeared on John D. Cook.