222nd Carnival of Mathematics

A blog carnival is a roundup of recent blog posts. The Carnival of Mathematics is a long-running carnival of blog posts on mathematical topics. This is the 222nd edition of the carnival.

Facts about 222

By longstanding tradition, the nth Carnival of Mathematics must begin with trivia about the number n, and so here are a few facts about the number 222.

  • There are six groups of order 222, and 222 groups of order 912.
  • 222=9+87+6×5×4+3+2+1.
  • You can encode the number 222 in the Major mnemonic system as “unknown” or “one-on-one.”

The posts

Gil Kalai looks at what happens when you ask ChatGPT to solve Elchanan Mossel’s dice problem.

Γιώργος Πλούσος (@plousos2505 on the social media platform formerly known as Twitter) posted an image showing how to compute π via a sequence of approximations requiring only basic geometry.

David Eppstein’s blog 11011110 has a post on Pyramidology. It’s particularly fitting to have a post from 11011110 in this carnival. As the author explains in his About page,

This journal’s name comes from interpreting my initials as the hexadecimal number 0xDE (decimal 222) and then converting the result to binary.

The blog ThatsMaths has a post on Sharkovsky numbering, Oleksandr Sharkovsky (1936–2022), and Sharkovsky’s Theorem. This post includes the beautiful image from Wikimedia.

The post Patterns of reality by Timothy Williamson gives an overview of logic from basics to the work of Gödel and Turing.

Larissa Fedunik-Hofman posts an interview with Andrew Krause discussing dynamical systems.

Finally, we have Matthew Scroggs’ Advent calendar of math puzzles.

To see previous editions of the carnival, or to submit a post to the next edition, see the Carnival of Mathematics page on Aperiodical.

Bounding complex roots by a positive root

Suppose you have an nth degree polynomial with complex coefficients

p(z) = aₙzⁿ + aₙ₋₁zⁿ⁻¹ + … + a₀

and you want to find some circle that is guaranteed to contain all the zeros of p.

Cauchy found such a circle in 1829. The zeros of p lie inside the circle |z| ≤ r where r is the unique positive root of

f(z) = |aₙ|zⁿ − |aₙ₋₁|zⁿ⁻¹ − … − |a₀|

This value of r is known as the Cauchy radius of the polynomial p.

This may not seem like much of an improvement: you started with wanting to find the roots of an nth degree polynomial and you end with finding the roots of an nth degree polynomial. But Cauchy’s theorem reduces the problem of finding all roots of a complex polynomial to finding one root of a real polynomial. Furthermore, the positive root we’re after is guaranteed to be unique.

If a₀ = 0 then p(z) has a factor of z and so we can reduce the problem to bounding the zeros of p(z)/z. Otherwise, f(0) < 0. Eventually f(z) must be positive because the zⁿ term will overtake the rest of the terms for large enough z. So we only need to find some value of z where f(z) > 0 and then we could use the bisection method to find r.

Since our goal is to bound the zeros of p, we don’t need to find r exactly: an upper bound on r will do, though the smaller the upper bound the better. The bisection method gives us a sequence of upper bounds, so we could work in rational arithmetic and have rigorously provable upper bounds.

As for how to find a real value of z where f is positive, we could try z = 2ᵏ for successive values of k until we find one that works.
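The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cauchy_radius` and the convention of passing coefficients from a₀ up to aₙ are choices made here, and the sketch assumes both a₀ and aₙ are nonzero. It uses floating point rather than the rational arithmetic mentioned above.

```python
import numpy as np
from scipy.optimize import bisect

def cauchy_radius(coeffs):
    """Cauchy radius of a polynomial with coefficients ordered a_0, ..., a_n.

    Assumes a_0 != 0 and a_n != 0 (hypothetical helper, not a library routine).
    """
    mags = np.abs(np.asarray(coeffs, dtype=complex))
    # Coefficients of f(x) = |a_n| x^n - |a_{n-1}| x^{n-1} - ... - |a_0|
    signed = -mags
    signed[-1] = mags[-1]
    f = lambda x: np.polynomial.polynomial.polyval(x, signed)
    # f(0) < 0, so try x = 2^k for k = 0, 1, 2, ... until f(x) > 0
    hi = 1.0
    while f(hi) <= 0:
        hi *= 2
    # f changes sign on [0, hi]; bisect to locate the unique positive root
    return bisect(f, 0, hi)

# Example: p(z) = 12 z^5 + 2 z^2 + 23i
r = cauchy_radius([23j, 0, 2, 0, 0, 12])
roots = np.roots([12, 0, 0, 2, 0, 23j])  # np.roots wants highest degree first
print(r, np.abs(roots).max())
```

Every value that bisection keeps as the upper end of its bracket is itself a valid bound on the roots, so the loop could be stopped early if only a rough bound is needed.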

For example, let’s bound the roots of

p(z) = 12z⁵ + 2z² + 23i = 0.

Cauchy’s theorem says we need to find the unique positive root of

f(z) = 12z⁵ − 2z² − 23.

Now f(0) = −23 and f(2) = 353. So we know right away that the roots of p have absolute value less than 2.

Next we evaluate f(1), which turns out to be −13, and so the Cauchy radius is larger than 1. This doesn’t necessarily mean that p has a root with absolute value greater than 1, only that the Cauchy radius is greater than 1. An upper bound on the Cauchy radius is an upper bound on the absolute values of the roots of p; a lower bound on the Cauchy radius is not necessarily a lower bound on the largest root.

Carrying out two steps of the bisection method by hand was easy, but let’s automate the process of carrying it out further.

>>> from scipy.optimize import bisect
>>> bisect(lambda x: 12*x**5 - 2*x*x - 23, 1, 2)

So Python tells us r = 1.1646451258329762.

Here’s a plot of the roots and the Cauchy radius.

In this example the roots of p are located very near a circle with the Cauchy radius. The roots range in absolute value between 1.1145600699993699 and 1.1634197192917954. The roots nearly lie in a circle because the quadratic term in our polynomial is small and so we are approximately finding the fifth roots of −23i/12.

Let’s do another example with randomly generated coefficients to get a better idea of how Cauchy’s theorem works in general. The coefficients of our polynomial, from 0th to 5th, are

0.126892 + 0.689356i, −0.142366 + 0.260969, −0.918873 + 0.489906i, 0.0599824 − 0.679312i, −0.222055 + 0.273651, 0.154408 + 0.733325i

The roots have absolute value between 0.7844606228243709 and 1.2336256274024142, and the Cauchy radius is 1.5088421845957782. Here’s a plot.

Convergent subsequence

I was reading a theorem giving conditions for a divergent series to have a convergent subseries and had a sort of flashback.

I studied nonlinear PDEs in grad school, which amounted to applied functional analysis. We were constantly proving or using theorems about sequences having convergent subsequences, often subsequences that converged in a very weak sense.

This seemed strange to me at first. If a sequence diverges, why is it of any interest that a subsequence converges? This seemed like blackout poetry, completely changing the meaning of a text by selecting various words. For example, here is the opening paragraph of Pride and Prejudice, blacked out to appear to be a real estate ad.

good neighborhood, surrounding park

Here’s the big picture I was missing. We’re trying to show that a differential equation has a solution, and we’re doing that by some kind of successive approximation. Maybe our series of approximations doesn’t work in general, but that doesn’t matter. We’re just trying to find something that is a solution. Once you come up with a candidate solution, by whatever means, grasping at whatever straws you can grasp, you then prove that the candidate really is a solution, perhaps a solution in a weak sense. Then you show that this solution, potentially one of many, is unique. Then you show that your weak solution is in fact a solution in a stronger sense.

How to memorize the periodic table

Periodic table image


Memorizing the periodic table has some practical value, especially if you’re a chemist, but in any case it’s an interesting exercise, easier to do than it may sound. And it’s a case study for how you might memorize other things of more practical value to you personally.

Major system pegs

The Major system is a way to associate consonant sounds to numbers. You can fill in vowels and semivowels as you please to turn the sequence of consonant sounds into words, preferably words that create a vivid image in your mind.

You can pick a canonical encoding of each number to create a set of pegs and use these to memorize numbered lists. Although numbers can be encoded many ways, a set of pegs is a one-to-one mapping to numbers. To pull up the nth item in the list, recall what image you’ve associated with the peg image for n.

For example, you could encode 16 as dish, tissue, touché, Hitachi, etc. If you want to remember that sulfur has atomic number 16 you could use any of those images. But if you wanted to remember that the 16th element is sulfur, you need to have a unique peg associated with 16.

Learning pegs is more work than hanging things on pegs. But once you have a set of pegs, you can reuse them for memorizing multiple lists. For example, you could use the same pegs to memorize the periodic table and the ASCII table.

Atomic numbers

Allan Krill has written up a way to associate each element with a peg. You could use his suggestions, but you’ll almost certainly need to customize some of them. It’s generally hard to use anyone else’s mnemonics. What works for one person may not for another.

To memorize the periodic table, you first come up with pegs for the numbers 1 through 118. Practice those and get comfortable with them. This could take a while, but it’s reusable effort. Then associate an image of each element with its corresponding peg. For example, polonium is element 84. If your peg for 84 is fire, you might imagine someone playing polo on a field that’s on fire.

Element symbols

Every element has a one- or two-letter symbol, and most of these are easy: Ti for titanium, U for uranium, etc. Some seem completely arbitrary, such as Hg for mercury, but these you may already know. These names seem strange because they are mnemonic in Latin. But the elements with Latin names are also the ones that were discovered first and are the most common. You probably know by osmosis, for example, that the symbol for iron is Fe.

The hard part is the second letter, if there is a second letter. For example, does Ar stand for argon or arsenic? Is the symbol for thulium Th or Tl or Tm?

When you associate an element image with a peg image, you could add a third image for the second letter of the element symbol, using the NATO phonetic alphabet if you know that. For example, the NATO word for S is Sierra. If your peg for 33 is mummy, you might imagine a mummy drinking a bottle of Sierra Springs® water laced with arsenic.

Image from OpenStax Biology 2e. CC BY Attribution license.

Solving a triangle the size of Argentina

The numbers in today’s date—11, 28, and 23—make up the sides of a triangle. This doesn’t always happen; the two smaller numbers have to add up to more than the larger number.

We’ll look at triangles with sides 11, 23, and 28 in the plane, on a sphere, and on a pseudosphere. Most of the post will be devoted to the middle case, a large triangle on the surface of the earth.

Solving a triangle in the plane

If we draw a triangle with sides 11, 23, and 28, we can find out the angles of the triangle using the law of cosines:

c² = a² + b² – 2ab cos C

where C is the angle opposite the side c. We can find each of the angles of the triangle by rotating which side we call c.

If we let c = 11, then C = arccos((23² + 28² − 11²)/(2 × 23 × 28)) = 22.26°.

If we let c = 23, then C = arccos((11² + 28² − 23²)/(2 × 11 × 28)) = 52.38°.

If we let c = 28, then C = arccos((11² + 23² − 28²)/(2 × 11 × 23)) = 105.36°.
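As a quick check of the three calculations above, here is a minimal Python sketch. The helper name `planar_angles` is made up for this illustration; it simply applies the law of cosines three times, rotating which side plays the role of c.

```python
import math

def planar_angles(a, b, c):
    """Angles in degrees opposite sides a, b, c of a planar triangle."""
    def opposite(x, y, z):
        # Law of cosines solved for the angle opposite side x
        return math.degrees(math.acos((y*y + z*z - x*x) / (2*y*z)))
    return opposite(a, b, c), opposite(b, a, c), opposite(c, a, b)

angles = planar_angles(11, 23, 28)
print([round(t, 2) for t in angles])
print(sum(angles))  # the angles of a planar triangle sum to 180 degrees
```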

Solving a triangle on a sphere

Now suppose we make our 11-23-28 triangle very large, drawing our triangle on the face of the earth. We pick our unit of measurement to be 100 miles, and we get a triangle very roughly the size and shape of Argentina.

We can still use the law of cosines, but it takes a different form, and the meaning of the terms changes. The law of cosines on a sphere is

cos(c) = cos(a) cos(b) + sin(a) sin(b) cos(C).

As before, a, b, and c are sides of the triangle, and the sides a and b intersect at an angle C. However, now the sides themselves are angles because they are arcs on a sphere. Now a, b, and c are measured in degrees or radians, not in miles.

If the length of an arc is x, the angular measure of the arc in radians is x/R where R is the radius of the sphere. The mean radius of the earth is 3959 miles, and we’ll assume the earth is a sphere with that radius.

We can solve for the angle opposite the longest side by using

C = arccos( (cos(c) – cos(a) cos(b)) / (sin(a) sin(b)) )

where

a = 1100 / 3959
b = 2300 / 3959
c = 2800 / 3959

It turns out that C = 106.8866°, and the other angles are 23.8249° and 53.9255°.

Importantly, the sum of these three angles is more than 180°. In fact it’s 184.6370°.

The sum of the vertex angles in a spherical triangle is always more than 180°, and the bigger the triangle, the more the sum exceeds 180°. The amount by which the sum exceeds 180° is called the spherical excess E and it is proportional to the area. In radians,

E = area / R².

In our example the excess is 4.6370° and so the area of our triangle is

4.6370° × (π radians / 180°) × 3959² miles² ≈ 1,268,000 miles².

Now Argentina has an area of about a million square miles, so our triangle is bigger than Argentina, but smaller than South America (6.8 million square miles). Argentina is about 2300 miles from north to south, so one of the sides of our triangle matches Argentina well.

Note that there are no similar triangles on a sphere: if you change the lengths of the sides proportionately, you change the vertex angles.
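The following Python sketch illustrates the spherical law of cosines, the spherical excess, and the lack of similar triangles. The function names are invented for this example, and the sides are assumed to be given already as angular measures in radians. It uses a triangle that is easy to check by hand: the octant triangle whose sides are each a quarter of a great circle.

```python
import math

def spherical_angles(a, b, c):
    """Vertex angles (radians) opposite sides a, b, c of a spherical triangle.

    The sides are angular measures in radians.
    """
    def opposite(x, y, z):
        # Spherical law of cosines solved for the angle opposite side x
        return math.acos((math.cos(x) - math.cos(y) * math.cos(z))
                         / (math.sin(y) * math.sin(z)))
    return opposite(a, b, c), opposite(b, a, c), opposite(c, a, b)

def spherical_excess(a, b, c):
    """Angle sum minus pi, which equals area / R^2."""
    return sum(spherical_angles(a, b, c)) - math.pi

quarter = math.pi / 2  # a quarter of a great circle

# Octant triangle: three right angles, covering one eighth of the sphere
big = [math.degrees(t) for t in spherical_angles(quarter, quarter, quarter)]
octant_excess = spherical_excess(quarter, quarter, quarter)

# Halving every side changes the angles, so the two triangles are not similar
small = [math.degrees(t) for t in spherical_angles(quarter/2, quarter/2, quarter/2)]

print(big, small)
print(octant_excess, 4 * math.pi / 8)  # excess vs. one eighth of a unit sphere's area
```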

Solving a triangle on a pseudosphere

In a hyperbolic space, such as the surface of a pseudosphere, a surface that looks sorta like the bell of a trombone, the law of cosines becomes

cosh(c) = cosh(a) cosh(b) + κ sinh(a) sinh(b) cos(C)

where κ < 0 is the curvature of the space. Note that if we set κ = 1 and delete all the hs this would become the law of cosines on a sphere.

Just as the angles in a triangle add up to more than 180° on a sphere, and exactly 180° in a plane, they add up to less than 180° on a pseudosphere. I suppose you could call the difference between 180° and the sum of the vertex angles the spherical deficiency by analogy with spherical excess, but I don’t recall hearing that term used.

Unix linguistics

If you knew that you wanted to learn 10 spoken languages, it would probably be helpful to take a course in linguistics first. Or maybe to have a linguistics course after learning your first or second language. And if the languages are related, it would help to know something about the linguistics of that group of languages in particular. For example, if you wanted to learn several Romance languages, it might be worthwhile to learn at least some Latin first, even if Latin isn’t on the list of languages you want to learn.

In order to become fluent in using the Unix (Linux) command line, you need to learn a dozen related languages. Fortunately none of these languages are anywhere near as large as a spoken language, but there are several of them. Regular expressions, for example, are a pattern description language. You can think of vim as a language. And of course programming languages like sed and awk are languages.

As you use various command line utilities you notice similarities between them. Some tool history is fairly well known. For example, it’s well known that grep takes its name from the g/re/p command in ed, and that grep was created by modifying the ed source code. The history of sed is similar. The line editor ed is a common ancestor of grep, sed, and vi, which explains a lot of the similarity between these tools.

There is a large amount of preserved Unix history, but what I have in mind is more linguistics than history. History often accounts for the similarities in syntax, but I’m primarily interested in the similarities themselves rather than the history. A semi-fictional history might be more useful than an accurate history. “This isn’t exactly how things came about, but you could imagine …”

I’ve seen bits and pieces of what I imagine would comprise a course in Unix linguistics. For example, there is a section in the book sed & awk entitled “Awk, by Sed and Grep, out of Ed.”

I’ve used Emacs since college, but I’m learning how to get by in vi. Part of my motivation is to be able to log into a client’s Linux box and be productive without being able to install or configure anything. Although I hardly know anything about vi at this point, I can tell right away that vi has more syntactic similarity to the rest of the Unix ecosystem than Emacs does.

It would be really nice to have a book with a title like “vi for people who have used sed, grep, and less.” Or even better, a tutor who could relate what I don’t know to what I do know. Someone like my Greek professor.

I took one semester of classical Greek in college. The professor, William Nethercut, was amazing. At the beginning of the semester he asked each student what languages they had studied, and customized the rest of the course accordingly. “This feature of Greek is like that feature in French, Susan. And like this feature of Latin, Mike.” I was impressed by his erudition in languages, but even more impressed with his thoughtfulness in relating to each of us individually. If Dr. Nethercut taught a class in the Unix ecosystem, he could say “So, you know this set of tools and want to learn that set of tools. You’ll find that the syntax of this is similar to the syntax of that, but watch out for this difference.”

Numerical integral with a singularity

Richard Hamming [1] gives this nice example of an integral with a mild singularity:

\int_0^1 \log(\sin(x))\, dx

The integrand approaches −∞ as x approaches 0 and yet the integral is finite. If we try to numerically evaluate this integral directly, we will either get inaccurate results or we will have to go to a lot of work.

This post will show that with a clever reformulation of the problem we can use simple methods to get accurate results, or use sophisticated methods with fewer function evaluations.

As I wrote about years ago, a common technique for dealing with singularities is to add and subtract a function with the same asymptotic behavior that can be integrated by hand. Hamming does a slight variation on this, multiplying and dividing by x inside the logarithm.

\begin{align*} \int_0^1 \log(\sin(x))\, dx &= \int_0^1 \log(x)\, dx + \int_0^1 \log\left(\frac{\sin(x)}{x}\right) \, dx \\ &=  -1 + \int_0^1 \log\left(\frac{\sin(x)}{x}\right) \, dx \end{align*}

Here we integrated the singular part, log(x), and we are left with numerically integrating a well-behaved function, one that is smooth and bounded on the interval of integration. Because sin(x) ≈ x for small x, we can define sin(x)/x to be 1 at 0 and have a smooth function.

We’ll use NumPy’s sinc function to handle sin(x)/x properly near 0. There are two conventions for defining the sinc function, either as sin(x)/x or as sin(πx)/πx. NumPy uses the latter convention, so we define our own sinc function as follows.

import numpy as np
def mysinc(x): return np.sinc(x/np.pi)

Trapezoid rule

Let’s use the simplest numerical method, the trapezoid rule.

We run into a problem immediately: if we chop up the interval [0, 1] in a way that puts an integration point at 0, our resulting integral will be infinite. We could replace 0 with some ε > 0, but if we do, we should try a few different values of ε to see whether our choice of ε greatly impacts our integral.

for ε in [1e-6, 1e-7, 1e-8]:
    xs = np.linspace(ε, 1, 100)  # start at ε to avoid log(sin(0))
    integral = sum( np.log(np.sin(x)) for x in xs ) / 100
    print(ε, integral)

The three results differ, suggesting that the integral does indeed depend on ε. We’ll see soon that our integral evaluates to around −1.05. So our results are not accurate, and the smaller ε is, the worse our answer is. [2]

Now let’s evaluate the integral as Hamming suggests. We’ll use a varying number of integration points so the difference between our integral estimates will give us some idea whether we’re converging.

for N in [25, 50, 100]:
    xs = np.linspace(0, 1, N)
    integral = sum( np.log(mysinc(xs)) )/N
    integral -= 1 # subtract the integral of log(x)
    print(N, integral)

The consistency among the three results suggests the integral is around −1.057.
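For comparison, here is a sketch of the textbook composite trapezoid rule applied to Hamming’s reformulation, with the endpoints weighted by one half. The helper `trapezoid` is written out by hand here for illustration rather than taken from a library, and `mysinc` is repeated so the snippet runs on its own.

```python
import numpy as np

def mysinc(x):
    # sin(x)/x, with the value 1 at x = 0
    return np.sinc(x / np.pi)

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal subintervals on [a, b]."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    # Interior points get weight h, the two endpoints get weight h/2
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

# Integrate the smooth part numerically, then add the integral of log(x), which is -1
estimate = trapezoid(lambda x: np.log(mysinc(x)), 0.0, 1.0, 100) - 1.0
print(estimate)
```

Because the integrand is now smooth on the whole interval, including the point x = 0 causes no trouble.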

Adaptive integration

We said at the top of this article that Hamming’s reformulation would either let us get accurate results from simple methods or let us get by with fewer function evaluations using a sophisticated method. Now we’ll demonstrate the latter, using the adaptive integration algorithm quad from SciPy.

from scipy.integrate import quad

# Integrate log(sin(x)) from 0 to 1
integral, error = quad(lambda x: np.log(np.sin(x)), 0, 1)
print(integral, error) 

# Integrate log(x sin(x)/x) = log(x) + log(mysinc(x)) from 0 to 1
integral, error = quad(lambda x: np.log(mysinc(x)), 0, 1)
integral -= 1
print(integral, error)

This prints

-1.0567202059915843 1.7763568394002505e-15
-1.056720205991585 6.297207865333937e-16

suggesting that both approaches are working. Both estimate their error to be near machine precision. But as we’ll see, the direct approach uses about 10 times as many function evaluations.

Let’s ask quad to return more details by adding full_output = 1 as an argument.

integral, error, info = quad(lambda x: np.log(np.sin(x)), 0, 1, full_output=1)
print(integral, error, info["neval"])

integral, error, info = quad(lambda x: np.log(mysinc(x)), 0, 1, full_output=1)
integral -= 1
print(integral, error, info["neval"])

This shows that the first integration used 231 function evaluations, but the second used only 21.

The difference in efficiency doesn’t matter when you’re evaluating an integral one time, but if you were repeatedly evaluating similar integrals inside a loop, subtracting off the singularity could make your problem run 10 times faster. Simulations involving Bayesian statistics can have such integrations in the inner loop, and so making an integration an order of magnitude faster could make the overall simulation an order of magnitude faster, reducing CPU-days to CPU-hours.

[1] Richard Hamming, Numerical Methods for Scientists and Engineers. Second edition. Dover.

[2] We can get better results if we let ε and 1/N go to zero at the same rate. The following code produces mediocre results, but better results than above.

for N in [10, 100, 1000]:
    xs = np.linspace(1/N, 1, N)  # ε = 1/N shrinks along with the step size
    integral = sum( np.log(np.sin(x)) for x in xs ) / N
    print(N, integral)

The output at least seems to be getting closer to the correct result, but the accuracy is bad for so many function evaluations.

Radius of a stretched spring

When you stretch a coiled spring, the radius decreases slightly, so slightly that you can’t see the difference unless you stretch the spring so much that you damage it.

The math is essentially the same as in the previous post about wrapping Christmas lights around a tree trunk.

If you have a coiled spring of radius r, the points along the coil can be described by

(r cos t, r sin t, ht/2π)

where h is the spacing between turns. If t runs from 0 to T, the length of the spring is hT/2π and the length of the material in the spring, if it were uncoiled, would be

√(r² + h²/4π²) T.

When we stretch a spring, we increase h. We don’t increase the total amount of material, so the radius must decrease, though not by much.

Suppose the spring initially has radius r₁ and coil spacing h₁. Then when we stretch it the spring has radius r₂ and coil spacing h₂. Since we haven’t created new material, we must have

√(r₁² + h₁²/4π²) T = √(r₂² + h₂²/4π²) T

and so

r₁² + h₁²/4π² = r₂² + h₂²/4π².
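Solving the conservation equation above for the new radius gives a one-line formula, sketched here in Python. The function name and the sample numbers are hypothetical, chosen only to show how small the change in radius is.

```python
import math

def stretched_radius(r1, h1, h2):
    """Radius after the coil spacing changes from h1 to h2, material length fixed."""
    r2_squared = r1**2 + (h1**2 - h2**2) / (4 * math.pi**2)
    if r2_squared < 0:
        raise ValueError("spacing h2 is too large for the available material")
    return math.sqrt(r2_squared)

# Hypothetical spring: radius 1, coil spacing doubled from 0.1 to 0.2
r1, h1, h2 = 1.0, 0.1, 0.2
r2 = stretched_radius(r1, h1, h2)
print(r1 - r2)  # decrease in radius
```

With these numbers the spacing changes by 0.1 while the radius changes by less than 0.001.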

A small change in h results in a change in r an order of magnitude smaller, for reasons given in the previous post. Both posts boil down to the observation that for y small relative to x,

√(x² + y²) − x = y²/(2x) + O(y⁴).

If we choose our units so that the initial radius is on the order of 1, then a change in length on the order of y results in a change in radius on the order of y².

Wrapping Christmas lights around a tree trunk

Suppose you want to wrap Christmas lights around a tree trunk that we can approximate by a cylinder of radius r.

You want to wrap lights around the tree in a helix, going up a distance h every time you go around the tree once. What length of lights do you need to make n turns around the tree?

You can model the lights as a parametric curve with equation

(r cos t, r sin t, ht/2π)

If t ranges from 0 to T, the corresponding curve length is

√(r² + h²/4π²) T

and you can set T = 2πn if you want to find the length of n turns.

How much does h matter? Not much if h is less than r, as is often the case when wrapping a tree trunk with Christmas lights. My daughter Allison discovered this while wrapping lights around our pine tree, and then I wrote this post adding the math details.

If we expand √(r² + h²/4π²) as a function of h in a Taylor series centered at 0 we get

√(r² + h²/4π²) = r + h²/(8π²r) + O(h⁴).

For example, suppose a tree has radius r = 10 inches and the lights move up h = 4 inches vertically with each turn. To complete one turn we let T = 2π. The exact length of one turn is

√(r² + h²/4π²) T = √(10² + 4²/4π²) (2π) = 62.96 inches.

If we ignore the h term we get

rT = 10 (2π) = 62.83 inches.

In short, the length of n turns around the tree is approximately 2πrn, the same as n circles around the tree. The difference in length between a helix with n turns and n circles is negligible, provided h is smaller than r. Even if h = r = 10 in the example above, we’d get a length of 63.6 inches and our approximation would still be off by less than an inch per turn.
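The comparison above is easy to reproduce. The following Python sketch uses made-up function names and compares the exact helix length with the length of n plain circles.

```python
import math

def helix_length(r, h, n):
    """Exact length of n turns of a helix with radius r that rises h per turn."""
    return math.sqrt(r**2 + h**2 / (4 * math.pi**2)) * 2 * math.pi * n

def circle_length(r, n):
    """Length of n circles of radius r, i.e. the helix length with h ignored."""
    return 2 * math.pi * r * n

for h in (4, 10):
    exact = helix_length(10, h, 1)
    approx = circle_length(10, 1)
    print(h, round(exact, 2), round(approx, 2), round(exact - approx, 2))
```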

The heart of this calculation pops up frequently in various contexts. For example, the same calculation appears in the post It doesn’t matter much if the tape is straight.

Database reconstruction attacks

In 2018, three researchers from the US Census Bureau published a paper entitled “Understanding Database Reconstruction Attacks on Public Data.” [1] The article showed that private data on many individuals could be reverse engineered from public data.

As I wrote about a few days ago, census blocks are at the bottom of the US Census Bureau’s hierarchy of geographical entities. On average a census block may contain about 40 people, but a block may contain only one person.

In hindsight it seems fairly obvious that data reported at the census block level is vulnerable to re-identification, and yet this doesn’t seem to have been noticed before around 2000. There were some privacy measures in place before then, but it wasn’t clear that these methods were insufficient to protect privacy.

You can think of each fact about each person as a variable and each reported statistic as an equation. When the number of equations is comparable to the number of variables, it’s possible that the system of equations has a unique solution. (We know a priori that there exists at least one solution, assuming the reported statistics were correctly computed.)

It’s not quite as simple as that, though that is roughly the idea in [1]. The data collected in the census is binary or integer data, which makes database reconstruction easier. Ages, for example, are integers, and typically integers less than 100.

One of the techniques the Census Bureau previously used in an attempt to protect individual privacy was a sort of small cell rule, a rule to not report statistics based on three or fewer individuals. This may or may not help. In the example given in [1], there are 7 people in a hypothetical census block, of whom 4 are adults and an unreported number are minors. Determining the number of minors is left as an exercise for the reader.

The set of equations is more complicated than a set of linear equations. The inference problem is a matter of logic programming or constraint satisfaction. Missing data is not always as trivial to reconstruct as in the preceding paragraph, but missing data can still convey partial information. The very fact that the data is missing tells you something.

The discrete nature of the data makes the solution process both harder and easier. It makes things harder in the sense of requiring a more complicated solution algorithm, but it makes things easier in the sense of increasing the likelihood that the equations have a unique solution.
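To make the idea concrete, here is a toy brute-force reconstruction in Python. Everything in it is hypothetical: the block of three people, the published statistics (mean age 44, median age 30, exactly one minor, mean adult age 62), and the function names are invented for illustration and are not taken from [1], which treats the problem with more sophisticated constraint-solving tools.

```python
from itertools import combinations_with_replacement

def consistent_ages(constraints, people=3, max_age=99):
    """All sorted age tuples that satisfy every published statistic."""
    candidates = combinations_with_replacement(range(max_age + 1), people)
    return [ages for ages in candidates
            if all(check(ages) for check in constraints)]

# Hypothetical published statistics for a block of three people.
# Sums are compared instead of means to stay in integer arithmetic.
def mean_is_44(ages):
    return sum(ages) == 44 * len(ages)

def median_is_30(ages):
    return ages[len(ages) // 2] == 30  # tuples are generated in sorted order

def one_minor_and_adult_mean_62(ages):
    adults = [x for x in ages if x >= 18]
    return len(ages) - len(adults) == 1 and sum(adults) == 62 * len(adults)

loose = consistent_ages([mean_is_44, median_is_30])
tight = consistent_ages([mean_is_44, median_is_30, one_minor_and_adult_mean_62])
print(len(loose), tight)
```

With only the mean and median there are 28 candidate age lists. Adding the statistics about minors and adults leaves exactly one, (8, 30, 94), so every individual age in this toy block has been recovered.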

This is why the Census Bureau embraced differential privacy for the 2020 census. They had no choice but to do something substantially different than they had done in the past once it became apparent that their previous approach failed rather badly at protecting confidentiality.

[1] Simson Garfinkel, John M. Abowd, Christian Martindale. Understanding Database Reconstruction Attacks on Public Data. ACM Queue, October 2018. The article was also published in Communications of the ACM in March 2019.