# Poisson distribution tail bounds

Yesterday Terence Tao published a blog post on bounds for the Poisson probability distribution. Specifically, he wrote about Bennett’s inequalities and a refinement that he developed or at least made explicit. Tao writes

This observation is not difficult and is implicitly in the literature … I was not able to find a clean version of this statement in the literature, so I am placing it here on my blog.

Bennett’s inequalities say that for a Poisson random variable X with mean λ,

for u ≥ 0 and

for -1 < u ≤ 0 where

I wanted to visualize how tight Bennett’s bounds are and got some interesting images due to the discrete nature of the Poisson distribution.

Here’s a plot of how tight the right-tail estimate is, the gap between the two sides of the first inequality above.

Here u ranges from 0 to 2 along the front edge of the graphic and λ ranges from 5 to 10.

And here’s a plot of how tight the left-tail estimate is, the gap between the two sides of the second inequality above.

## Mathematica tweaking

The latter image was easy to make, but the first image required a couple adjustments: the image had holes in it, and the view point was awkward.

I fixed the holes in the plot by adding the option PlotPoints->50 to make the sampling finer. I fixed the view point by grabbing the image and rotating it until I thought it looked better. I could have saved the rotated image directly, but I was curious how to do this with the Export command. To do this I needed to specify the ViewPoint explicitly in the plotting command, but I didn’t know how to get the value of ViewPoint that’s I’d implicitly found via my mouse. A comment on the Mathematica Stack Exchange site told me what I needed to know.

Simply edit the output cell and wrap Options[…, ViewPoint] around the already rotated output. The graphics should be in the place of …. ViewVertical may also change during rotating, as well as some other parameters.

# Jacobi functions with complex parameter

Jacobi functions are complex-valued functions of a complex variable z and a parameter m. Often this parameter is real, and 0 ≤ m < 1. Mathematical software libraries, like Python’s SciPy, often have this restriction. However, m could be any complex number.

The previous couple of posts spoke of the fundamental rectangle for Jacobi functions. But in general, Jacobi functions (and all other elliptic functions) have a fundamental parallelogram.

When m is real, the two periods of the Jacobi sn function are 4K(m) and 2K(1-m) i and so the function repeats horizontally and vertically. (Here K is the complete elliptic integral of the first kind.) When m is complex, the two periods are again 4K(m) and 2K(1-m) i, but now 4K(m) and 2K(1-m) are complex numbers and their ratio is not necessarily real. This means sn repeats over a parallelogram which is not necessarily a rectangle.

The rest of the post will illustrate this with plots.

First, here is a plot of sn(K(1/2)z, 1/2). The height of the graph represents the absolute value of the function and the color represents its phase. (The argument z was multiplied by K(1/2) to make the periods have integer values, making it a little easier to see the periods.)

Notice that the plot features line up with the coordinate axes, the real axis running from 0 to 8 in the image and the complex axis running from 0 to 4.

Here’s the analogous plot for sn(z, 2 + 2i).

Now the features are running on a diagonal. The pits are where the function is zero and the white ellipses are poles that have have the tops cut off to fit in a finite plot.

It will be easier to see what’s going on if we switch to flat plots. The color represents phase as before, but now magnitude is denoted by contour lines.

Here’s a plot of sn(K(1/2)z, 1/2)

and here’s a plot of sn(z, 2 + 2i).

According to theory the two periods should be

4 K(2 + 2i) = 4.59117 + 2.89266 i

and

2i K(-1 – 2i) = -1.44633 + 2.29559 i.

We can show that this is the case by plotting the real and imaginary parts of sn(z, 2 + 2i) as we move in these two directions.

The Mathematica code

    Plot[
{Re[JacobiSN[t EllipticK[2 + 2 I], (2 + 2 I)]],
Im[JacobiSN[t EllipticK[2 + 2 I], (2 + 2 I)]]},
{t, 0, 4}]

produces this plot

and the code

    Plot[
{Re[JacobiSN[t I EllipticK[-1 - 2 I], (2 + 2 I)]],
Im[JacobiSN[t I EllipticK[-1 - 2 I], (2 + 2 I)]]},
{t, 0, 4}]

produces this plot.

# When a cubic or quartic has a double root

Thanks to the quadratic equation, it’s easy to tell whether a quadratic equation has a double root. The equation

has a double root if and only if the discriminant

is zero.

The discriminant of a cubic is much less known, and the analogs for higher order polynomials are unheard of. There is a discriminant for polynomials of all degrees, though the complexity of the discriminant grows quickly with the degree of the polynomial.

This post will derive the discriminant of a cubic and a quartic.

## Resultants

The resultant of a pair of polynomials is zero if and only if the two polynomials have a common root. Resultants have come up in a couple previous posts about solving trigonometric equations.

A polynomial p(x) has a double root if p and its derivative p‘ are both zero somewhere. The discriminant of p is the resultant of p and p’.

The resultant of two polynomials is a determinant of their Sylvester matrix. This matrix is easier to describe by example than by equation. You basically fill a matrix with shifts of the coefficients of both polynomials and fill in the gaps with zeros.

MathWorld gives the following Mathematica code for the Sylvester matrix of two inputs.

    SylvesterMatrix1[poly1_, poly2_,  var_] :=
Function[{coeffs1, coeffs2}, With[
{l1 = Length[coeffs1], l2 = Length[coeffs2]},
Join[
l1 + l2 -  2], l2 - 2],
l1 + l2 - 2], l1 - 2]
]
]
][
Reverse[CoefficientList[poly1, var]],
Reverse[CoefficientList[poly2, var]]
]


## Cubic discriminant

If we apply this to the cubic polynomial

we get the following matrix.

We can compute the resultant by taking the determinant of the above matrix.

    g[x_] := a x^3 + b x^2 + c x + d
SylvesterMatrix1[g[x], D[g[x], x], x]


We get the following result

and we can verify that this is the same result we would get from calling the Resultant directly with

    Resultant[g[x], D[g[x], x], x]

Although the resultant is defined in terms of a determinant, that doesn’t mean that resultants are necessarily computed by computing determinants. The Sylvester matrix is a very special matrix, and there are clever ways to exploit its structure to create more efficient algorithms.

Each term in the resultant has a factor of a, and the discriminant is the resultant divided by −a.

## Quartic discriminant

Now let’s repeat our exercise for the quartic. The Sylvester matrix for the quartic polynomial

and its derivative is

I created the image above with the following Mathematica code.

    f[x_] := a x^4 + b x^3 + c x^2 + d x + e
TeXForm[SylvesterMatrix1[f[x], D[f[x], x], x]]


If we take the determinant, we get the resultant, but it’s a mess.

Again each term has a factor of a, so we can divide by a. to get the discriminant.

If we want to use this in code, we can have Mathematica export the expression in C code using CForm. To generate Python code, it’s more convenient to use FortranForm since Python like Fortran uses ** for exponents.

The following Python code was created by pasting the output of

    FortranForm[Resultant[f[x], D[f[x], x], x]]

and making it into a function.

    def quartic_resultant(a, b, c, d, e):
return a*b**2*c**2*d**2 - 4*a**2*c**3*d**2 - 4*a*b**3*d**3 + 18*a**2*b*c*d**3 \
- 27*a**3*d**4 - 4*a*b**2*c**3*e + 16*a**2*c**4*e + 18*a*b**3*c*d*e \
- 80*a**2*b*c**2*d*e - 6*a**2*b**2*d**2*e + 144*a**3*c*d**2*e \
- 27*a*b**4*e**2 + 144*a**2*b**2*c*e**2 - 128*a**3*c**2*e**2 \
- 192*a**3*b*d*e**2 + 256*a**4*e**3


Let’s try this on a couple examples. First

which has a double root at 0.

As expected

    quartic_resultant(1, -5, 6, 0, 0)

returns 0.

Next let’s try

The call

    quartic_resultant(1, -10, 35, -50, 24)

returns 144. We expected a non-zero result since our polynomial has distinct roots at 1, 2, 3, and 4.

## Higher order discriminants

In general the discriminant of an nth degree polynomial is the resultant of the polynomial and its derivative, up to a constant. There’s no need to worry about this constant if you’re only concern is whether the discriminant is zero. To get the exact discriminant, you divide the resultant by the leading coefficient of the polynomial and adjust the sign.

The sign convention is a little strange. If you look back at the examples above, we divided by −a in the cubic case but we divided by a in the quartic case. You might reasonably guess that you should divide by a and multiply by (−1)n. But that won’t give you right sign for the quadratic case. The conventional sign is

(−1)n(n − 1)/2.

So when n equals 2 or 3 we get a negative sign, but when n equals 4 we don’t.

# Three interesting curves

Here are three curves that have interesting names and interesting shapes.

## The fish curve

The fish curve has parameterization

x(t) = cos(t) – sin²(t)/√2
y(t) = cos(t) sin(t)

We can plot this curve in Mathematica with

    ParametricPlot[
{Cos[t] - Sin[t]^2/Sqrt[2], Cos[t] Sin[t]},
{t, 0, 2 Pi}]


to get the following.

It’s interesting that the image has two sharp cusps: the parameterization consists of two smooth functions, but the resulting image is not smooth. That’s because the velocity of the curve is zero at 3π/4 and 7π/4.

I found the fish curve by poking around Mathematica’s PlainCurveData function. I found the next two in the book Mathematical Models by H. Martyn Cundy and A. P. Rollett.

## The electric motor curve

Next is the “electric motor” curve. Cundy and Rollett’s book doesn’t provide illustrations for the plane curves it lists on pages 71 and 72, and I plotted the electric motor curve because I was curious why it was called that.

The curve has equation

y²(y² − 96) = x² (x² − 100)

and we can plot it in Mathematica with

    ContourPlot[ y^2 (y^2 - 96) == x^2 (x^2 - 100),
{x, -20, 20}, {y, -20, 20}

which gives the following.

Sure enough, it looks like the inside of an electric motor!

## The ampersand curve

I was also curious about the “ampersand curve” because of its name. The curve has equation

(y² − x²) (x − 1) (2x − 3) = 4 (x² + y² − 2x

and can be plotted in Mathematica with

    ContourPlot[(y^2 - x^2) (x - 1) (2 x - 3) ==
4 (x^2 + y^2 - 2 x)^2,
{x, -0.5, 2}, {y, -1.5, 1.5},
PlotPoints -> 100]


The produces the following pleasant image.

Note that the code specifies PlotPoints. Without this argument, the default plotting resolution produces a misleading plot, making it appear that the curve has three components.

# Average digit sum

Suppose you write down a number and take the sum of its digits. In what base will this sum be the smallest on average?

Let’s do a couple examples comparing base 10 and base 2. The number 2022 in base 10 has digit sum 6, but its binary equivalent 11111100110 has digit sum 8, so the base 10 representation has the smaller digit sum. But if we take 1024 in base 10, the digit sum is 7, but 1024 = 210 and so the sum of its binary digits is just 1.

In [1] the author proves that for sufficiently large N, the average digit sum for all positive integers less than N is the least in base 2. Moreover, the author shows that as N goes to infinity, the average digit sum of numbers less than N written in radix r is asymptotically equal to

(r – 1) log N / 2 log r.

This is an increasing function of r and so not only does base 2 have the minimum average digit sum, the average digit sum increases with the size of the base.

Let’s see how this works out with N = 1,000,000 and r ranging from 2 to 10 using the following Mathematica code.

    Table[
Sum[ Total[IntegerDigits[n, r]],
{n, 0, 1000000}] /1000000.,
{r, 2, 10}
]


This returns

    {9.885, 12.336, 14.8271, 16.5625, 18.5806,
20.5883, 22.2383, 24.1978, 27.}

Now let’s see what values the asymptotic formula predicted.

    Table[(r - 1) Log[1000000.]/(2 Log[r]), {r, 2, 10}]

This returns

    {9.96578, 12.5754, 14.9487, 17.1681, 19.2765,
21.2993, 23.2535, 25.1508, 27.}


So for each r the asymptotic formula produces a good approximation to the actual result.

[1] L. E. Bush. Asymptotic Formula for the Average Sum of the Digits of Integers. The American Mathematical Monthly, Mar., 1940, Vol. 47, No. 3, pp. 154-156

# Quirks in Mathematica’s administrative division data for Mexico

If you ask Mathematica for a list of Mexican states via

    CountryData["Mexico", "RegionNames"]

you will get a list of strings:

    "Aguascalientes", "Baja California", ..., "Zacatecas"}

However, when you try to turn this into a list of objects representing these states via

    states = Entity["AdministrativeDivision", {#, "Mexico"}] & /@
CountryData["Mexico", "RegionNames"]


something strange happens. Some items in the list are turned into useful objects, and some are uninterpreted symbols.

For example, Aguascalientes is recognized as an administrative division, but Baja California is not. It recognizes Oaxaca but not Nuevo Leon. The pattern is that states with a space are not recognized. There is an inconsistency in Mathematica: output names do not always match input names. To create the object representing Baja California, you need to pass in the string BajaCalifornia with no space.

    Entity["AdministrativeDivision", {"BajaCalifornia", "Mexico"}]

OK, so let’s remove spaces before we try to create a list of geographic objects.

    names = StringReplace[#, " " -> ""] & /@
CountryData["Mexico", "RegionNames"]

This mostly works, but it trips up on Mexico City. The output name for the region is Ciudad de México, but Mathematica does not recognize CiudaddeMéxico as an administrative division. Mathematica does recognize MexicoCity as the name of a city but not as the name of an administrative division.

Changing CiudaddeMéxico to MexicoCity in the list of names did not fix the problem. But when I directly edited the list of state objects by replacing the uninterpreted value with the output running

    Entity["AdministrativeDivision", {"MexicoCity", "Mexico"}]

by itself everything worked. Then I was able to find a Traveling Salesman tour as in earlier posts (Africa, Americas, Eurasia and Oceania, Canada).

## Traveling Salesman tour of Mexico

The tour is

1. Baja California
2. Baja California Sur
3. Sinaloa
4. Durango
5. Zacatecas
6. Aguascalientes
7. Nayarit
8. Jalisco
9. Colima
10. Michoacán
11. México
12. Mexico City
13. Morelos
14. Guerrero
15. Oaxaca
16. Chiapas
17. Tabasco
18. Campeche
19. Quintana Roo
20. Yucatán
21. Veracruz
22. Puebla
23. Tlaxcala
24. Hidalgo
25. Querétaro
26. Guanajuato
27. San Luis Potosí
28. Tamaulipas
29. Nuevo León
30. Coahuila
31. Chihuahua
32. Sonora

The tour is 8,343 kilometers.

# Costas arrays in Mathematica

A couple days ago I wrote about Costas arrays. In a nutshell, a Costas array of size n is a solution to the n rooks problem, with the added constraint that if you added wires between the rooks, no two wires would have the same length and slope. See the earlier post for more details.

The earlier post implemented the Lempel algorithm in Python. Here we’ll implement it in Mathematica. Lempel’s algorithm says to start with parameters p and a where p is a prime and a is a primitive root mod p [1]. Then you can create a Costas array of size n = p − 2 by filling in the (i, j) square if and only if

ai + aj = 1 mod p.

We can implement this in Mathematica by

    fill[i_, j_, p_, a_] := If[Mod[a^i + a^j, p] == 1, 1, 0]
m[p_, a_] := Table[fill[i, j, p, a], {i, 1, p - 2}, {j, 1, p - 2}]


We could visualize the Costas array generated by p = 11 and a = 2 using ArrayPlot. The code

    ArrayPlot[m[11, 2], Mesh -> All]

Generates the following image.

Note that the image is symmetric with respect to the main diagonal. That’s because our algorithm is symmetric in i and j. But Costas arrays are not always symmetrical, so this underscores the point that there is no known scalable algorithm for finding all Costas arrays.

In this example we started with knowing that 2 was a primitive root mod 11. We could have had Mathematica pick a primitive root mod p for us by calling PrimitiveRoot[p]

Just out of curiosity, let’s redo the example above, but instead of testing whether

ai + aj = 1 mod p.

we’ll just use ai + aj mod p.

The code

    ArrayPlot[Table[Mod[2^i + 2^j, 11], {i, 1, 9}, {j, 1, 9}]]

produces the following image.

[1] Lempel’s algorithm generalizes to finite fields of any order, not just integers modulo a prime.

# Finding similar world flags with Mathematica

A week ago I posted some pairs of similar flags on Twitter, and later I found that Mathematica’s CountryData database contains flag descriptions. So I thought I’d use the flag descriptions to see which flags Mathematica things are similar.

For example, the FlagDescription attribute for Chad in Mathematica is

Three equal vertical bands of blue (hoist side), yellow, and red; similar to the flag of Romania; also similar to the flags of Andorra and Moldova, both of which have a national coat of arms centered in the yellow band; design was based on the flag of France.

I had Mathematica output a list of countries and flag descriptions, then searched the output for the word “similar.” I then made the following groupings based on the output [1].

## Emoji

Each flag has an emoji, so here are the groupings above using emoji icons

• 🇹🇩 🇷🇴
• 🇧🇴 🇬🇭
• 🇨🇴 🇪🇨
• 🇮🇳 🇳🇪
• 🇮🇪 🇨🇮
• 🇸🇻 🇳🇮 🇭🇳
• 🇪🇬 🇮🇶 🇸🇾 🇾🇪
• 🇱🇺 🇳🇱
• 🇦🇩 🇲🇩
• 🇮🇩 🇲🇨

## Related posts

[1] The groupings are based on Mathematica’s output, but I did some editing. Strictly following Mathematica’s descriptions would have been complicated. For example, Mathematica’s description might say A is similar to B, but not say B is similar to A. Or it might cluster four flags together that could better be split into two pairs.

Suppose you want to color a map with no two bordering regions having the same color. If this is a map on a plane, you can do this using only four colors, but maybe you’d like to use more.

You can reduce the problem to coloring the nodes in a graph. Each node corresponds to a region, and there is an edge between two nodes if and only if their corresponding regions share a border.

Here is a sort of topologists’s or graph theorist’s view of the continental United States.

This was created using the following sample code from the Mathematica documentation.

    RelationGraph[MemberQ[#2["BorderingStates"], #1] &,
EntityList[


You can recognize Maine in the graph because it’s the only state that only borders one other state. Alaska is also easy to locate. Exercise for the reader: mentally add Hawaii to the graph.

The analogous graph for Texas counties took much longer to draw: there are 49 continental US states but 254 Texas counties.

This was created with the following code.

    RelationGraph[MemberQ[#2["BorderingCounties"], #1] &,


You can find El Paso county in the top left; it only borders one county just as Maine only borders one state.

# A traveling salesman tour of Africa

Suppose you’d like to tour Africa, visiting each country once, then returning to your starting point, minimizing the distance traveled.

Here’s my first attempt at a solution using Mathematica, based on an example in the documentation for FindShortestTour.

    africa = CountryData["Africa"]
FindShortestTour[africa]
GeoGraphics[{Thick, Red, GeoPath[africa[[%[[2]]]]]}]


This produced the following map:

Hmm. Maybe I should have been more specific about what I mean by “Africa.” My intention was to find a tour of continental Africa, i.e. not including islands. This means I needed to remove several items from Mathematica’s list of African countries. Also, I had in mind sovereign states, not territories of overseas states and not disputed territories.

After doing this, the map is more like what I’d expect.

The tour is then

1. Algeria
2. Tunisia
3. Libya
4. Egypt
6. Central African Republic
7. Democratic Republic of the Congo
8. Burundi
9. Rwanda
10. Uganda
11. South Sudan
12. Sudan
13. Eritrea
14. Djibouti
15. Somalia
16. Ethiopia
17. Kenya
18. Tanzania
19. Malawi
20. Zambia
21. Mozambique
22. Zimbabwe
23. Eswatini
24. Lesotho
25. South Africa
26. Botswana
27. Namibia
28. Angola
29. Republic of the Congo
30. Gabon
31. Equatorial Guinea
32. Cameroon
33. Nigeria
34. Niger
35. Mali
36. Burkina Faso
37. Benin
38. Togo
39. Ghana
40. Ivory Coast
41. Liberia
42. Sierra Leone
43. Guinea
44. Guinea-Bissau
45. Gambia
46. Senegal
47. Mauritania
48. Morocco

The initial tour, including islands, foreign territories, and Western Sahara, was 23,744 miles or 38,213 kilometers. The second tour was 21,074 miles or 33915 kilometers.

Here’s a tour of just the islands, excluding foreign territories.

The order of the tour is

1. Cape Verde
2. Seychelles
3. Mauritius