Consider an ellipse centered at the origin with semi-major axis *a* and semi-minor axis *b*. We will assume without loss of generality that *a*² – *b*² = 1, and so the foci are at ±1.

Hermann Schwarz published the conformal map from the ellipse to the unit disk in 1869 [1, 2].

The map is given by

where sn is the Jacobi elliptic function with parameter *k*². The constants *k* and *K* are given by

where θ_{2} and θ_{3} are theta constants, i.e. the values of the theta functions θ_{2}(*z*, *q*) and θ_{3}(*z*, *q*) at *z* = 0.

Conformal maps to the unit disk are unique up to rotation. The map above is the unique conformal map preserving orientation:

The inverse of this map is given by

The inverse of the sn function with parameter *m* can be written in terms of elliptic integrals.

where *F* is the incomplete elliptic integral of the first kind and *m* is the parameter of sn and the parameter of *F*.

I wanted to illustrate the conformal map using an ellipse with aspect ratio 1/2. To satisfy *a*² – *b*² = 1, I set *a* = 2/√3 and *b* = 1/√3. The plot at the top of the post was made using Mathematica.
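The constraint arithmetic is quick to check (a minimal sketch; variable names are mine):

```python
from math import sqrt, isclose

# Ellipse with aspect ratio 1/2, normalized so that a^2 - b^2 = 1
a = 2 / sqrt(3)
b = 1 / sqrt(3)

print(isclose(a**2 - b**2, 1))  # True: foci at +-1
print(isclose(b / a, 0.5))      # True: aspect ratio 1/2
```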

- NASA and conformal maps
- Comparing Jacobi functions and trig functions
- Conformal mapping and Laplace’s equation
- Numerically evaluate a theta function

[1] H. A. Schwarz, Über einige Abbildungsaufgaben, Journal für die reine und angewandte Mathematik, vol. 70 (1869), pp. 105–120.

[2] Gabor Szegő. Conformal Mapping of the Interior of an Ellipse onto a Circle. The American Mathematical Monthly, Vol. 57, No. 7 (1950), pp. 474–478.

The post Conformal map of ellipse interior to a disk first appeared on John D. Cook.

***

This observation has been called the **piranha problem**. Predictors are compared to piranha fish. If you have a lot of big piranhas in a small pond, they start eating each other. If you have a lot of strong predictors, they predict each other.

In [1] the authors quantify the piranha effect several ways. I’ll just quote the first one here. See the paper for several other theorems and commentary on their implications.

If *X*_{1}, …, *X*_{p}, *y* are real-valued random variables with finite non-zero variance, then

So if the left side is large, either because *p* is large or because some of the correlations are large, then the right side is also large, and so the sum of the interaction terms is large.

[1] The piranha problem: large effects swimming in a small pond. Available on arXiv.

The post Big correlations and big interactions first appeared on John D. Cook.

***

The **incircle** of a triangle is the largest circle that can fit inside the triangle. When we add the incircle to the illustration from the post on the nine-point circle, it's kinda hard to see the difference between the two circles. The nine-point circle is drawn in solid black and the incircle is drawn in dashed green.

If we extend the sides of the triangle, an **excircle** is a circle tangent to one side of the original triangle and to the extensions of the other two sides.

The post Incircle and excircles first appeared on John D. Cook.

***

I tested this empirically and found the following stats. The numbers basically confirm what the host said.

The “double” column counts double letters at the end of a word, and the “all” column counts all words ending in the given letter, single or double.

|   | double | all   | %  |
|---|--------|-------|----|
| b | 8      | 476   | 2  |
| c | 0      | 11324 | 0  |
| d | 18     | 15996 | 0  |
| f | 218    | 919   | 24 |
| g | 11     | 6432  | 0  |
| h | 1      | 4754  | 0  |
| j | 0      | 17    | 0  |
| k | 1      | 2650  | 0  |
| l | 740    | 14929 | 5  |
| m | 2      | 8881  | 0  |
| n | 31     | 19966 | 0  |
| p | 10     | 2201  | 0  |
| q | 0      | 6     | 0  |
| r | 35     | 15467 | 0  |
| s | 9559   | 26062 | 37 |
| t | 51     | 14831 | 0  |
| v | 0      | 37    | 0  |
| w | 0      | 702   | 0  |
| x | 0      | 793   | 0  |
| y | 0      | 27747 | 0  |
| z | 22     | 143   | 15 |

These stats simply count words; I suspect the results would be different if the words were weighted by frequency. For example, there are eight words that end in *bb*, but seven of these are rare words or alternate spellings: abb, bibb, dabb, dhabb, dubb, ebb, hubb, stubb.
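A count along these lines can be reproduced with a short script. This is a sketch under assumptions: the word list and hence the exact counts will vary with the dictionary used, and the tiny sample below is only for illustration.

```python
from collections import Counter

def ending_stats(words):
    # For each final letter, count how many words end in that letter
    # ("total") and how many end in that letter doubled ("double").
    double, total = Counter(), Counter()
    for w in words:
        w = w.lower()
        if w and w[-1].isalpha():
            total[w[-1]] += 1
            if len(w) >= 2 and w[-2] == w[-1]:
                double[w[-1]] += 1
    return double, total

d, t = ending_stats(["ebb", "add", "odd", "stiff", "bell", "pass", "cat"])
print(d["d"], t["d"])  # 2 2: both words ending in d end in a double d
```

Run against a full dictionary file (one word per line), this produces a table like the one above.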

Artemis is in a highly eccentric orbit around the moon, coming within 130 km (80 miles) of the moon’s surface at closest pass, and this orbit will take 14 days to complete. The weak link in this data is “14 days.” Surely this number has been rounded for public consumption.

If we assume Artemis is in a Keplerian orbit, i.e. we can ignore the effect of the Earth, then we can calculate the shape of the orbit using the information above. This assumption is questionable because as I understand it the reason for such an eccentric orbit has something to do with Lagrange points, which means the Earth's gravity matters. Still, I imagine the effect of Earth's gravity is a smaller source of error than the lack of accuracy in knowing the period.

Artemis is orbiting the moon similarly to how the Mars Orbiter Mission orbited Mars. We can use Kepler's third law, relating the period *T* to the semi-major axis *a*, to solve for *a*.

*T* = 2π √(*a*³/μ)

Here μ = *GM*, with *G* being the gravitational constant and *M* being the mass of the moon. Now

*G* = 6.674 × 10^{-11} N m²/kg²

and

*M* = 7.3459 × 10^{22} kg.

If we assume *T* is 14 × 24 × 3600 seconds, then we get

*a* = 56,640 km

or 35,200 miles. The value of *a* is rough since the value of *T* is rough.

Assuming a Keplerian orbit, the moon is at one focus of the orbit, located a distance *c* from the center of the ellipse. If Artemis is 130 km from the surface of the moon at perilune, and the radius of the moon is 1737 km, then

*c* = *a* – (130 + 1737) km = 54,770 km

or 34,000 miles. The semi-minor axis *b* satisfies

*b*² = *a*² – *c*²

and so

*b* = 14,422 km

or 8962 miles.

The eccentricity is *c*/*a* = 0.967. As I've written about before, eccentricity is hard to interpret intuitively. Aspect ratio is much easier to imagine than eccentricity, and the relation between the two is highly nonlinear.
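The calculation chain above fits in a few lines. This is a sketch using the constants quoted in the post; small differences from the rounded figures in the text come from rounding along the way.

```python
from math import pi, sqrt

G = 6.674e-11        # gravitational constant, N m^2 / kg^2
M = 7.3459e22        # mass of the moon, kg
mu = G * M
T = 14 * 24 * 3600   # assumed period, s

# Kepler's third law T = 2 pi sqrt(a^3 / mu), solved for a
a = (mu * T**2 / (4 * pi**2)) ** (1/3) / 1000  # semi-major axis, km
c = a - (130 + 1737)                           # center-to-focus distance, km
b = sqrt(a**2 - c**2)                          # semi-minor axis, km

print(round(a), round(c), round(b), round(c/a, 3))  # close to the rounded values quoted above
```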

Assuming everything above, here’s what the orbit would look like. The distances on the axes are in kilometers.

The orbit is highly eccentric: the center of the orbit is far from the foci of the orbit. But the aspect ratio is about 1/4. The orbit is only about 4 times wider in one direction than the other. It’s obviously an ellipse, but it’s not an extremely thin ellipse.

In an earlier post I showed how to compute the Lagrange points for the Sun-Earth system. We can use the same equations for the Earth-Moon system.

The equations for the distance *r* from the Lagrange points L1 and L2 to the moon are

The equation for L1 corresponds to taking ± as – and the equation for L2 corresponds to taking ± as +. Here *M*_{1} and *M*_{2} are the masses of the Earth and Moon respectively, and *R* is the distance between the two bodies.

If we modify the code from the earlier post on Lagrange points we get

L1 = 54784 km

L2 = 60917 km

where L1 is on the near side of the moon and L2 on the far side. We estimated the semi-major axis *a* to be 56,640 km. This is about 3% larger than the distance from the moon to L1. So the orbit of Artemis passes near or through L1. This assumes the axis of the Artemis orbit is aligned with a line from the moon to Earth, which I believe is at least approximately correct.
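I don't have the code from the earlier post at hand, but here is a sketch of the same kind of calculation: solve the rotating-frame force balance for L1 and L2 numerically. The masses and the Earth–Moon distance below are standard values I've assumed, so the results land in the same neighborhood as the figures above without necessarily matching them digit for digit.

```python
from scipy.optimize import brentq

M1 = 5.972e24   # mass of Earth, kg (assumed value)
M2 = 7.3459e22  # mass of Moon, kg
R  = 3.844e8    # mean Earth-Moon distance, m (assumed value)

def balance(r, sign):
    # Force balance at distance r from the Moon, in the rotating frame.
    # sign = -1 for L1 (between Earth and Moon), +1 for L2 (beyond the Moon).
    d = R + sign * r                # distance from Earth
    omega2 = (M1 + M2) / R**3      # angular rate squared; G cancels from both sides
    return M1 / d**2 + sign * M2 / r**2 - omega2 * (d - R * M2 / (M1 + M2))

L1 = brentq(lambda r: balance(r, -1), 1e7, 1e8) / 1000  # km from the Moon
L2 = brentq(lambda r: balance(r, +1), 1e7, 1e8) / 1000  # km from the Moon
print(round(L1), round(L2))
```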

The nine-point circle of a triangle passes through nine points:

- The midpoint of each side.
- The foot of the altitude to each side.
- The midpoint between each vertex and the orthocenter.

The orthocenter is the place where the three altitudes intersect.

In the image above, the midpoints are red circles, the altitudes are blue lines, the feet are blue stars, and the midpoints between the vertices and the orthocenter are green squares.

The post The nine-point circle theorem first appeared on John D. Cook.

***

A Möbius transformation is a function *f* : ℂ → ℂ of the form

*f*(*z*) = (*az* + *b*)/(*cz* + *d*)

where *ad* – *bc* ≠ 0. One of the basic properties of Möbius transformations is that they form a group. Except that’s not quite right if you want to be completely rigorous.

The problem is that a Möbius transformation isn’t a map from (all of) ℂ to ℂ unless *c* = 0 (which implies *d* cannot be 0). The usual way to fix this is to add a point at infinity, which makes things much simpler. Now we can say that the Möbius transformations form a group of automorphisms on the Riemann sphere *S*².

But if you insist on working in the finite complex plane, i.e. the complex plane ℂ with no point at infinity added, each Möbius transformation is actually a *partial function* on ℂ because a point may be missing from the domain. As detailed in [1], you technically do not have a group but rather an inverse monoid. (See the previous post on using inverse semigroups to think about floating point partial functions.)

You can make Möbius transformations into a group by *defining* the product of the Möbius transformation *f* above with

*g*(*z*) = (*Az* + *B*) / (*Cz* + *D*)

to be

((*aA* + *bC*)*z* + *aB* + *bD*) / ((*cA* + *dC*)*z* + *cB* + *dD*),

which is what you’d get if you computed the composition *f* ∘ *g* as functions, ignoring any difficulties with domains.
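The coefficient arithmetic here is just 2×2 matrix multiplication, which suggests a short sketch (helper names are mine):

```python
from fractions import Fraction

def mobius_compose(f, g):
    # f and g are coefficient tuples (a, b, c, d) for z -> (a z + b)/(c z + d).
    # Composition f o g corresponds to multiplying the matrices [[a, b], [c, d]].
    a, b, c, d = f
    A, B, C, D = g
    return (a*A + b*C, a*B + b*D, c*A + d*C, c*B + d*D)

def apply_mobius(m, z):
    a, b, c, d = m
    return (a*z + b) / (c*z + d)

f = (1, 2, 3, 4)
g = (5, 6, 7, 8)
z = Fraction(1, 4)  # exact arithmetic so the comparison below is exact
print(apply_mobius(mobius_compose(f, g), z) == apply_mobius(f, apply_mobius(g, z)))  # True
```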

The Möbius inverse monoid is surprisingly complex. Things are simpler if you compactify the complex plane by adding a point at infinity, or if you gloss over the fine points of function domains.

- Transformations of Olympic rings
- Curiously simple approximations
- Solving for Möbius transformation coefficients

[1] Mark V. Lawson. The Möbius Inverse Monoid. Journal of Algebra. 200, 428–438 (1998).

The post The Möbius Inverse Monoid first appeared on John D. Cook.

***

Is the Python function

def f(x): return x + 2

invertible? Not always.

You might reasonably think the function

def g(x): return x - 2

is the inverse of `f`, and it is for many values of `x`. But try this:

```
>>> x = 2**53 - 1.0
>>> g(f(x)) - x
-1.0
```

The composition of `f` and `g` does not give us `x` back because of the limited length of a floating point significand. See Anatomy of a floating point number.

The function `f` as a function between floating point numbers is **locally invertible**. That is, it is invertible on a subset of its domain.

Now let’s look at the function

def f(x): return x*x

Is this function invertible? There is a function, namely `sqrt`, that serves as an inverse to `f` for many values of `x`, but not all `x`. The function `sqrt` is a **partial function** because although it is ostensibly a function on floating point numbers, it crashes for negative inputs. The function's actual domain is smaller than its nominal domain.

Locally invertible functions are an inevitable part of programming, and are awkward to reason about. But there are tools that help. For example, **inverse semigroups**.

According to nLab

An inverse semigroup is a semigroup S such that for every element s ∈ S, there exists a unique "inverse" s* ∈ S such that s s* s = s and s* s s* = s*.

The canonical example of an inverse semigroup, and in some sense the *only* example, is the following, also from nLab.

For any set *X*, let *I*(*X*) be the set of all partial bijections on *X*, i.e. bijections between subsets of *X*. The composite of partial bijections is their composite as relations (or as partial functions).

This is the only example in the sense that the Wagner-Preston theorem says every inverse semigroup is isomorphic to an inverse sub-semigroup of *I*(*X*) for some set *X*.

In our case, the set *X* is the set of representable floating point numbers, and locally invertible functions are functions which *are* invertible, but only when restricted to a subset of *X*.
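Here's a toy model of *I*(*X*) (my own representation, not a standard library): partial bijections as Python dictionaries, illustrating the defining identities of an inverse semigroup.

```python
def compose(f, g):
    # Composite of partial bijections represented as dicts: (f o g)(x) = f(g(x)),
    # defined only where both maps are defined.
    return {x: f[g[x]] for x in g if g[x] in f}

def inverse(f):
    # A partial bijection has an honest inverse on its image.
    return {v: k for k, v in f.items()}

s = {1: 10, 2: 20, 3: 30}  # a partial bijection on the integers
s_star = inverse(s)

print(compose(compose(s, s_star), s) == s)             # True: s s* s = s
print(compose(compose(s_star, s), s_star) == s_star)   # True: s* s s* = s*
```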

The upper half plane is a sort of secondary hub. You may want to map two regions to and from each other via a half plane. And as with the disk, there’s an explicit solution to Laplace’s equation on a half plane.

Another reason to be interested in Laplace’s equation on a half plane is the connection to the Hilbert transform and harmonic conjugates.

Given a continuous real-valued function *u* on the real line, *u* can be extended to a harmonic function on the upper half plane by taking the convolution of *u* with the Poisson kernel, a variation on the Poisson kernel from the previous post. That is, for *y* > 0,

*u*(*x*, *y*) = (1/π) ∫ *y* *u*(*t*) / ((*x* − *t*)² + *y*²) *dt*

where the integral is over the whole real line.

This gives a solution to Laplace's equation on the upper half plane with boundary values given by *u* on the real line. The function *u* is smooth on the upper half plane, and its limiting values as *y* → 0 recover the boundary data continuously.
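As a numerical check, here's a sketch that integrates boundary data against the standard half-plane Poisson kernel *y*/(π((*x* − *t*)² + *y*²)) and compares with a known harmonic extension. The example boundary function *u*(*t*) = 1/(1 + *t*²) is the real part of 1/(1 − *it*) on the real line, and the real part of 1/(1 − *iz*) gives its extension in closed form.

```python
import numpy as np
from scipy.integrate import quad

def poisson_extend(u, x, y):
    # u(x, y) = (1/pi) * integral over R of y/((x - t)^2 + y^2) * u(t) dt
    val, _ = quad(lambda t: y / ((x - t)**2 + y**2) * u(t), -np.inf, np.inf)
    return val / np.pi

u = lambda t: 1 / (1 + t**2)           # boundary data, Re(1/(1 - it))
x, y = 0.7, 0.3
exact = (1 + y) / (x**2 + (1 + y)**2)  # Re(1/(1 - iz)) at z = x + iy
print(abs(poisson_extend(u, x, y) - exact) < 1e-8)  # True
```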

Furthermore, *u* is the real part of an analytic function *f* = *u* + *iv*. The function *v* is the **harmonic conjugate** of *u*, and also equals the Hilbert transform of *u*.

***

Why solve Laplace's equation on a disk?

Laplace’s equation is important in its own right—for example, it’s important in electrostatics—and understanding Laplace’s equation is a stepping stone to understanding many other PDEs.

Why care specifically about a disk? An obvious reason is that you might need to solve Laplace’s equation on a disk! But there are two less obvious reasons.

First, a disk can be mapped conformally to any simply connected proper open subset of the complex plane. And because conformal equivalence is transitive, two regions conformally equivalent to the disk are conformally equivalent to each other. For example, as I wrote about here, you can map a Mickey Mouse silhouette

to and from the Batman logo

using conformal maps. In practice, you'd probably map Mickey Mouse to a disk, and compose that map with a map from the disk to Batman. The disk is a standard region, and so there are catalogs of conformal maps between the disk and other regions. And there are algorithms for computing maps between a standard region, such as the disk or half plane, and more general regions. You might be able to look up a mapping from the disk to Mickey, but probably not to Batman.

In short, the disk is sort of the **hub** in a hub-and-spoke network of cataloged maps and algorithms.

Secondly, Laplace's equation has an **analytical solution** on the disk. You can just write down the solution, and we will shortly. If it were easy to write down the solution on a triangle, that might be the hub, but instead it's a disk.

Suppose *u* is a real-valued continuous function on the boundary of the unit disk. Then *u* can be extended to a harmonic function, i.e. a solution to Laplace's equation on the interior of the disk, via the Poisson integral formula:

*u*(*z*) = (1/2π) ∫_{0}^{2π} (1 − |*z*|²) / |*e*^{iφ} − *z*|² *u*(*e*^{iφ}) *dφ*

Or in terms of polar coordinates:

*u*(*re*^{iθ}) = (1/2π) ∫_{0}^{2π} (1 − *r*²) / (1 − 2*r* cos(θ − φ) + *r*²) *u*(*e*^{iφ}) *dφ*
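Here's a numerical sketch of the polar form, using the standard Poisson kernel (1 − *r*²)/(1 − 2*r* cos(θ − φ) + *r*²). Boundary data cos φ should extend to the harmonic function *r* cos θ, i.e. to *x*:

```python
import numpy as np
from scipy.integrate import quad

def poisson_disk(u_boundary, r, theta):
    # u(r e^{i theta}) = (1/2pi) * integral over [0, 2pi] of the Poisson
    # kernel times the boundary values
    k = lambda phi: (1 - r**2) / (1 - 2*r*np.cos(theta - phi) + r**2)
    val, _ = quad(lambda phi: k(phi) * u_boundary(phi), 0, 2*np.pi)
    return val / (2*np.pi)

r, theta = 0.6, 1.1
print(abs(poisson_disk(np.cos, r, theta) - r*np.cos(theta)) < 1e-8)  # True
```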

I wrote a lot of posts on ellipses and related topics over the last couple months. Here’s a recap of the posts, organized into categories.

- Eccentricity, flattening, and aspect ratio
- Latus rectum
- Directrix
- Example of a highly elliptical orbit

- Pascal’s theorem
- Intersection of two conics
- Determining conic sections by points or tangents
- Evolute of an ellipse

Design of experiments is a branch of statistics, and design theory is a branch of combinatorics, and yet they overlap quite a bit.

It's hard to say precisely what design theory is, but it's concerned with whether objects can be arranged in certain ways, and if so, how many ways this can be done. Design theory is pure mathematics, but it is of interest to people working in areas of applied mathematics such as coding theory and statistics.

Here’s a recap of posts I’ve written recently related to design of experiments and design theory.

A few weeks ago I wrote about fractional factorial design. Then later I wrote about response surface models. Then a diagram from central composite design, a popular design in response surface methodology, was one of the diagrams in a post I wrote about visually similar diagrams from separate areas of application.

I wrote two posts about pitfalls with A/B testing. One shows how play-the-winner sequential testing has the same problems as Condorcet’s voter paradox, with the order of the tests potentially determining the final winner. More seriously, A/B testing cannot detect interaction effects which may be critical.

There are several civilian and military standards related to design of experiments. The first of these was MIL-STD-105. The US military has retired this standard in favor of the civilian standard ASQ/ANSI Z1.4 which is virtually identical.

Similarly, the US military standard MIL-STD-414 was replaced by the very similar civilian standard ASQ/ANSI Z1.9. This post looks at the mean-range method for estimating variation which these two standards reference.

I wrote a couple posts on Room squares, one on Room squares in general and one on Thomas Room’s original design now known as a Room square. Room squares are used in tournament designs.

I wrote a couple posts about Costas arrays, an introduction and a post on creating Costas arrays in Mathematica.

Latin squares and Greco-Latin squares are part of design theory and part of design of experiments. Here are several posts on Latin and Greco-Latin squares.

The post Design of experiments and design theory first appeared on John D. Cook.

***

A **repunit** is a number whose base 10 representation consists entirely of 1s; *R*_{n} denotes the repunit with *n* ones. A repunit prime is, unsurprisingly, a repunit number which is prime. The most obvious example is *R*_{2} = 11. Until recently the repunit numbers confirmed to be prime were *R*_{n} for *n* = 2, 19, 23, 317, and 1031. Now the case for *n* = 49081 has been confirmed.
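The smaller cases are easy to verify with a short script; a sketch using SymPy:

```python
from sympy import isprime

def repunit(n):
    # R_n: the number written as n ones in base 10
    return (10**n - 1) // 9

# Repunit primes with fewer than 30 digits
print([n for n in range(2, 30) if isprime(repunit(n))])  # [2, 19, 23]
```

The larger cases *n* = 317, 1031, 49081 need serious primality-proving software, not a one-liner.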

Here is the announcement. The date posted at the top of the page is from March this year, but I believe the announcement is new. Maybe the author edited an old page and didn’t update the shown date.

Incidentally, I noticed a lot of repunits when I wrote about bad passwords a few days ago. That post explored a list of commonly used but broken passwords. This is the list of passwords that password cracking software will try first. The numbers *R*_{n} are part of the list for the following values of *n*:

1–45, 47–49, 51, 53–54, 57–60, 62, 67, 70, 72, 77, 82, 84, 147

So 46 is the smallest value of *n* such that *R*_{n} is not on the list. I would not recommend using *R*_{46} as a password.

The bad password file is sorted in terms of popularity, and you might expect repunits to appear in the file in order, i.e. shorter sequences first. That is sorta true overall. But you can see streaks in the plot below showing multiple runs where longer passwords are more common than shorter passwords.

The post Repunits: primes and passwords first appeared on John D. Cook.

***

How to solve trig equations in general, and specifically how to solve equations involving quadratic polynomials in sine and cosine.

This weekend I wrote about a change of variables to “depress” a cubic equation, eliminating the quadratic term. This is a key step in solving a cubic equation. The idea can be extended to higher degree polynomials, and applied to differential equations.

Before that I wrote about how to tell whether a cubic or quartic equation has a double root. That post is also an introduction to resultants.

First of all, there was a post on solving Kepler’s equation with Newton’s method, and especially with John Machin’s clever starting point.

Another post, also solving Kepler’s equation, showing how Newton’s method can be good, bad, or ugly.

And out there by itself, Weierstrass’ method for simultaneously searching for all roots of a polynomial.

The post Recent posts on solving equations first appeared on John D. Cook.

***

There are two kinds of experts, consulting experts and testifying experts. These names mean what they say: consulting experts consult with their clients, and testifying experts testify. Usually a lawyer will retain an expert with the intention of having this person testify, but the expert starts out as a de facto consulting expert.

Working with lawyers is quite pleasant. The division of labor is crystal clear: you are hired to be an expert on some topic, they are the experts in matters of law, and the streams don’t cross. You’re treated with deference and respect. Even if a lawyer knows something about your field of expertise, it’s not their role to opine on it.

I’ve never had a lawyer try to twist my arm. It’s not in their interests to do so. I’ve told lawyers things they were disappointed to hear, but I’ve never had a lawyer argue with me.

I’ve turned down engagements when it was immediately apparent that the client didn’t have a statistical case. (They may have a *legal* case, but that’s not my bailiwick.) Sometimes lawyers are grasping at straws, and they may try a statistical argument as a last resort.

One person approached me to do a statistical analysis of **one** data point. Not to be outdone, someone once asked me to do a statistical analysis based on absolutely **no** data. I told both that I’d need a little more data to go on.

John Tukey said that the best part of being a statistician is that you get to play in everyone else's back yard. I'd expand that to applied math more generally. You can't be expected to be an expert in everything, but you are expected to come up to speed quickly on the basics of the problem domain.

Work on legal cases is confidential, but so is almost everything else I do. However, an intellectual property case I worked on took this to a higher level. I was only allowed to work at opposing counsel’s office, on their laptop, without an internet connection, and without a phone. That was an interesting exercise.

There’s a lot of hurry-up and wait with legal work. A project can be dormant and presumably dead, then suddenly pop back up. This isn’t unique to legal work, but it seems more frequent or more extreme with legal work.

Law firms do everything by the hour. I mostly work by the project, but I’ll work by the hour for lawyers. There are occasional exceptions, but hourly billing is firmly ingrained in legal culture. And reasonably so: it’s hard to say in advance how much work something will take. Sometimes when you *can* reasonably anticipate the scope of a task you can do it fixed bid.

Law firms typically pass through all expenses. So even if a firm hires you, their client is responsible for paying you. You don’t get paid until the law firm gets paid, which can sometimes take a while.

A few years ago I had to fly around a fair amount. That was fun for a while but it got old. I haven’t had to travel for work since the pandemic and I’m OK with that.

The post Expert witness experiences first appeared on John D. Cook.

***

A linear differential equation can be viewed as a polynomial in the differential operator *D* applied to the function we're solving for. More on this idea here. So it makes sense that a technique analogous to the technique used for "depressing" a polynomial could work similarly for differential equations.

In the differential equation post mentioned above, we started with the equation

and reduced it to

using the change of variable

So where did this change of variables come from? How might we generalize it to higher-order differential equations?

In the post on depressing a polynomial, we started with a polynomial

and use the change of variables

to eliminate the *x*^{n-1} term. Let’s do something analogous for differential equations.

Let *P* be an *n*th degree polynomial and consider the differential equation

We can turn this into a differential equation

where the polynomial

has no term involving *D*^{n-1} by solving

which leads to

generalizing the result above for second order ODEs.
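For a concrete check, consider the third order case. Assuming the same shift as for polynomials, substituting *y* = *u* e^{−bx/3} (the general shift being −*a*_{n−1}*x*/(*n* *a*_{n})) should eliminate the second-derivative term from *y*‴ + *by*″ + *cy*′ + *dy*. A SymPy sketch:

```python
import sympy as sp

x, b, c, d = sp.symbols('x b c d')
u = sp.Function('u')

# Substitute y = u(x) * exp(-b x / 3) into y''' + b y'' + c y' + d y
y = u(x) * sp.exp(-b*x/3)
expr = sp.expand(sp.diff(y, x, 3) + b*sp.diff(y, x, 2) + c*sp.diff(y, x) + d*y)

# The coefficient of u''(x) vanishes
print(sp.simplify(expr.coeff(sp.Derivative(u(x), x, 2))))  # 0
```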

The post Eliminating terms from higher-order differential equations first appeared on John D. Cook.

***

We will use big-O notation *O*(*x*^{k}) to mean terms involving *x* to powers no higher than *k*. This is slightly unusual, because typically big-O notation is used when some variable is tending to a limit, and we're not taking limits here.

Let’s start with an *n*th degree polynomial

Here *a* is not zero, or else we wouldn’t have an *n*th degree polynomial.

The following calculation shows that the change of variables

results in an *n*th degree polynomial in *t* with no term involving *x*^{n – 1}.

This approach works over real or complex numbers. It even works over finite fields, provided you can divide by *na*.
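A quick SymPy check of the general pattern, for a quartic with leading coefficient *a*: substituting *x* = *t* − *b*/(4*a*), i.e. *t* minus the next coefficient divided by *na*, kills the *t*³ term.

```python
import sympy as sp

t, a, b, c, d, e = sp.symbols('t a b c d e')

# The change of variables for n = 4: x = t - b/(4a)
x = t - b/(4*a)
p = sp.expand(a*x**4 + b*x**3 + c*x**2 + d*x + e)

print(sp.simplify(p.coeff(t, 3)))  # 0: the cubic term is gone
print(p.coeff(t, 4))               # a: the leading coefficient survives
```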

I’ve mentioned a couple times that the Weierstrass form of an elliptic curve

is the most general except when working over a field of characteristic 2 or 3. The technique above breaks down in characteristic 3 because 3*a* is not invertible there, and an analogous problem with dividing by 2 arises in characteristic 2.

The previous post showed how to reduce a general cubic equation to one in the form

which is called a “depressed cubic.” In a nutshell, you divide by the leading coefficient then do a simple change of variables that removes the quadratic term.

Now what? This post will give a motivated but completely ahistorical approach for removing the linear term *cx*.

Suppose we don’t know how to solve cubic equations. What do we know how to solve? Quadratic equations. So a natural question to ask is how we might find a quadratic equation that has the same roots as our cubic equation. Well, how can you tell in general whether two polynomials have a common root? Resultants.

This is the point where we completely violate historical order. Tartaglia discovered a general solution to depressed cubic equations in the 16th century [1], but Sylvester introduced the resultant in the 19th century. Resultants were a great idea, but not a rabbit out of a hat: it's not far-fetched that some sort of determinant could tell you whether two polynomials have a common factor, since this is analogous to two sets of vectors having overlapping spans. I found the idea of using resultants in this context in [2].

In 1683, Tschirnhaus published the transform that in modern terminology amounts to finding a polynomial *T*(*x*, *y*) that has zero resultant with a depressed cubic.

Tschirnhaus assumed his polynomial *T* has the form

Let’s take the resultant of our cubic and Tschirnhaus’ quadratic using Mathematica.

Resultant[x^3 + c x + d, x^2 + a x + 2 c/3 + y, x]

This gives us

which is a cubic equation in *y*. If the coefficient of *y* were zero, then we could solve the cubic equation for *y* by simply taking a cube root. But we can make that happen by our choice of *a*, i.e. we pick *a* to solve the quadratic equation

So we solve this equation for *a*, plug either root for *a* into the expression for the resultant, then solve for *y*. Then we take that value of *y* and find where Tschirnhaus’ polynomial is zero by solving the quadratic equation

We solved for a value of *y* that makes the resultant zero, so our original polynomial and Tschirnhaus’ polynomial have a common root. So one of the roots of the equation above is a root of our original cubic equation.
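SymPy can reproduce the resultant computation and confirm two facts used above: the resultant is cubic in *y*, and the constant 2*c*/3 in Tschirnhaus' polynomial makes the *y*² term drop out, leaving only the *y* coefficient to be killed by a choice of *a*.

```python
import sympy as sp

x, y, a, c, d = sp.symbols('x y a c d')

# Resultant of the depressed cubic and Tschirnhaus' quadratic, eliminating x
res = sp.resultant(x**3 + c*x + d, x**2 + a*x + sp.Rational(2, 3)*c + y, x)
poly = sp.Poly(sp.expand(res), y)

print(poly.degree())                           # 3: cubic in y
print(sp.simplify(poly.coeff_monomial(y**2)))  # 0: no y^2 term
```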

[1] In this blog post, we first reduced the general cubic to the depressed form, then solved the depressed form. This isn't the historical order. Tartaglia came up with a general solution to the depressed cubic equation, but was not able to solve equations containing a quadratic term.

[2] Victor Adamchik and David Jeffrey. Polynomial Transformations of Tschirnhaus, Bring and Jerrard. ACM SIGSAM Bulletin, Vol 37, No. 3, September 2003.

The post How to solve a cubic equation first appeared on John D. Cook.

***

A **depressed cubic** is a simplified form of a cubic equation. The odd-sounding terminology suggests that this is a very old idea, older than the current connotation of the word *depressed*. That is indeed the case. According to Etymonline the term *depress* originally meant "put down by force, conquer" and the psychological use of the word came later. To this day you'll occasionally hear of a button being depressed.

A depressed cubic equation is depressed in the sense that the quadratic term has been removed. Such an equation has the form

*x*³ + *cx* + *d* = 0.

Once you’ve put the equation in depressed form you’ve conquered the quadratic term.

So how do you put a cubic equation in depressed form? First, divide by the leading coefficient, then use the **change of variables** [1]

*x* = *t* − *b*/3

where *b* is the coefficient of the quadratic term after the division.

For example, suppose we start with the equation

We first turn this into

Then we set *x* = *t* – 19/33. This gives us

which has no quadratic term.

We can use Mathematica to show that this works in general:

Simplify[x^3 + b x^2 + c x + d /. x -> (t - b/3)]

This returns

*t*³ + (*c* − *b*²/3) *t* + (2*b*³/27 − *bc*/3 + *d*)

This post shows that an analogous change of variables works for higher-order polynomials as well.

If you look into elliptic curves, you’ll often see them defined as a set of points satisfying

Why no quadratic term? Because you can always remove it using the process above. Well, not quite always. The depression trick doesn’t work for elliptic curves over finite fields of characteristic 2 or 3. If you’d like to read more about this exception, see this post.

That is the subject of the next post.

[1] You could do the change of variables first, using *x* = *t* − *b*/(3*a*). This removes the quadratic term, but leaves the leading coefficient *a*.

Well, truth is stranger than fiction. There **are** four new SI prefixes. These were recently approved at the 27th General Conference on Weights and Measures. Here is the resolution (in French).

The new prefixes are:

- 10^{30} quetta (Q)
- 10^{27} ronna (R)
- 10^{−27} ronto (r)
- 10^{−30} quecto (q)

The names were the suggestion of Richard J. C. Brown. He gives seven desirable properties of new names:

- The names should be simple and, if possible, meaningful and memorable.
- The names should have some connection to the powers of 10^{3} that they represent.
- The names should be based on either Latin or Greek as the most used languages previously.
- Multiples should end ‘-a’ and sub-multiples should end ‘-o’.
- The symbols used should be the same letter for a given power of ten, in upper case for multiples and in lower case for sub-multiples.
- Letters already in use for SI prefixes, SI units, other common units, or symbols that may otherwise cause confusion, should be avoided.
- Following the precedent set recently, letters should be used in reverse English alphabetical order, suitably modifying chosen names, and skipping letters as appropriate.

OK, so how does that lead to the new prefixes? Point #4 explains the last letter of each prefix.

Brown says that the etymology of ronna and ronto is

Greek & Latin, derived from ‘ennea’ and ‘novem’, suggesting 9 (ninth power of 10^{3})

and that the etymology of quetta and quecto is

Latin, derived from ‘decem’, suggesting 10 (tenth power of 10^{3}).

That’s quite a stretch.

The largest prefixes had been zetta and yotta, so Brown wanted letters that came before Y in the alphabet. P was already used (peta and pico) and the next two unused letters were Q and R. So the prefixes for 10^{30} and 10^{27} begin with Q and R.

Presumably ronna uses an O because yotta had an O for the second letter. And the next letter N comes from the N’s in ennea and novem.

It seems quetta used the Q sound because Q was the next letter available, and an allusion to the hard C in decem. The “etta” part is reminiscent of zetta.

The post New SI prefixes and their etymology first appeared on John D. Cook.

***

You can use trig identities to reduce the problem to finding sin(*x*) for |*x*| ≤ 1. Let's take the worst case and assume we want to calculate sin(1).

The series for sine alternates, and so by the alternating series theorem we need the first term left out of our Taylor approximation to be less than our error tolerance ε. If we keep the terms of the Taylor series up to *x*^{2m – 1} then we need

*x*^{2m + 1} / (2*m* + 1)! < ε.

Since we’re interested in *x* = 1 and Γ(*n* + 1) = *n*!, we need

1/ε < Γ(2*m* + 2).

That means we need

2*m* + 2 > Γ^{-1}(1/ε).

But how do you compute the inverse of the gamma function? This is something I wrote about a few years ago.

Here is a function for approximately computing the inverse of the gamma function. See the earlier post for details.

```python
from numpy import pi, e, sqrt, log
from scipy.special import lambertw

def inverse_gamma(x):
    c = 0.03653381448490056
    L = log((x + c)/sqrt(2*pi))
    return L / lambertw(L/e) + 0.5
```

Suppose we want to compute sin(1) to 100 decimal places. We need 2*m* + 2 to be larger than Γ^{-1}(10^{100}), and the code above tells us that Γ^{-1}(10^{100}) is something like 70.9. This tells us we can choose *m* = 35.
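We can verify the *m* = 35 claim directly with exact rational arithmetic, sketched here with SymPy:

```python
import sympy as sp

def sin1_partial(m):
    # Taylor series of sine at x = 1, keeping terms up through x^(2m-1)
    return sum(sp.Rational((-1)**k, sp.factorial(2*k + 1)) for k in range(m))

m = 35  # the first omitted term is 1/71!, and 1/71! < 10^-100
err = abs(sp.Float(sin1_partial(m), 110) - sp.sin(1).evalf(110))
print(err < sp.Float(10)**(-100))  # True
```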

If we want to compute sin(1) to thousands of digits, the code above will fail because we cannot represent 10^{1000} as a floating point number. I will assume for this post that we will use an extended precision library for summing the series for sin(1), but we’re going to use ordinary precision to *plan* this calculation, i.e. to decide how many terms to sum.

If we look closely at the function `inverse_gamma` above we see that it only depends on *x* via log(*x* + *c*). Since we’re interested in large *x*, we can ignore the difference between log(*x* + *c*) and log(*x*). This lets us write a new version of `inverse_gamma` that takes the log of *x* rather than *x* as an argument.

```python
def inverse_gamma2(logx):
    L = logx - log(sqrt(2*pi))
    return L/lambertw(L/e) + 0.5
```

Calling `inverse_gamma` with `x` = 10^{100} gives the same result, down to the last decimal place, as calling `inverse_gamma2` with 100 log(10).

We asked at the top of the post about computing sine to a million decimal places. If we call `inverse_gamma2(1e6*log(10))` we get 205023.17… and so *m* = 102,511 would be large enough.
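We can double-check that figure directly, since `lgamma` has no trouble with arguments this large. (Again my check, not the post’s.)

```python
from math import lgamma, log

# With m = 102511 the first omitted term is 1/(2m+1)!.
# We need (2m+1)! > 10**(10**6), i.e. lgamma(2m+2)/log(10) > 10**6.
m = 102511
digits = lgamma(2*m + 2) / log(10)
print(digits > 10**6)  # True, with only a few digits to spare
```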

***

If you enjoyed reading this post, you may like reading this post on planning a world record calculation for computing ζ(3).

The post Calculating sine to an absurd number of digits first appeared on John D. Cook.

When a program needs to work with different systems of units, it’s best to consistently use one system for all internal calculations and convert to another system for output if necessary. Rigidly following this convention can prevent bugs, such as the one that caused the crash of the Mars Climate Orbiter.

For example, maybe you need to work in degrees and radians. It would be sensible to do all calculations in radians, because that’s what software libraries expect, and output results in degrees, because that’s what humans expect.

Now suppose you have a function that takes in a length and doubles it, and another function takes in a length and triples it. Both functions take in length in kilometers but print the result in miles.

You would like the composition of the two functions to multiply a length by six. And as before, the composition would take in a length in kilometers and return a length in miles.

Here’s how we could implement this badly.

```python
miles_per_km = 5/8 # approx

def double(length_km):
    return 2*length_km*miles_per_km

def triple(length_km):
    return 3*length_km*miles_per_km

length_km = 8
d = double(length_km)
print("Double: ", d)
t = triple(d)
print("Triple: ", t)
```

This prints

```
Double:  10.0
Triple:  18.75
```

The second output should be 30, not 18.75. The result is wrong because we converted from kilometers to miles twice. The correct implementation would be something like the following.

```python
miles_per_km = 5/8 # approx

def double(length_km):
    d = 2*length_km
    print("Double: ", d*miles_per_km)
    return d

def triple(length_km):
    t = 3*length_km
    print("Triple: ", t*miles_per_km)
    return t

length_km = 8
d = double(length_km)
t = triple(d)
```

This prints the right result.

```
Double:  10.0
Triple:  30.0
```

In abstract terms, we don’t want the composition of *f* and *g* to be simply *g* ∘ *f*.

We have a function *f* from *X* to *Y* that we think of as our core function, and a function *T* that translates the output. Say *f* doubles its input and *T* translates from kilometers to miles. Let *f** be the function that takes *X* to *TY*, i.e. the combination of *f* and translation.

Now take another function *g* from *Y* to *Z* and define *g** as the function that takes *Y* to *TZ*. We want the composition of *f** and *g** to be

*g** ∘ *f** = *T ∘ g ∘ f*.

In the example above, we only want to convert from kilometers to miles once. This is exactly what Kleisli composition does. (“Kleisli” rhymes with “highly.”)

Kleisli composition is conceptually simple. Once you understand what it is, you can probably think of times when it’s what you wanted but you didn’t have a name for it.

Writing code to encapsulate Kleisli composition takes some infrastructure (i.e. monads), and that’s a little complicated, but the idea of what you’re trying to achieve is not. Notice in the example above, what the functions print is not what they return; the print statements are a sort of side channel. That’s the mark of a monad.
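Here’s a minimal sketch of what this might look like in Python, with the printed output carried along as a log instead of printed inside the functions. The `compose` helper below is hypothetical glue code of mine, not any standard library’s API; it plays the role of Kleisli composition for this Writer-style pattern.

```python
miles_per_km = 5/8  # approx

# Each function returns (result in km, text destined for output).
def double(length_km):
    d = 2 * length_km
    return d, f"Double: {d * miles_per_km}"

def triple(length_km):
    t = 3 * length_km
    return t, f"Triple: {t * miles_per_km}"

# Kleisli composition for this "value plus log" pattern: pass the raw
# value along, and concatenate the logs.
def compose(f, g):
    def composed(x):
        y, log1 = f(x)
        z, log2 = g(y)
        return z, log1 + "\n" + log2
    return composed

sextuple = compose(double, triple)
value, log = sextuple(8)
print(log)    # Double: 10.0 then Triple: 30.0 -- each conversion happens once
print(value)  # 48, i.e. six times the input, still in kilometers
```

The conversion to miles happens once per log line, and the composed function really does multiply its input by six.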

The things we’ve been talking about are formalized in terms of Kleisli categories. You start with a category *C* and define another category that has the same objects as *C* does but has a different notion of composition, i.e. Kleisli composition.

Given a monad *T* on *C*, the Kleisli category *C*_{T} has the same objects as *C*. An arrow *f** from *X* to *Y* in *C*_{T} corresponds to an arrow *f* from *X* to *TY* in *C*. In symbols,

Hom_{CT}(*X*, *Y*) = Hom_{C}(*X*, *TY*).

Mr. Kleisli’s motivation for defining his categories was to answer a more theoretical question—whether all monads arise from adjunctions—but more practically we can think of Kleisli categories as a way of formalizing a variation on function composition.

My interest in category theory goes in cycles. Something will spark my interest in it, and I’ll dig a little further. Then I reach my abstraction tolerance and put it back on the shelf. Then sometime later something else comes up and the cycle repeats. Each time I get a little further.

A conversation with a client this morning brought me back to the top of the cycle: category theory may be helpful in solving a concrete problem they’re working on.

I’m skeptical of applied category theory that starts with categories. I’m more bullish on applications that start from the problem domain, a discussion something like this.

“Here’s a pattern that we’re trying to codify and exploit.”

“Category theory has a name for that, and it suggests you might also have this other pattern or constraint.”

“Hmm. That sounds plausible. Let me check.”

I think of category theory as a pattern description language, a way to turn vague analogies into precise statements. Starting from category theory and looking for applications is less likely to succeed.

When I left academia the first time, I got a job as a programmer. My first assignment was to make some change to an old Fortran program, and I started asking a lot of questions about context. My manager cut me off saying “You’ll never get here from there.” I had to work bottom-up, starting from the immediate problem. That lesson has stuck with me ever since.

Sometimes you *do* need to start from the top and work your way down, going from abstract to concrete, but less often than I imagined early in my career.

If your password is in the file rockyou.txt then it’s a bad password. Password cracking software will find it instantly. (Use long, randomly generated passwords; staying off the list of worst passwords is necessary but not sufficient for security.)

The `rockyou.txt` file currently contains 14,344,394 bad passwords. I poked around in the file and this post reports some things I found.

To make things more interesting, I made myself a rule that I could only use command line utilities.

I was curious how many of these passwords consisted only of digits so I ran the following.

grep -P '^\d+$' rockyou.txt | wc -l

This says 2,346,744 of the passwords only contain digits, about 1 in 6.

I made a file of digits appearing in the passwords

grep -o -P '\d' rockyou.txt > digits

and looked at the frequency of digits.

```sh
for i in 0 1 2 3 4 5 6 7 8 9
do
    grep -c $i digits
done
```

This is what I got:

```
5740291
6734380
5237479
3767584
3391342
3355180
3118364
3100596
3567258
3855490
```

The digits are distributed more evenly than I would have expected. 1’s are more common than other digits, but only about twice as common as the least common digits.

How long is the longest bad password? The command

wc -L rockyou.txt

shows that one line in the file is 285 characters long. What is this password? The command

grep -P '.{285}' rockyou.txt

shows that it’s some HTML code. Nice try whoever thought of that, but you’ve been pwned.

A similar search for all-digit passwords shows that the longest numeric passwords are 255 digits long. One of these is a string of 255 zeros.

A common bit of advice is to not choose passwords that can be found in a database. That’s good advice as far as it goes, but it doesn’t go very far.

I used the `comm` utility to see how many bad passwords are not in the dictionary by running

comm -23 sorted dict | wc -l

and the answer was 14,310,684. Nearly all the bad passwords are not in a dictionary!

(Here `sorted` is a sorted version of the `rockyou.txt` file; I believe the file is initially sorted by popularity, worst passwords first. The `comm` utility complained that my system dictionary isn’t sorted, which I found odd, but I sorted it to make `comm` happy and `dict` is the sorted file.)
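The same set comparison is easy to sketch in Python. The lists below are tiny made-up stand-ins for the two files, not the real data:

```python
# Set difference plays the role of comm -23: elements of the first set
# that are not in the second. Hypothetical sample data.
passwords = {"123456", "password", "iloveyou", "qwerty", "monkey"}
dictionary = {"password", "monkey", "apple", "zebra"}

not_in_dict = passwords - dictionary
print(len(not_in_dict))  # 3
```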

Curiously, the command

comm -13 sorted dict | wc -l

shows there are 70,624 words in the dictionary (specifically, the `american-english` file on my Linux box) that are *not* on the bad password list.

What is the smallest number not in the list of pure numeric passwords? The following command strips leading zeros from purely numeric passwords, sorts the results as numbers, removes duplicates, and stores the results in a file called `nums`.

grep -P '^\d+$' rockyou.txt | sed 's/^0\+//' | sort -n | uniq > nums

The file `nums` begins with a blank. I removed this with `sed`.

sed -i 1d nums

Next I used `awk` to print instances where the line number does not match the line in the file `nums`.

awk '{if (NR-$0 < 0) print $0 }' nums | less

The first number this prints is 61. This means that the first line is 1, the second line is 2, and so on, but the 60th line is 61. That means 60 is missing. The file `rockyou.txt` does not contain 60. You can verify this: the command

grep '^60$' rockyou.txt

returns nothing. 60 is the smallest number not in the bad password file. There are passwords that contain ’60’ as a substring, but just 60 as a complete password is not in the file.
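The same search could be done in a few lines of Python. Here’s a sketch run on a tiny made-up sample, since the logic is the point; with the real file you’d read the lines of `rockyou.txt` instead.

```python
# Find the smallest positive integer missing from the all-digit passwords.
# A hypothetical sample stands in for the real file here.
passwords = ["123456", "password", "0061", "1", "2", "3", "iloveyou", "5"]

nums = {int(p) for p in passwords if p.isdigit()}  # int() drops leading zeros
smallest = next(n for n in range(1, max(nums) + 2) if n not in nums)
print(smallest)  # 4 for this sample
```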

A quadratic polynomial *ax*^{2} + *bx* + *c* has a double root if and only if the discriminant

*b*^{2} – 4*ac*

is zero.

The discriminant of a cubic is much less known, and the analogs for higher order polynomials are unheard of. There is a discriminant for polynomials of all degrees, though the complexity of the discriminant grows quickly with the degree of the polynomial.

This post will derive the discriminant of a **cubic** and a **quartic**.

The resultant of a pair of polynomials is zero if and only if the two polynomials have a common root. Resultants have come up in a couple previous posts about solving trigonometric equations.

A polynomial *p*(*x*) has a double root if *p* and its derivative *p*′ are both zero somewhere. The discriminant of *p* is the resultant of *p* and *p*′.

The resultant of two polynomials is a determinant of their Sylvester matrix. This matrix is easier to describe by example than by equation. You basically fill a matrix with shifts of the coefficients of both polynomials and fill in the gaps with zeros.

MathWorld gives the following Mathematica code for the Sylvester matrix of two inputs.

```mathematica
SylvesterMatrix1[poly1_, poly2_, var_] :=
  Function[{coeffs1, coeffs2},
    With[{l1 = Length[coeffs1], l2 = Length[coeffs2]},
      Join[
        NestList[RotateRight, PadRight[coeffs1, l1 + l2 - 2], l2 - 2],
        NestList[RotateRight, PadRight[coeffs2, l1 + l2 - 2], l1 - 2]
      ]
    ]
  ][
    Reverse[CoefficientList[poly1, var]],
    Reverse[CoefficientList[poly2, var]]
  ]
```

If we apply this to the cubic polynomial

we get the following matrix.

We can compute the resultant by taking the determinant of the above matrix.

```mathematica
g[x_] := a x^3 + b x^2 + c x + d

SylvesterMatrix1[g[x], D[g[x], x], x]
```

We get the following result

and we can verify that this is the same result we would get from calling `Resultant` directly with

Resultant[g[x], D[g[x], x], x]

Although the resultant is defined in terms of a determinant, that doesn’t mean that resultants are necessarily computed by computing determinants. The Sylvester matrix is a very special matrix, and there are clever ways to exploit its structure to create more efficient algorithms.

Each term in the resultant has a factor of *a*, and the discriminant is the resultant divided by –*a*.

Now let’s repeat our exercise for the quartic. The Sylvester matrix for the quartic polynomial

and its derivative is

I created the image above with the following Mathematica code.

```mathematica
f[x_] := a x^4 + b x^3 + c x^2 + d x + e

TeXForm[SylvesterMatrix1[f[x], D[f[x], x], x]]
```

If we take the determinant, we get the resultant, but it’s a mess.

Again each term has a factor of *a*, so we can divide by *a* to get the discriminant.

If we want to use this in code, we can have Mathematica export the expression in C code using `CForm`. To generate Python code, it’s more convenient to use `FortranForm`, since Python, like Fortran, uses `**` for exponents.

The following Python code was created by pasting the output of

FortranForm[Resultant[f[x], D[f[x], x], x]]

and making it into a function.

```python
def quartic_resultant(a, b, c, d, e):
    return a*b**2*c**2*d**2 - 4*a**2*c**3*d**2 - 4*a*b**3*d**3 + 18*a**2*b*c*d**3 \
        - 27*a**3*d**4 - 4*a*b**2*c**3*e + 16*a**2*c**4*e + 18*a*b**3*c*d*e \
        - 80*a**2*b*c**2*d*e - 6*a**2*b**2*d**2*e + 144*a**3*c*d**2*e \
        - 27*a*b**4*e**2 + 144*a**2*b**2*c*e**2 - 128*a**3*c**2*e**2 \
        - 192*a**3*b*d*e**2 + 256*a**4*e**3
```

Let’s try this on a couple examples. First

*x*^{4} – 5*x*^{3} + 6*x*^{2} = *x*^{2}(*x* – 2)(*x* – 3)

which has a double root at 0.

As expected

quartic_resultant(1, -5, 6, 0, 0)

returns 0.

Next let’s try

*x*^{4} – 10*x*^{3} + 35*x*^{2} – 50*x* + 24 = (*x* – 1)(*x* – 2)(*x* – 3)(*x* – 4)

The call

quartic_resultant(1, -10, 35, -50, 24)

returns 144. We expected a non-zero result since our polynomial has distinct roots at 1, 2, 3, and 4.

In general the discriminant of an *n*th degree polynomial is the resultant of the polynomial and its derivative, up to a constant. There’s no need to worry about this constant if your only concern is whether the discriminant is zero. To get the exact discriminant, you divide the resultant by the leading coefficient of the polynomial and adjust the sign.

The sign convention is a little strange. If you look back at the examples above, we divided by –*a* in the cubic case but we divided by *a* in the quartic case. You might reasonably guess that you should divide by *a* and multiply by (-1)^{n}. But that won’t give you the right sign for the quadratic case. The conventional sign is

(-1)^{n(n – 1)/2}.

So when *n* equals 2 or 3 we get a negative sign, but when *n* equals 4 we don’t.
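You can verify the *n* = 2 case directly. For *p* = *ax*^{2} + *bx* + *c* and *p*′ = 2*ax* + *b* the Sylvester matrix is 3 × 3, its determinant works out to –*a*(*b*^{2} – 4*ac*), and applying the sign (-1)^{n(n – 1)/2} = -1 and dividing by *a* recovers *b*^{2} – 4*ac*. A quick numerical check (mine, not from the post):

```python
# Resultant of a x^2 + b x + c and its derivative 2 a x + b via the
# 3x3 Sylvester determinant, expanded by cofactors along the first row:
# | a   b   c |
# | 2a  b   0 |
# | 0   2a  b |
def quadratic_resultant(a, b, c):
    return a*(b*b) - b*(2*a*b) + c*(2*a)*(2*a)

a, b, c = 2, 3, 1
disc = -quadratic_resultant(a, b, c) / a  # divide by a, flip sign for n = 2
print(disc)  # 1.0, which equals b**2 - 4*a*c
```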

The Student *t* distribution with ν degrees of freedom has two important special cases: ν = 1 and ν = ∞. When ν = 1 we get the Cauchy distribution, and in the limit as ν → ∞ we get the normal distribution. The expression for entropy is simple in these two special cases, but it’s not at all obvious that the general expression at ν = 1 and ν = ∞ gives the entropy for the Cauchy and normal distributions.

The entropy of a Cauchy random variable (with scale 1) is

log 4π

and the entropy of a normal random variable (with scale 1) is

log √(2π*e*)

The entropy of a Student *t* random variable with ν degrees of freedom is

((ν + 1)/2)(ψ((ν + 1)/2) – ψ(ν/2)) + log(√ν *B*(ν/2, 1/2))

Here ψ is the digamma function, the derivative of the log of the gamma function, and *B* is the beta function. These two functions are implemented as `psi` and `beta` in Python, and as `PolyGamma` and `Beta` in Mathematica. The equation for the entropy comes from Wikipedia.

This post will show numerically and analytically that the general expression does have the right special cases. As a bonus, we’ll prove an asymptotic formula for the entropy along the way.

Numerical evaluation shows that the entropy expression with ν = 1 does give the entropy for a Cauchy random variable.

```python
from numpy import pi, log, sqrt
from scipy.special import psi, beta

def t_entropy(nu):
    S = 0.5*(nu + 1)*(psi(0.5*(nu+1)) - psi(0.5*nu))
    S += log(sqrt(nu)*beta(0.5*nu, 0.5))
    return S

cauchy_entropy = log(4*pi)
print(t_entropy(1) - cauchy_entropy)
```

This prints 0.

Experiments with large values of ν show that the entropy for large ν is approaching the entropy for a normal distribution. In fact, it seems the difference between the entropy for a *t* distribution with ν degrees of freedom and the entropy of a standard normal distribution is asymptotic to 1/ν.

```python
normal_entropy = 0.5*(log(2*pi) + 1)

for i in range(5):
    print(t_entropy(10**i) - normal_entropy)
```

This prints

```
1.112085713764618
0.10232395977100861
0.010024832113557203
0.0010002498337291499
0.00010000250146458001
```

There are tidy expressions for the ψ function at a few special arguments, including 1 and 1/2. And the beta function has a special value at (1/2, 1/2).

We have ψ(1) = -γ and ψ(1/2) = -2 log 2 – γ where γ is the Euler–Mascheroni constant. So the first half of the expression for the entropy of a *t* distribution with 1 degree of freedom reduces to 2 log 2. Also, *B*(1/2, 1/2) = π. Adding these together we get 2 log 2 + log π which is the same as log 4π.
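These special values are easy to check numerically with only the standard library, approximating ψ by a centered difference of `lgamma`. This is a rough sketch of mine, good to several digits:

```python
from math import lgamma, log, exp, pi

# Approximate the digamma function psi = (log Gamma)' by a centered
# difference of lgamma. Crude, but accurate to several digits here.
h = 1e-6
psi = lambda x: (lgamma(x + h) - lgamma(x - h)) / (2*h)

print(psi(1.0) - psi(0.5))  # approximately 2 log 2 = 1.3862...
print(exp(2*lgamma(0.5)))   # B(1/2, 1/2) = Gamma(1/2)^2 / Gamma(1) = pi
```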

For large *z*, we have the asymptotic series

ψ(*z*) ∼ log *z* – 1/(2*z*) – 1/(12*z*^{2}) + …

See, for example, A&S 6.3.18. We’ll also need the well-known fact that log(1 + *z*) ∼ *z* for small *z*.

Next we use the definition of the beta function as a ratio of gamma functions, the fact that Γ(1/2) = √π, and the asymptotic formula here to find that

log(√ν *B*(ν/2, 1/2)) → log √(2π)

as ν → ∞.

This shows that the entropy of a Student *t* random variable with ν degrees of freedom is asymptotically

log √(2π*e*) + 1/ν

for large ν. So we do indeed get the entropy of a normal random variable in the limit, and the difference between the Student *t* and normal entropies is asymptotically 1/ν, proving the conjecture inspired by the numerical experiment above.

As outlined earlier, we turn the equation into a system of equations in *s* and *c*.

The resultant of

and

as a function of *s* is

where

Let’s look at a particular example. Suppose we want to solve

Then the possible sine values are the roots of

This equation has four real roots: *s* = -0.993462, -0.300859, -0.0996236, or 0.966329.

So any solution θ to our original equation must have sine equal to one of these values. Now sine takes on each value twice during each period, so we have a little work left to find the values of θ. Take the last root for example. If we take the arcsine of 0.966329 we get 1.31056, and θ = 1.31056 is *not* a solution to our equation. But arcsin(*y*) returns only one possible solution to the equation sin(*x*) = *y*. In this case, θ = π – 1.31056 is the solution we’re looking for.
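The bookkeeping in this step is mechanical: for a sine value *s*, the two candidate angles in [0, 2π) are arcsin(*s*) and π – arcsin(*s*). A quick illustration for the last root:

```python
from math import asin, pi, sin

# The two angles in [0, 2*pi) whose sine equals s = 0.966329.
s = 0.966329
theta1 = asin(s)       # 1.31056..., not a solution of the original equation
theta2 = pi - asin(s)  # 1.83103..., the solution we were looking for
assert abs(sin(theta1) - s) < 1e-12
assert abs(sin(theta2) - s) < 1e-12
```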

The full set of solutions for 0 ≤ θ < 2π are

In the example above our polynomial in *s* had four real roots in [-1, 1]. In general we could have roots outside this interval, including complex roots. If we’re looking for solutions with real values of θ then we discard these.

Now suppose we want to solve

Our resultant is

and the roots are 0.119029, 0.987302, and -0.766973 ± 0.319513*i*.

If we’re only interested in real values of θ then the two solutions are arcsin(0.119029) = 0.119312 and arcsin(0.987302) = 1.41127. But there are two complex solutions, θ = 3.91711 ± 0.433731*i*.

The idea of the Weierstrass-Durand-Kerner method is to imagine you already had all but one of the roots and write down the expression you’d use to find the remaining root. Take a guess at all the roots, then solve for each root as if the remaining roots were correct. Iterate this process, and hopefully the process will converge on the roots. I say “hopefully” because the method does not always converge, though it often works very well in practice [1].

Here’s a Python implementation of the method for the special case of 4th degree polynomials. The general case is analogous.

```python
def maxabs(a, b, c, d):
    return max(abs(a), abs(b), abs(c), abs(d))

# Find roots of a 4th degree polynomial f
# starting with initial guess powers of g.
def findroots(f, g):
    p, q, r, s = 1, g, g**2, g**3
    dp = dq = dr = ds = 1
    tol = 1e-10
    while maxabs(dp, dq, dr, ds) > tol:
        dp = f(p)/((p-q)*(p-r)*(p-s)); p -= dp
        dq = f(q)/((q-p)*(q-r)*(q-s)); q -= dq
        dr = f(r)/((r-p)*(r-q)*(r-s)); r -= dr
        ds = f(s)/((s-p)*(s-q)*(s-r)); s -= ds
    return p, q, r, s
```

Let’s apply this to the polynomial

(*x*² + 1)(*x* + 2)(*x* – 3)

whose roots are *i*, –*i*, -2, and 3.

```python
f = lambda x: (x**2 + 1)*(x + 2)*(x - 3)
findroots(f, 1 + 1.2j)
```
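As an aside, here is one way the general-degree version mentioned above might look. This is my own generalization, not code from the post: it keeps all the current guesses in a list and applies the same update to each.

```python
def durand_kerner(f, n, g=0.4 + 0.9j, tol=1e-10, maxiter=1000):
    """Approximate the n roots of a monic degree-n polynomial f."""
    # Initial guesses are powers of g, as in the 4th degree version.
    roots = [g**k for k in range(n)]
    for _ in range(maxiter):
        new = []
        for i, r in enumerate(roots):
            denom = 1
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            new.append(r - f(r)/denom)
        if max(abs(a - b) for a, b in zip(new, roots)) < tol:
            return new
        roots = new
    return roots

f = lambda x: (x**2 + 1)*(x + 2)*(x - 3)
roots = durand_kerner(f, 4)  # approximates i, -i, -2, and 3 in some order
```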

Here is a plot of the iterates as they converge to the roots.

Each color corresponds to a different root. Each starts at the initial guess marked with × and ends at the root marked with a circle.

[1] Bernhard Reinke, Dierk Schleicher and Michael Stoll. The Weierstrass–Durand–Kerner root finder is not generally convergent. Mathematics of Computation. Volume 92, Number 339. Available online.

The post Simultaneous root-finding first appeared on John D. Cook.

So suppose we plot a straight path from Quito to Jerusalem on a Mercator projection.

The red dot in the lower left corner represents Quito and the next red dot represents Jerusalem.

Mercator projection leaves longitude λ unchanged, but latitude φ is transformed via

φ ↦ log( sec φ + tan φ )

for reasons explained here. We can apply the inverse of the Mercator projection to put the path above on a globe, and when we do, it looks like the following.

The path planned on a Mercator projection map when projected onto the globe becomes a logarithmic spiral in polar projection. The radial direction in the plot above shows the angle down from the North Pole rather than the angle up from the equator.

So if our flight of constant bearing keeps going rather than stopping at Jerusalem, it will spiral quickly toward the North Pole. It appears to stop at the pole unless you look carefully. In theory the spiral keeps going and never actually reaches the pole. This is easy to see on the Mercator map because the North Pole is infinitely far away on the vertical axis.
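The inverse of the latitude transform above is the Gudermannian function: if *y* = log(sec φ + tan φ) then φ = arctan(sinh *y*). A quick round-trip check:

```python
from math import log, tan, cos, atan, sinh

def mercator(phi):
    # latitude (radians) -> Mercator vertical coordinate
    return log(1/cos(phi) + tan(phi))

def inverse_mercator(y):
    # Gudermannian function: Mercator vertical coordinate -> latitude
    return atan(sinh(y))

phi = 0.7  # an arbitrary latitude in radians
assert abs(inverse_mercator(mercator(phi)) - phi) < 1e-12
```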

The post Mercator and polar projections first appeared on John D. Cook.

A straight line on a globe is an arc of a great circle, the shortest path between two points. When projected onto a map, a straight path looks curved. Here’s an image I made for a post back in August.

The red lines form a spherical triangle with vertices at Quito, Nairobi, and Jerusalem. The leg from Quito to Nairobi is straight because it follows the equator. And the leg from Nairobi to Jerusalem is straight because it follows a meridian. But the leg from Quito to Jerusalem looks wrong.

If you were flying from Quito to Jerusalem and saw this flight plan, you might ask “Why aren’t we flying straight there, cutting across Africa rather than making a big arc around it? Are we trying to avoid flying over the Sahara?”

But the path from Quito to Jerusalem *is* straight, on a globe. It’s just not straight on the map. The map is not the territory.

Now let’s look at things from the opposite direction. What do straight lines on a map look like on a globe? By map I mean a Mercator projection. You could take a map and draw a straight line from Quito to Jerusalem, and it would cross every meridian at the same angle. A pilot could fly from Quito to Jerusalem along such a path without ever changing bearing. But the plane would have to turn continuously to stay on such a bearing, because this is not a straight line.
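In fact the constant bearing is easy to read off of the Mercator projection: it’s just the slope of the straight segment on the map. Here’s a sketch, using rough coordinates for Quito and Jerusalem; the coordinates and the helper function are my own, not from the post.

```python
from math import log, tan, atan2, radians, degrees, pi

def rhumb_bearing(lat1, lon1, lat2, lon2):
    """Constant compass bearing (degrees east of north) of a rhumb line."""
    # Mercator projection: longitude is unchanged, and latitude maps to
    # log(sec phi + tan phi), which equals log(tan(pi/4 + phi/2)).
    psi1 = log(tan(pi/4 + radians(lat1)/2))
    psi2 = log(tan(pi/4 + radians(lat2)/2))
    dlam = radians(lon2 - lon1)
    return degrees(atan2(dlam, psi2 - psi1))

# Rough coordinates: Quito (0.2 S, 78.5 W), Jerusalem (31.8 N, 35.2 E)
bearing = rhumb_bearing(-0.2, -78.5, 31.8, 35.2)
print(round(bearing, 1))  # roughly 73 degrees, i.e. east-northeast
```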

A straight line on a Mercator projection is a spiral on a globe, known as a **loxodrome** or a **rhumb line**. If a plane flew on a constant bearing from Quito but flew over Jerusalem and kept going, it would spiral toward the North Pole. It would keep circling the earth, crossing the meridian through Jerusalem over and over, each time at a higher latitude. On a polar projection map, the plane’s course would be approximately a logarithmic spiral. The next post goes into this in more detail.

I made the image above using the Mathematica code found here.

Although straight lines on the globe are surprising on a map, straight lines on a map are even more surprising on a globe.