This formula gives the signed area: the area is positive if the points are given in counterclockwise order and negative otherwise.

I’ll illustrate the formula with a little Python code. Let’s generate a random triangle.

```python
import numpy as np

np.random.seed(20221204)
r = 100*np.random.random(6)
z1 = r[0] + 1j*r[1]
z2 = r[2] + 1j*r[3]
z3 = r[4] + 1j*r[5]
```

Here’s what our triangle looks like plotted.

Now let’s calculate the area using the formula above and using Heron’s formula.

```python
def area_det(z1, z2, z3):
    det = 0
    det += z2*z3.conjugate() - z3*z2.conjugate()
    det -= z1*z3.conjugate() - z3*z1.conjugate()
    det += z1*z2.conjugate() - z2*z1.conjugate()
    return 0.25j*det

def area_heron(z1, z2, z3):
    a = abs(z1 - z2)
    b = abs(z2 - z3)
    c = abs(z3 - z1)
    s = 0.5*(a + b + c)
    return np.sqrt(s*(s-a)*(s-b)*(s-c))

print(area_heron(z1, z2, z3))
print(area_det(z1, z2, z3))
```

This prints 209.728 and -209.728. The determinant gives a negative area because it was given the points in clockwise order.

[1] Philip J. Davis. Triangle Formulas in the Complex Plane. Mathematics of Computation. January 1964.

The post Area of a triangle in the complex plane first appeared on John D. Cook.]]>An analogous result says that if the vector field F has zero divergence, again over a simply connected domain, then there is a vector potential Φ whose curl is F.

These are both special cases of Poincaré’s lemma.

This post will outline how to calculate Φ. First of all, Φ is far from unique. Any vector field with zero curl can be added to Φ without changing its curl. So if

Φ = (Φ_{1}, Φ_{2}, Φ_{3})

then we can assume that one of the components, say Φ_{3}, is zero by adding the right curl-free component. If you find that argument less than convincing, look at it this way: we’re going to solve a harder problem than simply finding Φ such that ∇×Φ = F by giving ourselves the additional requirement that the last component of Φ must be zero.

Now if Φ = (Φ_{1}, Φ_{2}, 0) and ∇×Φ = F, then

-∂_{z} Φ_{2} = F_{1}

∂_{z} Φ_{1} = F_{2}

∂_{x} Φ_{2} – ∂_{y} Φ_{1} = F_{3}

We can solve the first equation by integrating F_{1} with respect to *z* and adding a function of *x* and *y* to be determined later. We can solve the second equation similarly, then use the third equation to determine the functions of *x* and *y* left over from solving the first two equations.
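To make this concrete, here’s a quick sketch in Python. The divergence-free field F = (y, z, x) is my own example, and the integration steps are carried out by hand in the comments; the resulting potential is checked with a finite-difference curl.

```python
import numpy as np

# Example divergence-free field (my choice): F = (y, z, x).
# Following the method above with Phi3 = 0:
#   Phi2 = -∫ F1 dz = -y*z + g(x, y)
#   Phi1 =  ∫ F2 dz = z**2/2 + h(x, y)
# The third equation, ∂x Phi2 - ∂y Phi1 = F3 = x, is satisfied
# by taking g = x**2/2 and h = 0.
def Phi(x, y, z):
    return np.array([z**2/2, x**2/2 - y*z, 0.0])

def curl_fd(V, x, y, z, h=1e-5):
    """Numerical curl of vector field V at (x, y, z) via central differences."""
    p = np.array([x, y, z])
    d = lambda i, j: (V(*(p + h*np.eye(3)[j]))[i] - V(*(p - h*np.eye(3)[j]))[i]) / (2*h)
    return np.array([d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)])

print(curl_fd(Phi, 1.0, 2.0, 3.0))  # ≈ (2, 3, 1), i.e. F at that point
```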

***

This post is the third, and probably last, in a series of posts looking at vector calculus from a more advanced perspective. The first post in the series looked at applying {grad, curl, div} to {grad, curl, div}: seeing which combinations are defined, and which combinations are always 0. To illustrate the general pattern, the post dips into differential forms and the Hodge star operator.

The second post looked at finding the (scalar) potential function for a curl-free vector field, and mentions connections to topology and differential equations.

The present post is a little terse, but it makes more sense if you’ve gone through the previous post. The method of solution here is analogous to the method in the previous post, and that post goes into a little more detail.

The post Finding a vector potential whose curl is given first appeared on John D. Cook.]]>It’s clear that having zero curl is a necessary condition for a vector field to be the gradient of a potential. It’s not as clear that having zero curl is sufficient. And in fact, it’s not quite sufficient: it’s sufficient over a **simply connected domain**. This means a domain with no holes: any loop you draw in the domain can be continuously deformed to a point without leaving the domain. If you’re working with vector fields defined everywhere on ℝ³ then you don’t have to worry about this condition because ℝ³ is simply connected.

For a calculus student, the requirement that a domain be simply connected is a footnote. For a topologist, it’s the main thing.

If a domain is not simply connected, then a vector field might have curl zero, but not be the gradient of a potential. So in general we have two kinds of vector fields with zero curl: those that are gradients and those that are not. We could look at the space of all vector fields that have zero curl and mod out by the vector fields that are gradients. The dimension of this space tells us something about how the domain is connected. This is the start of de Rham cohomology.

If a vector field is conservative, i.e. if it is the gradient of a potential function φ, then you can find φ by integration.

The partial derivative of φ with respect to *x* is the first component of your vector field, so you can integrate to find φ as a function of *x* (and *y* and *z*). This integral will only be unique up to a constant, and functions of *y* and *z* alone are constants as far as partial derivatives with respect to *x* are concerned.

Now the partial derivative of φ with respect to *y* has to equal the second component of your vector field. Differentiating what you found above with respect to *y* and matching the result to the second component determines the potential up to a function of *z*. Then differentiate once more, this time with respect to *z*, and set the result equal to the third component of your vector field; this determines your potential function up to a constant. And that’s as far as it can be determined: any constant term goes away when you take the gradient.
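Here’s a sketch of that procedure for a toy curl-free field F = (2xy, x², 3z²) of my own choosing. The integrations are done by hand in the comments, and the resulting potential is checked with a numerical gradient.

```python
import numpy as np

# Example curl-free field (my choice): F = (2xy, x**2, 3z**2).
# Integrate F1 in x:  phi = x**2 * y + f(y, z)
# ∂phi/∂y = x**2 + f_y must equal F2 = x**2, so f depends only on z.
# ∂phi/∂z = f'(z) must equal F3 = 3z**2, so f = z**3 plus a constant.
def phi(x, y, z):
    return x**2 * y + z**3

def grad_fd(f, x, y, z, h=1e-6):
    """Numerical gradient via central differences."""
    return np.array([
        (f(x+h, y, z) - f(x-h, y, z)) / (2*h),
        (f(x, y+h, z) - f(x, y-h, z)) / (2*h),
        (f(x, y, z+h) - f(x, y, z-h)) / (2*h),
    ])

print(grad_fd(phi, 1.0, 2.0, 3.0))  # ≈ (4, 1, 27), i.e. F at that point
```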

A differential equation of the form

*M*(*x*, *y*) + *N*(*x*, *y*) *y*‘(*x*) = 0

is said to be exact if the partial derivative of *M* with respect to *y* equals the partial of *N* with respect to *x.* In that case you can find a function φ such that the partial of φ with respect to *x* is *M* and the partial of φ with respect to *y* is *N* (assuming you’re working in a simply connected domain). This function φ is a potential, though differential equation texts don’t call it that, and you find it just as you found φ above. The solution to your differential equation is given (implicitly) by

φ(*x*, *y*) = *c*

where *c* is a constant to be determined by initial conditions.
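A tiny numerical check, using an exact equation of my own choosing: with *M* = *y* and *N* = *x* we have *M_y* = *N_x* = 1, the potential is φ(*x*, *y*) = *xy*, and the solutions are the hyperbolas *xy* = *c*.

```python
import numpy as np

# Exact equation y + x y' = 0: here M = y, N = x, and M_y = N_x = 1.
# The potential is phi(x, y) = x*y, so solutions satisfy x*y = c.
c = 5.0
y  = lambda x: c / x            # explicit solution from x*y = c
yp = lambda x: -c / x**2        # its derivative
xs = np.linspace(1, 4, 50)
residual = y(xs) + xs * yp(xs)  # M + N y', should vanish
print(np.max(np.abs(residual)))
```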

For a vector field over a simply connected domain, having zero curl is necessary and sufficient for the existence of a potential function φ. This is a case of Poincaré’s lemma. The next post will look at another case of Poincaré’s lemma, finding a vector potential.

The post Conservative vector fields first appeared on John D. Cook.]]>It’s a mess that’s hard to sort out without pulling out differential forms. This post will show how a calculus student could make some sense out of all this, and how differential forms clarify the situation further.

We’ll start out looking at things from the perspective of a calculus student. We can make a table of all nine possible combinations of {grad, curl, div} applied to a {grad, curl, div} and start by asking which combinations make sense.

Gradient is something that takes a scalar function and returns a vector field. Curl takes a vector field and returns another vector field. Divergence takes a vector field and returns a scalar function. This means that only five of our nine combinations are even defined.

It turns out that the divergence of a curl is zero, and the curl of a gradient is zero (the zero vector). The other three possibilities are defined, but they are not zero in general.

So we can extend our chart to include the zeros.

Plain text version of chart image included at the bottom of the post.
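The two zeros in the chart are easy to verify symbolically. Here’s a sketch using sympy; the test function and test field are arbitrary choices of mine.

```python
from sympy import symbols, diff, simplify, sin, exp

x, y, z = symbols('x y z')

# Curl of a gradient is zero.
f = x**2 * y + sin(z) * exp(x)                    # arbitrary scalar function
g = (diff(f, x), diff(f, y), diff(f, z))          # grad f
curl_g = (diff(g[2], y) - diff(g[1], z),
          diff(g[0], z) - diff(g[2], x),
          diff(g[1], x) - diff(g[0], y))
print([simplify(c) for c in curl_g])              # [0, 0, 0]

# Divergence of a curl is zero.
F = (y*z, x*exp(z), sin(x*y))                     # arbitrary vector field
curl_F = (diff(F[2], y) - diff(F[1], z),
          diff(F[0], z) - diff(F[2], x),
          diff(F[1], x) - diff(F[0], y))
div_curl_F = simplify(sum(diff(c, v) for c, v in zip(curl_F, (x, y, z))))
print(div_curl_F)                                 # 0
```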

From the perspective of differential forms, a scalar function *f* is a 0-form.

The differential of a 0-form is a 1-form and corresponds to the gradient.

The differential of a 1-form is a 2-form and corresponds to curl.

The differential of a 2-form is a 3-form and corresponds to divergence.

The differential of a differential is 0: *d*² = 0. This holds for *k* forms in general, for any non-negative integer *k*. So the curl of a gradient is 0 and the divergence of a curl is 0.

Gradients are 1-forms and curls are 2-forms. They’re different kinds of things. Vector calculus hides this distinction, which initially makes things simpler but ultimately makes things harder.

Now what about the three possibilities marked with question marks in the table above: the divergence of a gradient, the curl of a curl, and the gradient of a divergence?

From the perspective of differential forms, these are illegal operations. You cannot take the divergence of a gradient, because divergence operates on 2-forms, and a gradient is a 1-form. Similarly, you cannot take the curl of a curl or the gradient of a divergence. You could think of differential forms as adding type-checking to vector calculus.

But operations like taking the divergence of a gradient are legal in vector calculus. What gives?

The Hodge star operator is a duality between *k*-forms and (*n*–*k*)-forms. In vector calculus *n* = 3, and so the Hodge star takes 0-forms to 3-forms and 3-forms to 0-forms.

*f* ←→ *f* *dx* *dy* *dz*

It also takes 1-forms to 2-forms and 2-forms to 1-forms.

*f* *dx* + *g* *dy* + *h* *dz* ←→ *f* *dy* *dz* + *g* *dz* *dx* + *h* *dx* *dy*.

You can’t take the divergence of a gradient of a function *f*, but you can translate the 1-form *df* into the 2-form **df* via the Hodge operator, then take the divergence of that. This gives you a 3-form *d***df*, which you can translate to a 0-form by applying * once more to get **d***df*. So the Laplacian, defined to be the divergence of the gradient in vector calculus, is **d***d* in the language of differential forms.

Curl takes 1-forms to 2-forms, so you can’t take the curl of a curl. But you can turn a curl into a 1-form via the Hodge operator and then take the curl of that. And while you can’t take the divergence of a gradient, you can take the divergence of the Hodge operator applied to a gradient.

In vector calculus the Hodge operator is invisible. Making it visible explains why some combinations of operators always result in zeros and some do not: some identities follow from the general identity *d*² = 0, but operations requiring a Hodge operator are not zero in general.

The Hodge star operator is not so simple in general as it is in Euclidean space. On a Riemannian manifold the Hodge operator is defined in terms of the metric. Defining the Laplace operator as **d***df* extends to Riemannian manifolds, but defining it as the sum of second partial derivatives will not.

Plain text chart:

|------+------+------+-----|
|      | grad | curl | div |
|------+------+------+-----|
| grad | NA   | NA   | ?   |
| curl | 0    | ?    | NA  |
| div  | ?    | 0    | NA  |
|------+------+------+-----|

The post {div, grad, curl} of a {div, grad, curl} first appeared on John D. Cook.]]>

I discovered this when reading the documentation on Perl regular expressions, perlre. Here’s the excerpt from that page that caught my eye.

> Many scripts have their own sets of digits equivalent to the Western `0` through `9` ones. **A few, such as Arabic, have more than one set.** For a string to be considered a script run, all digits in it must come from the same set of ten, as determined by the first digit encountered.

Emphasis added.

I took some code I’d written for previous posts on Unicode numbers and modified it to search the range of Arabic Unicode characters and report all characters that represent 0 through 9.

```python
from unicodedata import numeric, name

a = set(range(0x00600, 0x006FF+1)) | \
    set(range(0x00750, 0x0077F+1)) | \
    set(range(0x008A0, 0x008FF+1)) | \
    set(range(0x00870, 0x0089F+1)) | \
    set(range(0x0FB50, 0x0FDFF+1)) | \
    set(range(0x0FE70, 0x0FEFF+1)) | \
    set(range(0x10EC0, 0x10EFF+1)) | \
    set(range(0x1EE00, 0x1EEFF+1)) | \
    set(range(0x1EC70, 0x1ECBF+1)) | \
    set(range(0x1ED00, 0x1ED4F+1)) | \
    set(range(0x10E60, 0x10E7F+1))

f = open('digits.txt', 'w', encoding='utf8')

def uni(i):
    return "U+" + format(i, "X")

for i in sorted(a):
    ch = chr(i)
    if ch.isnumeric() and numeric(ch) in range(10):
        print(ch, uni(i), numeric(ch), name(ch), file=f)
```

Apparently there are two ways to write 0, eight ways to write 2, and seven ways to write 1, 3, 4, 5, 6, 7, 8, and 9. I’ll include the full results at the bottom of the post.

I first wrote my Python script to write to the command line and redirected the output to a file. This resulted in some of the Arabic characters being replaced with a blank or with 0. Then I changed the script as above to write to a file opened to receive UTF-8 text. All the characters were preserved, though I can’t see most of them because the font my editor is using doesn’t have glyphs for the characters outside the BMP (i.e. those with Unicode values above 0xFFFF).

٠ U+660 0.0 ARABIC-INDIC DIGIT ZERO
١ U+661 1.0 ARABIC-INDIC DIGIT ONE
٢ U+662 2.0 ARABIC-INDIC DIGIT TWO
٣ U+663 3.0 ARABIC-INDIC DIGIT THREE
٤ U+664 4.0 ARABIC-INDIC DIGIT FOUR
٥ U+665 5.0 ARABIC-INDIC DIGIT FIVE
٦ U+666 6.0 ARABIC-INDIC DIGIT SIX
٧ U+667 7.0 ARABIC-INDIC DIGIT SEVEN
٨ U+668 8.0 ARABIC-INDIC DIGIT EIGHT
٩ U+669 9.0 ARABIC-INDIC DIGIT NINE
۰ U+6F0 0.0 EXTENDED ARABIC-INDIC DIGIT ZERO
۱ U+6F1 1.0 EXTENDED ARABIC-INDIC DIGIT ONE
۲ U+6F2 2.0 EXTENDED ARABIC-INDIC DIGIT TWO
۳ U+6F3 3.0 EXTENDED ARABIC-INDIC DIGIT THREE
۴ U+6F4 4.0 EXTENDED ARABIC-INDIC DIGIT FOUR
۵ U+6F5 5.0 EXTENDED ARABIC-INDIC DIGIT FIVE
۶ U+6F6 6.0 EXTENDED ARABIC-INDIC DIGIT SIX
۷ U+6F7 7.0 EXTENDED ARABIC-INDIC DIGIT SEVEN
۸ U+6F8 8.0 EXTENDED ARABIC-INDIC DIGIT EIGHT
۹ U+6F9 9.0 EXTENDED ARABIC-INDIC DIGIT NINE
U+10E60 1.0 RUMI DIGIT ONE
U+10E61 2.0 RUMI DIGIT TWO
U+10E62 3.0 RUMI DIGIT THREE
U+10E63 4.0 RUMI DIGIT FOUR
U+10E64 5.0 RUMI DIGIT FIVE
U+10E65 6.0 RUMI DIGIT SIX
U+10E66 7.0 RUMI DIGIT SEVEN
U+10E67 8.0 RUMI DIGIT EIGHT
U+10E68 9.0 RUMI DIGIT NINE
U+1EC71 1.0 INDIC SIYAQ NUMBER ONE
U+1EC72 2.0 INDIC SIYAQ NUMBER TWO
U+1EC73 3.0 INDIC SIYAQ NUMBER THREE
U+1EC74 4.0 INDIC SIYAQ NUMBER FOUR
U+1EC75 5.0 INDIC SIYAQ NUMBER FIVE
U+1EC76 6.0 INDIC SIYAQ NUMBER SIX
U+1EC77 7.0 INDIC SIYAQ NUMBER SEVEN
U+1EC78 8.0 INDIC SIYAQ NUMBER EIGHT
U+1EC79 9.0 INDIC SIYAQ NUMBER NINE
U+1ECA3 1.0 INDIC SIYAQ NUMBER PREFIXED ONE
U+1ECA4 2.0 INDIC SIYAQ NUMBER PREFIXED TWO
U+1ECA5 3.0 INDIC SIYAQ NUMBER PREFIXED THREE
U+1ECA6 4.0 INDIC SIYAQ NUMBER PREFIXED FOUR
U+1ECA7 5.0 INDIC SIYAQ NUMBER PREFIXED FIVE
U+1ECA8 6.0 INDIC SIYAQ NUMBER PREFIXED SIX
U+1ECA9 7.0 INDIC SIYAQ NUMBER PREFIXED SEVEN
U+1ECAA 8.0 INDIC SIYAQ NUMBER PREFIXED EIGHT
U+1ECAB 9.0 INDIC SIYAQ NUMBER PREFIXED NINE
U+1ECB1 1.0 INDIC SIYAQ NUMBER ALTERNATE ONE
U+1ECB2 2.0 INDIC SIYAQ NUMBER ALTERNATE TWO
U+1ED01 1.0 OTTOMAN SIYAQ NUMBER ONE
U+1ED02 2.0 OTTOMAN SIYAQ NUMBER TWO
U+1ED03 3.0 OTTOMAN SIYAQ NUMBER THREE
U+1ED04 4.0 OTTOMAN SIYAQ NUMBER FOUR
U+1ED05 5.0 OTTOMAN SIYAQ NUMBER FIVE
U+1ED06 6.0 OTTOMAN SIYAQ NUMBER SIX
U+1ED07 7.0 OTTOMAN SIYAQ NUMBER SEVEN
U+1ED08 8.0 OTTOMAN SIYAQ NUMBER EIGHT
U+1ED09 9.0 OTTOMAN SIYAQ NUMBER NINE
U+1ED2F 2.0 OTTOMAN SIYAQ ALTERNATE NUMBER TWO
U+1ED30 3.0 OTTOMAN SIYAQ ALTERNATE NUMBER THREE
U+1ED31 4.0 OTTOMAN SIYAQ ALTERNATE NUMBER FOUR
U+1ED32 5.0 OTTOMAN SIYAQ ALTERNATE NUMBER FIVE
U+1ED33 6.0 OTTOMAN SIYAQ ALTERNATE NUMBER SIX
U+1ED34 7.0 OTTOMAN SIYAQ ALTERNATE NUMBER SEVEN
U+1ED35 8.0 OTTOMAN SIYAQ ALTERNATE NUMBER EIGHT
U+1ED36 9.0 OTTOMAN SIYAQ ALTERNATE NUMBER NINE

The post Arabic numerals and numerals that are Arabic first appeared on John D. Cook.]]>

“It is faster to make a four-inch mirror and then a six-inch mirror than to make a six-inch mirror.” — Bill McKeenan, Thompson’s law of telescopes

If your goal is to make a six-inch mirror, why make a four-inch mirror first? From a reductionist perspective this makes no sense. But when you take into account how people learn, it makes perfect sense. The bigger project is more likely to succeed after you learn more about mirror-making in the context of a smaller project.

I was thrilled to discover the awk programming language in college. Munging files with little awk scripts was at least ten times easier than writing C programs.

When I told a friend about awk, he said “Have you seen Perl? It’ll do everything awk does and a lot more.”

If you want to learn Perl, I expect it would be faster to learn awk and then Perl than to learn Perl. I think I would have been intimidated by Perl if I’d tried to learn it first. But thinking of Perl as a more powerful awk made me more willing to try it. Awk made my life easier, and Perl had the potential to make it even easier. I’m not sure whether learning Perl was a good idea—that’s a discussion for another time—but I did.

I also learned C before learning C++. That was beneficial for similar reasons, starting with the four-inch mirror version of C++ before going on to the six-inch version.

Many people have said that learning C before C++ is a bad idea, that it teaches bad habits, and that it would be better to learn (modern) C++ from the beginning. That depends on what the realistic alternative is. Maybe if you attempted to learn C++ first you’d be intimidated and give up. As with giving up on learning Perl, giving up on learning C++ might be a good idea. At the time, however, learning C++ was a good move. Knowing C++ served me well when I left academia.

Teaching yourself something requires different tactics than learning something in a classroom. The four-inch mirror warmup is more important when you’re learning on your own.

If I were teaching a course on C++, I would not teach C first. The added structure of a classroom makes it easier to learn C++ directly. The instructor can pace students through the material so as to avoid the intimidation they might face if they were attempting to learn C++ alone. Students don’t become overwhelmed and give up because they have the accountability of homework assignments etc. Of course *some* students will give up, but more would give up without the structure of a class.

From a strictly logical perspective, it’s most efficient to learn the most abstract version of a theorem first. But this is bad pedagogy. The people who are excited about the efficiency of compressing math this way, e.g. Bourbaki, learned what they know more concretely and incrementally, and think in hindsight that the process could be shortened.

It does save time to present things at some level of generality. However, the number of steps you can go up the abstraction ladder at a time varies by person. Some people might need to go one rung at a time, some could go two at a time or maybe three, but everyone has a limit. And you can take bigger steps when you have a teacher, or even better a *tutor*, to guide you and to rescue you if you try to take too big of a step.

You typically understand something better, and are more able to apply it, when you learn it bottom-up. People think they can specialize more easily than they can generalize, but the opposite is usually true. It’s easier to generalize from a few specific examples than to realize that a particular problem is an instance of a general pattern.

I’ve noticed this personally, and I’ve noticed it in other people. On Twitter, for example, I sometimes post a general and a concrete version of a theorem, and the more concrete version gets more engagement. The response to a general theorem may be “Ho hum. Everybody knows that.” but the response to a particular application may be “Wow, I never thought of that!” even when the latter is a trivial consequence of the former.

You start with a triangle (solid blue) and add equilateral triangles (dashed green) on the **outside** of the triangle. When you connect the centroids of these triangles you get a (dotted red) equilateral triangle.

But Napoleon’s theorem is more general than this. It says you could also add the triangles to the **inside**. The result is much harder to parse visually. The following diagram flips each green triangle over.

You still get an equilateral triangle when you connect the centroids, but it’s a different triangle.

The post The messy version of Napoleon’s theorem first appeared on John D. Cook.]]>Start with any triangle and draw equilateral triangles on each side. Then connect the centroids of the added triangles. The resulting triangle will be equilateral.

In the diagram above, we start with the triangle with solid blue lines, then add the equilateral triangles with dashed green lines. The centroids of the green triangles are marked with red dots. The red triangle is equilateral.

See the next post for another version of Napoleon’s theorem.

The post Napoleon’s theorem first appeared on John D. Cook.]]>Barycentric coordinates come up often in applications, such as when working with finite element meshes. Trilinear coordinates are less common, at least in my experience, and yet trilinear coordinates simplify a lot of classical geometry problems.

To find the trilinear coordinates of a point relative to a triangle, calculate the point’s distance to each side of the triangle. This triple of numbers is one possible set of trilinear coordinates, and so is any multiple of the triple. Trilinear coordinates are homogeneous, somewhat like projective coordinates: all proportional triples represent the same point.

Here’s how trilinear coordinates relate to recent posts.

My post on the nine point circle mentions the orthocenter. The trilinear coordinates of the orthocenter are

sec *A* : sec *B* : sec *C*.

where *A*, *B*, and *C* are the three angles of the triangle. The center of the nine point circle has trilinear coordinates

cos(*B* − *C*) : cos(*C* − *A*) : cos(*A* − *B*).

The post on incircles and excircles implicitly uses trilinear coordinates. Since the incircle is tangent to all three sides of a triangle, its distance to each side is the radius of the circle, and so the incircle’s center has coordinates

1 : 1 : 1.

The three excircles have centers with coordinates

-1 : 1 : 1

1 : -1 : 1

1 : 1 : -1

A point with positive coordinates is inside the triangle and a point with one or two negative coordinates lies outside the triangle. More specifically, if the first coordinate is negative, the point lies outside the triangle and inside the sector determined by angle *A*, and similarly for the other coordinates. A point cannot have all three trilinear coordinates negative: if a point lies inside the sector of each angle of the triangle, then it’s inside the triangle.
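As a numerical sanity check, here’s a sketch that computes signed distances to the three sides; the helper functions and example triangle are mine. Applied to the incenter, it returns three equal positive numbers (the inradius), consistent with the coordinates 1 : 1 : 1.

```python
import numpy as np

def trilinear(P, A, B, C):
    """Signed distances from P to sides BC, CA, AB, positive inside the triangle.
    Any positive multiple of the result is an equivalent triple."""
    def signed_dist(P, Q, R, opp):
        n = np.array([R[1]-Q[1], Q[0]-R[0]])   # normal to line QR
        n /= np.linalg.norm(n)
        s = np.sign(n @ (opp - Q))             # orient toward the opposite vertex
        return s * (n @ (P - Q))
    return np.array([signed_dist(P, B, C, A),
                     signed_dist(P, C, A, B),
                     signed_dist(P, A, B, C)])

A, B, C = map(np.array, [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)])
a, b, c = np.linalg.norm(B-C), np.linalg.norm(C-A), np.linalg.norm(A-B)
I = (a*A + b*B + c*C) / (a + b + c)            # incenter in Cartesian coordinates
print(trilinear(I, A, B, C))                   # three equal values, the inradius
```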

The vertices of the equilateral triangle described by Morley’s theorem have coordinates

1 : 2 cos(*C*/3) : 2 cos(*B*/3)

2 cos(*C*/3) : 1 : 2 cos(*A*/3)

2 cos(*B*/3) : 2 cos(*A*/3) : 1

This theorem is surprising because out of a triangle with no symmetry pops a triangle with three-fold symmetry.

The theorem is also historically surprising. It’s a theorem of Euclidean geometry discovered around 1900, twenty-three centuries after Euclid. You might reasonably suppose that Euclidean geometry had been thoroughly picked over by 1900, and yet Morley found something nobody else had noticed.

**Update**: See the next post for Napoleon’s theorem, another theorem where an equilateral triangle is associated with a general triangle.

I thought mpmath might have the functions I wanted, and indeed it does.

Function names are slightly different between mpmath and SciPy. For example, the `ellipe` function in mpmath is overloaded to compute the complete elliptic integral of the second kind if given one argument and the incomplete counterpart if given two arguments.

The mpmath library works with arbitrary precision by default, and returns its own numeric types. But you can prefix a function with `fp.` in order to get a Python floating point value back. For example,

```python
>>> import mpmath as mp
>>> mp.ellipe(1+1j, 1)
mpc(real='1.2984575814159773', imag='0.63496391478473613')
>>> mp.fp.ellipe(1+1j, 1)
(1.2984575814159773+0.634963914784736j)
```

The post Elliptic functions of a complex argument in Python first appeared on John D. Cook.]]>

Some tasks are easier to do in a square and others in a disk, so it’s clearly useful to be able to conformally map between squares and disks. The Riemann mapping theorem tells us this *can* be done, but it doesn’t tell us how. Two gentlemen figured out how to map between squares (and more general polygons) and disks in the 1860s: Hermann Schwarz and Elwin Christoffel. Schwarz is known for many different results in analysis, including the topic of the previous post, the conformal map from an ellipse to the unit disk. Christoffel is best known for Christoffel symbols, building blocks of tensors.

Here’s a plot showing how the Schwarz-Christoffel transformation from the square [-1, 1] × [-1, 1] to the unit disk transforms Cartesian grid lines.

Here’s another plot, this one showing how the grid lines for polar coordinates on the disk pull back to curves on the square.

The equation for the function from the square to the disk is

where sd is a Jacobi elliptic function with parameter 1/2 [1]. The constant *K* is the complete elliptic integral of the first kind, evaluated at 1/2. In symbols, *K* = *K*(1/2).

The inverse function has equation

Here *F* is the incomplete elliptic integral of the first kind. For more background, see this post on kinds of elliptic integrals.

Charles Sanders Peirce used the conformal map of the disk to the square to create the “Peirce quincuncial projection” map. This is a conformal (i.e. angle-preserving) map that represents the globe on a square. The diamond shape in the middle is the image of the equator. The mapping is singular at the south pole.

Peirce named the map after the quincunx pattern of the poles. This obscure word refers to the pattern of dots on the five face of a standard six-sided die.

- Conformal map from ellipse to disk
- Conformal mapping and Laplace’s equation
- Applied complex analysis

[1] “Parameter” is being used in a technical sense here. There are two conventions for parameterizing elliptic functions and elliptic integrals. Here we are using the convention called the parameter, commonly denoted *m*. There’s another convention that uses the elliptic modulus *k*, and the connection between them is that *m* = *k*².

Consider an ellipse centered at the origin with semi-major axis *a* and semi-minor axis *b*. We will assume without loss of generality that *a*² – *b*² = 1 and so the foci are at ±1.

Hermann Schwarz published the conformal map from the ellipse to the unit disk in 1869 [1, 2].

The map is given by

where sn is the Jacobi elliptic function with parameter *k*². The constants *k* and *K* are given by

where θ_{2} and θ_{3} are theta constants, the values of the theta functions θ_{2}(*z*, *q*) and θ_{3}(*z*, *q*) at *z* = 0.

Conformal maps to the unit disk are unique up to rotation. The map above is the unique conformal map preserving orientation:

The inverse of this map is given by

The inverse of the sn function with parameter *m* can be written in terms of elliptic integrals.

where *F* is the incomplete elliptic integral of the first kind and *m* is the parameter of sn and the parameter of *F*.

I wanted to illustrate the conformal map using an ellipse with aspect ratio 1/2. To satisfy *a*² – *b*² = 1, I set *a* = 2/√3 and *b* = 1/√3. The plot at the top of the post was made using Mathematica.

- NASA and conformal maps
- Comparing Jacobi functions and trig functions
- Conformal mapping and Laplace’s equation
- Numerically evaluate a theta function

[1] H. A. Schwarz. Ueber einige Abbildungsaufgaben. Journal für die reine und angewandte Mathematik, vol. 70 (1869), pp. 105–120

[2] Gabor Szegö. Conformal Mapping of the Interior of an Ellipse onto a Circle. The American Mathematical Monthly, 1950, Vol. 57, No. 7, pp. 474–478

The post Conformal map of ellipse interior to a disk first appeared on John D. Cook.]]>This observation has been called the **piranha problem**. Predictors are compared to piranha fish. If you have a lot of big piranhas in a small pond, they start eating each other. If you have a lot of strong predictors, they predict each other.

In [1] the authors quantify the piranha effect several ways. I’ll just quote the first one here. See the paper for several other theorems and commentary on their implications.

If *X*_{1}, …, *X*_{p}, *y* are real-valued random variables with finite non-zero variance, then

So if the left side is large, either because *p* is large or because some of the correlations are large, then the right side is also large, and so the sum of the interaction terms is large.

[1] The piranha problem: large effects swimming in a small pond. Available on arXiv.

The post Big correlations and big interactions first appeared on John D. Cook.]]>The **incircle** of a triangle is the largest circle that can fit inside the triangle. When we add the incircle to the illustration from the post on the nine-point circle, it’s kinda hard to see the difference between the two circles. The nine-point circle is drawn in solid black and the incircle is drawn in dashed green.

If we extend the sides of the triangle, an **excircle** is a circle tangent to one side of the original triangle and to the extensions of the other two sides.

The post Incircle and excircles first appeared on John D. Cook.]]>

I tested this empirically and found the following stats. The numbers basically confirm what the host said.

The “double” column counts double letters at the end of a word, and the “all” column counts all words ending in the given letter, single or double.

|---+--------+-------+----|
|   | double | all   | %  |
|---+--------+-------+----|
| b | 8      | 476   | 2  |
| c | 0      | 11324 | 0  |
| d | 18     | 15996 | 0  |
| f | 218    | 919   | 24 |
| g | 11     | 6432  | 0  |
| h | 1      | 4754  | 0  |
| j | 0      | 17    | 0  |
| k | 1      | 2650  | 0  |
| l | 740    | 14929 | 5  |
| m | 2      | 8881  | 0  |
| n | 31     | 19966 | 0  |
| p | 10     | 2201  | 0  |
| q | 0      | 6     | 0  |
| r | 35     | 15467 | 0  |
| s | 9559   | 26062 | 37 |
| t | 51     | 14831 | 0  |
| v | 0      | 37    | 0  |
| w | 0      | 702   | 0  |
| x | 0      | 793   | 0  |
| y | 0      | 27747 | 0  |
| z | 22     | 143   | 15 |
|---+--------+-------+----|

These stats simply count words; I suspect the results would be different if the words were weighted by frequency. For example, there are eight words that end in *bb*, but seven of these are rare words or alternate spellings: abb, bibb, dabb, dhabb, dubb, ebb, hubb, stubb.
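For what it’s worth, here’s a sketch of how such counts could be computed. The tiny word list below is a stand-in of my own; the actual dictionary used for the table isn’t specified.

```python
from collections import Counter

# Stand-in word list (hypothetical); in practice read one word per line
# from whatever dictionary file you have on hand.
words = ["ebb", "stubb", "cliff", "bell", "pass", "buzz", "cat", "dog"]

# Count words ending in a doubled letter, and all words, by final letter.
double = Counter(w[-1] for w in words if len(w) >= 2 and w[-1] == w[-2])
total  = Counter(w[-1] for w in words)

for letter in sorted(double):
    pct = 100 * double[letter] / total[letter]
    print(f"{letter}: {double[letter]} / {total[letter]} ({pct:.0f}%)")
```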

Artemis is in a highly eccentric orbit around the moon, coming within 130 km (80 miles) of the moon’s surface at closest pass, and this orbit will take 14 days to complete. The weak link in this data is “14 days.” Surely this number has been rounded for public consumption.

If we assume Artemis is in a Keplerian orbit, i.e. we can ignore the effect of the Earth, then we can calculate the shape of the orbit using the information above. This assumption is questionable because, as I understand it, the reason for such an eccentric orbit has something to do with Lagrange points, which means the Earth’s gravity matters. Still, I imagine the effect of Earth’s gravity is a smaller source of error than the lack of accuracy in knowing the period.

Artemis is orbiting the moon similarly to how the Mars Orbiter Mission orbited Mars. We can use Kepler’s third law, relating the period *T* to the semi-major axis *a*, to solve for *a*.

*T* = 2π √(*a*³/μ)

Here μ = *GM*, with *G* being the gravitational constant and *M* being the mass of the moon. Now

*G* = 6.674 × 10^{-11} N m²/kg²

and

*M* = 7.3459 × 10^{22} kg.

If we assume *T* is 14 × 24 × 3600 seconds, then we get

*a* = 56,640 km

or 35,200 miles. The value of *a* is rough since the value of *T* is rough.
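The calculation is a direct rearrangement of the period formula. In Python, with the numbers above:

```python
import numpy as np

G = 6.674e-11          # gravitational constant, m^3 / (kg s^2)
M = 7.3459e22          # mass of the moon, kg
mu = G * M
T = 14 * 24 * 3600     # assumed period of 14 days, in seconds

# Solve T = 2 pi sqrt(a^3 / mu) for the semi-major axis a
a = (mu * (T / (2 * np.pi))**2) ** (1/3)
print(a / 1000)        # ≈ 56,640 km
```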

Assuming a Keplerian orbit, the moon is at one focus of the orbit, located a distance *c* from the center of the ellipse. If Artemis is 130 km from the surface of the moon at perilune, and the radius of the moon is 1737 km, then

*c* = *a* – (130 + 1737) km = 54,770 km

or 34,000 miles. The semi-minor axis *b* satisfies

*b*² = *a*² – *c*²

and so

*b* = 14,422 km

or 8962 miles.

The eccentricity is *c*/*a* = 0.967. As I’ve written about before, eccentricity is hard to interpret intuitively. Aspect ratio is much easier to imagine than eccentricity, and the relation between the two is highly nonlinear.
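Continuing in code with the numbers above:

```python
a = 56640.0                  # semi-major axis, km (from above)
c = a - (130 + 1737)         # center-to-focus distance, km
b = (a**2 - c**2) ** 0.5     # semi-minor axis, km

print(round(c))              # ≈ 54773 km
print(round(b))              # ≈ 14422 km
print(round(c/a, 3))         # eccentricity ≈ 0.967
print(round(b/a, 3))         # aspect ratio ≈ 0.255
```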

Assuming everything above, here’s what the orbit would look like. The distances on the axes are in kilometers.

The orbit is highly eccentric: the center of the orbit is far from the foci of the orbit. But the aspect ratio is about 1/4. The orbit is only about 4 times wider in one direction than the other. It’s obviously an ellipse, but it’s not an extremely thin ellipse.

In an earlier post I showed how to compute the Lagrange points for the Sun-Earth system. We can use the same equations for the Earth-Moon system.

The equations for the distance *r* from the Lagrange points L1 and L2 to the moon are

*M*_{1}/(*R* ± *r*)² ± *M*_{2}/*r*² = ((*M*_{1} + *M*_{2})/*R*³)(*R* ± *r* − *R* *M*_{2}/(*M*_{1} + *M*_{2}))

The equation for L1 corresponds to taking ± as – and the equation for L2 corresponds to taking ± as +. Here *M*_{1} and *M*_{2} are the masses of the Earth and Moon respectively, and *R* is the distance between the two bodies.

If we modify the code from the earlier post on Lagrange points we get

L1 = 54784 km

L2 = 60917 km

where L1 is on the near side of the moon and L2 on the far side. We estimated the semi-major axis *a* to be 56,640 km. This is about 3% larger than the distance from the moon to L1. So the orbit of Artemis passes near or through L1. This assumes the axis of the Artemis orbit is aligned with a line from the moon to Earth, which I believe is at least approximately correct.
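Here is a small pure-Python sketch of that calculation. The post doesn’t say what value of *R* it used; the Earth–Moon perigee distance of about 363,300 km appears to reproduce the figures above, so that value is assumed here, along with an Earth mass of 5.972 × 10^{24} kg.

```python
def lagrange_distance(sign, M1=5.972e24, M2=7.3459e22, R=363300.0):
    # Distance (km) from the moon to L1 (sign=-1) or L2 (sign=+1), found by
    # bisection on the force-balance equation in the rotating frame.
    mu = M2 / (M1 + M2)
    def f(rho):  # rho = r/R, the nondimensional distance
        return (1 - mu)/(1 + sign*rho)**2 + sign*mu/rho**2 - (1 + sign*rho - mu)
    lo, hi = 0.01, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return R * (lo + hi) / 2

L1 = lagrange_distance(-1)   # near side of the moon
L2 = lagrange_distance(+1)   # far side of the moon
```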

- The midpoints of each side.
- The foot of the altitude to each side.
- The midpoint between each vertex and the orthocenter.

The orthocenter is the place where the three altitudes intersect.

In the image above, the midpoints are red circles, the altitudes are blue lines, the feet are blue stars, and the midpoints between the vertices and the orthocenter are green squares.
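Here’s a quick numerical check of the theorem, not from the original post: pick a triangle, compute all nine points, and verify they’re equidistant from the nine-point center, which is the midpoint of the circumcenter and the orthocenter.

```python
import numpy as np

A, B, C = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([1.0, 3.0])

def circumcenter(A, B, C):
    # Solve |P - A|² = |P - B|² = |P - C|² as a linear system for P.
    M = 2 * np.array([B - A, C - A])
    rhs = np.array([B @ B - A @ A, C @ C - A @ A])
    return np.linalg.solve(M, rhs)

def foot(P, Q, R):
    # Foot of the perpendicular from P to the line through Q and R.
    d = R - Q
    return Q + ((P - Q) @ d / (d @ d)) * d

O = circumcenter(A, B, C)
H = A + B + C - 2 * O        # orthocenter
N = (O + H) / 2              # nine-point center

nine = [(A + B)/2, (B + C)/2, (C + A)/2,                 # midpoints of sides
        foot(A, B, C), foot(B, C, A), foot(C, A, B),     # feet of altitudes
        (A + H)/2, (B + H)/2, (C + H)/2]                 # vertex-orthocenter midpoints
radii = [np.linalg.norm(P - N) for P in nine]            # all nine distances agree
```

Incidentally, the common radius is half the circumradius of the triangle.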

The post The nine-point circle theorem first appeared on John D. Cook.

A Möbius transformation is a function *f* : ℂ → ℂ of the form

*f*(*z*) = (*az* + *b*)/(*cz* + *d*)

where *ad* – *bc* ≠ 0. One of the basic properties of Möbius transformations is that they form a group. Except that’s not quite right if you want to be completely rigorous.

The problem is that a Möbius transformation isn’t a map from (all of) ℂ to ℂ unless *c* = 0 (which implies *d* cannot be 0). The usual way to fix this is to add a point at infinity, which makes things much simpler. Now we can say that the Möbius transformations form a group of automorphisms on the Riemann sphere *S*².

But if you insist on working in the finite complex plane, i.e. the complex plane ℂ with no point at infinity added, each Möbius transformation is actually a *partial function* on ℂ because a point may be missing from the domain. As detailed in [1], you technically do not have a group but rather an inverse monoid. (See the previous post on using inverse semigroups to think about floating point partial functions.)

You can make Möbius transformations into a group by *defining* the product of the Möbius transformation *f* above with

*g*(*z*) = (*Az* + *B*) / (*Cz* + *D*)

to be

((*aA* + *bC*)*z* + *aB* + *bD*) / ((*cA* + *dC*)*z* + *cB* + *dD*),

which is what you’d get if you computed the composition *f* ∘ *g* as functions, ignoring any difficulties with domains.
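This product is exactly what you get by multiplying the coefficient matrices, which is what makes the group structure transparent. A quick numerical check, with example coefficients of my choosing:

```python
import numpy as np

def mobius(m):
    # Turn a 2×2 coefficient matrix into the corresponding Möbius transformation.
    a, b, c, d = m.ravel()
    return lambda z: (a*z + b) / (c*z + d)

F = np.array([[1, 2], [3, 4]])    # f(z) = (z + 2)/(3z + 4)
G = np.array([[2, 1], [1, 1]])    # g(z) = (2z + 1)/(z + 1)

z = 0.3 + 0.7j
# Composing the transformations corresponds to multiplying the matrices.
composed = mobius(F)(mobius(G)(z))
product = mobius(F @ G)(z)
```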

The Möbius inverse monoid is surprisingly complex. Things are simpler if you compactify the complex plane by adding a point at infinity, or if you gloss over the fine points of function domains.

- Transformations of Olympic rings
- Curiously simple approximations
- Solving for Möbius transformation coefficients

[1] Mark V. Lawson. The Möbius Inverse Monoid. Journal of Algebra. 200, 428–438 (1998).

The post The Möbius Inverse Monoid first appeared on John D. Cook.

Is the Python function

def f(x): return x + 2

invertible? Not always.

You might reasonably think the function

def g(x): return x - 2

is the inverse of `f`, and it is for many values of `x`. But try this:

>>> x = 2**53 - 1.0
>>> g(f(x)) - x
-1.0

The composition of `f` and `g` does not give us `x` back because of the limited length of a floating point significand. See Anatomy of a floating point number.

The function `f` as a function between floating point numbers is **locally invertible**. That is, it is invertible on a subset of its domain.

Now let’s look at the function

def f(x): return x*x

Is this function invertible? There is a function, namely `sqrt`, that serves as an inverse to `f` for many values of `x`, but not all `x`. The function `sqrt` is a **partial function** because although it is ostensibly a function on floating point numbers, it crashes for negative inputs. The function’s actual domain is smaller than its nominal domain.

Locally invertible functions are an inevitable part of programming, and are awkward to reason about. But there are tools that help. For example, **inverse semigroups**.

According to nLab

An inverse semigroup is a semigroup S such that for every element s ∈ S, there exists a unique “inverse” s* ∈ S such that s s* s = s and s* s s* = s*.

The canonical example of an inverse semigroup, and in some sense the *only* example, is the following, also from nLab.

For any set *X*, let *I*(*X*) be the set of all partial bijections on *X*, i.e. bijections between subsets of *X*. The composite of partial bijections is their composite as relations (or as partial functions).

This is the only example in the sense that the Wagner-Preston theorem says every inverse semigroup is isomorphic to an inverse subsemigroup of a semigroup of this form.

In our case, the set *X* is the set of representable floating point numbers, and locally invertible functions are functions which *are* invertible, but only when restricted to a subset of *X*.
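To make this concrete, here’s a toy implementation that is entirely my own illustration: partial bijections represented as Python dicts, composed as partial functions. The defining identities of an inverse semigroup hold.

```python
def compose(f, g):
    # (f ∘ g): defined only where g is defined and g's output lands in f's domain
    return {x: f[g[x]] for x in g if g[x] in f}

def inverse(f):
    # a partial bijection inverts by swapping keys and values
    return {v: k for k, v in f.items()}

g = {1: 'a', 2: 'b', 3: 'c'}
f = {'a': 10, 'b': 20}               # partial: undefined at 'c'

fg = compose(f, g)                   # domain shrinks to {1, 2}
# the inverse semigroup identities: f f* f = f and f* f f* = f*
ok1 = compose(compose(f, inverse(f)), f) == f
ok2 = compose(compose(inverse(f), f), inverse(f)) == inverse(f)
```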

The upper half plane is a sort of secondary hub. You may want to map two regions to and from each other via a half plane. And as with the disk, there’s an explicit solution to Laplace’s equation on a half plane.

Another reason to be interested in Laplace’s equation on a half plane is the connection to the Hilbert transform and harmonic conjugates.

Given a continuous real-valued function *u* on the real line, *u* can be extended to a harmonic function on the upper half plane by taking the convolution of *u* with the Poisson kernel, a variation on the Poisson kernel from the previous post. That is, for *y* > 0,

*u*(*x*, *y*) = (1/π) ∫ *y* *u*(*t*) / ((*x* − *t*)² + *y*²) *dt*

where the integral is taken over the real line. This gives a solution to Laplace’s equation on the upper half plane with boundary values given by *u* on the real line. The extension is smooth on the upper half plane, and its limiting values as *y* → 0 reproduce the boundary function *u*.

Furthermore, *u* is the real part of an analytic function *f* = *u* + *iv*. The function *v* is the **harmonic conjugate** of *u*, and also equals the Hilbert transform of *u*.
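Here’s a numerical sanity check of the convolution against the standard half-plane Poisson kernel *y*/π((*x* − *t*)² + *y*²); the function names and the test case are mine. The boundary function 1/(1 + *t*²) is the real part of 1/(1 − *iz*) on the real line, so its harmonic extension is known in closed form: (1 + *y*)/((1 + *y*)² + *x*²).

```python
import numpy as np

def poisson_halfplane(u, x, y, t):
    # convolve boundary data u with the Poisson kernel for the upper half plane
    kernel = y / (np.pi * ((x - t)**2 + y**2))
    dt = t[1] - t[0]
    return np.sum(kernel * u(t)) * dt

u = lambda t: 1 / (1 + t**2)          # boundary data
t = np.linspace(-200, 200, 400001)    # fine grid; the integrand decays like 1/t⁴

approx = poisson_halfplane(u, 0.5, 1.0, t)
exact = (1 + 1.0) / ((1 + 1.0)**2 + 0.5**2)   # known harmonic extension at (0.5, 1)
```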

Why would you want to solve Laplace’s equation on a disk?

Laplace’s equation is important in its own right—for example, it’s important in electrostatics—and understanding Laplace’s equation is a stepping stone to understanding many other PDEs.

Why care specifically about a disk? An obvious reason is that you might need to solve Laplace’s equation on a disk! But there are two less obvious reasons.

First, a disk can be mapped conformally to any simply connected proper open subset of the complex plane. And because conformal equivalence is transitive, two regions conformally equivalent to the disk are conformally equivalent to each other. For example, as I wrote about here, you can map a Mickey Mouse silhouette

to and from the Batman logo

using conformal maps. In practice, you’d probably map Mickey Mouse to a disk, and compose that map with a map from the disk to Batman. The disk is a standard region, and so there are catalogs of conformal maps between the disk and other regions. And there are algorithms for computing maps between a standard region, such as the disk or half plane, and more general regions. You might be able to lookup a mapping from the disk to Mickey, but probably not to Batman.

In short, the disk is sort of the **hub** in a hub-and-spoke network of cataloged maps and algorithms.

Secondly, Laplace’s equation has an **analytical solution** on the disk. You can just write down the solution, and we will shortly. If it were easy to write down the solution on a triangle, that might be the hub, but instead it’s a disk.

Suppose *u* is a real-valued continuous function on the boundary of the unit disk. Then *u* can be extended to a harmonic function, i.e. a solution to Laplace’s equation on the interior of the disk, via the Poisson integral formula:

*u*(*z*) = (1/2π) ∫_{0}^{2π} Re((*e*^{it} + *z*)/(*e*^{it} − *z*)) *u*(*e*^{it}) *dt*

Or in terms of polar coordinates:

*u*(*re*^{iθ}) = (1/2π) ∫_{0}^{2π} (1 − *r*²) *u*(*e*^{it}) / (1 − 2*r* cos(θ − *t*) + *r*²) *dt*
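A numerical check of the polar form (the test case is mine): boundary data cos 2*t* extends harmonically to *r*² cos 2θ, and the integral reproduces it.

```python
import numpy as np

def poisson_disk(boundary, r, theta, n=20000):
    # Poisson integral on the unit disk; the mean over t approximates (1/2π)∫ dt
    t = np.linspace(0, 2*np.pi, n, endpoint=False)
    kernel = (1 - r**2) / (1 - 2*r*np.cos(theta - t) + r**2)
    return np.mean(kernel * boundary(t))

r, theta = 0.7, 0.3
approx = poisson_disk(lambda t: np.cos(2*t), r, theta)
exact = r**2 * np.cos(2*theta)       # the harmonic extension of cos 2t
```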

I wrote a lot of posts on ellipses and related topics over the last couple months. Here’s a recap of the posts, organized into categories.

- Eccentricity, flattening, and aspect ratio
- Latus rectum
- Directrix
- Example of a highly elliptical orbit

- Pascal’s theorem
- Intersection of two conics
- Determining conic sections by points or tangents
- Evolute of an ellipse

Design of experiments is a branch of statistics, and design theory is a branch of combinatorics, and yet they overlap quite a bit.

It’s hard to say precisely what design theory is, but it’s concerned with whether objects can be arranged in certain ways, and if so how many ways this can be done. Design theory is pure mathematics, but it is of interest to people working in areas of applied mathematics such as coding theory and statistics.

Here’s a recap of posts I’ve written recently related to design of experiments and design theory.

A few weeks ago I wrote about fractional factorial design. Then later I wrote about response surface models. Then a diagram from central composite design, a popular design in response surface methodology, was one of the diagrams in a post I wrote about visually similar diagrams from separate areas of application.

I wrote two posts about pitfalls with A/B testing. One shows how play-the-winner sequential testing has the same problems as Condorcet’s voter paradox, with the order of the tests potentially determining the final winner. More seriously, A/B testing cannot detect interaction effects which may be critical.

There are several civilian and military standards related to design of experiments. The first of these was MIL-STD-105. The US military has retired this standard in favor of the civilian standard ASQ/ANSI Z1.4 which is virtually identical.

Similarly, the US military standard MIL-STD-414 was replaced by the very similar civilian standard ASQ/ANSI Z1.9. This post looks at the mean-range method for estimating variation which these two standards reference.

I wrote a couple posts on Room squares, one on Room squares in general and one on Thomas Room’s original design now known as a Room square. Room squares are used in tournament designs.

I wrote a couple posts about Costas arrays, an introduction and a post on creating Costas arrays in Mathematica.

Latin squares and Greco-Latin squares are part of design theory and part of design of experiments. Here are several posts on Latin and Greco-Latin squares.

The post Design of experiments and design theory first appeared on John D. Cook.

A repunit prime is, unsurprisingly, a repunit number which is prime. The most obvious example is *R*_{2} = 11. Until recently the repunit numbers confirmed to be prime were *R*_{n} for n = 2, 19, 23, 317, 1031. Now the case for *n* = 49081 has been confirmed.

Here is the announcement. The date posted at the top of the page is from March this year, but I believe the announcement is new. Maybe the author edited an old page and didn’t update the shown date.
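The smaller entries in this list are easy to verify with a few lines of Python. Here’s a sketch using a Miller–Rabin probabilistic primality test — my own implementation, kept simple rather than fast. Searching indices up to 350 recovers the first four repunit primes.

```python
import random

def repunit(n):
    # the number consisting of n ones
    return (10**n - 1) // 9

def is_probable_prime(n, rounds=20):
    # Miller–Rabin probabilistic primality test
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in small:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

prime_indices = [n for n in range(2, 350) if is_probable_prime(repunit(n))]
```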

Incidentally, I noticed a lot of repunits when I wrote about bad passwords a few days ago. That post explored a list of commonly used but broken passwords. This is the list of passwords that password cracking software will try first. The numbers *R*_{n} are part of the list for the following values of *n*:

1–45, 47–49, 51, 53–54, 57–60, 62, 67, 70, 72, 77, 82, 84, 147

So 46 is the smallest value of *n* such that *R*_{n} is not on the list. I would not recommend using *R*_{46} as a password, though.

The bad password file is sorted in terms of popularity, and you might expect repunits to appear in the file in order, i.e. shorter sequences first. That is sorta true overall. But you can see streaks in the plot below showing multiple runs where longer passwords are more common than shorter passwords.

The post Repunits: primes and passwords first appeared on John D. Cook.

How to solve trig equations in general, and specifically how to solve equations involving quadratic polynomials in sine and cosine.

This weekend I wrote about a change of variables to “depress” a cubic equation, eliminating the quadratic term. This is a key step in solving a cubic equation. The idea can be extended to higher degree polynomials, and applied to differential equations.

Before that I wrote about how to tell whether a cubic or quartic equation has a double root. That post is also an introduction to resultants.

First of all, there was a post on solving Kepler’s equation with Newton’s method, and especially with John Machin’s clever starting point.

Another post, also solving Kepler’s equation, showing how Newton’s method can be good, bad, or ugly.

And out there by itself, Weierstrass’ method for simultaneously searching for all roots of a polynomial.

The post Recent posts on solving equations first appeared on John D. Cook.

There are two kinds of experts, consulting experts and testifying experts. These names mean what they say: consulting experts consult with their clients, and testifying experts testify. Usually a lawyer will retain an expert with the intention of having this person testify, but the expert starts out as a de facto consulting expert.

Working with lawyers is quite pleasant. The division of labor is crystal clear: you are hired to be an expert on some topic, they are the experts in matters of law, and the streams don’t cross. You’re treated with deference and respect. Even if a lawyer knows something about your field of expertise, it’s not their role to opine on it.

I’ve never had a lawyer try to twist my arm. It’s not in their interests to do so. I’ve told lawyers things they were disappointed to hear, but I’ve never had a lawyer argue with me.

I’ve turned down engagements when it was immediately apparent that the client didn’t have a statistical case. (They may have a *legal* case, but that’s not my bailiwick.) Sometimes lawyers are grasping at straws, and they may try a statistical argument as a last resort.

One person approached me to do a statistical analysis of **one** data point. Not to be outdone, someone once asked me to do a statistical analysis based on absolutely **no** data. I told both that I’d need a little more data to go on.

John Tukey said that the best part of being a statistician is that you get to play in everyone else’s back yard. I’d expand that to applied math more generally. You can’t be expected to be an expert in everything, but you are expected to come up to speed quickly on the basics of the problem domain.

Work on legal cases is confidential, but so is almost everything else I do. However, an intellectual property case I worked on took this to a higher level. I was only allowed to work at opposing counsel’s office, on their laptop, without an internet connection, and without a phone. That was an interesting exercise.

There’s a lot of hurry-up and wait with legal work. A project can be dormant and presumably dead, then suddenly pop back up. This isn’t unique to legal work, but it seems more frequent or more extreme with legal work.

Law firms do everything by the hour. I mostly work by the project, but I’ll work by the hour for lawyers. There are occasional exceptions, but hourly billing is firmly ingrained in legal culture. And reasonably so: it’s hard to say in advance how much work something will take. Sometimes when you *can* reasonably anticipate the scope of a task you can do it fixed bid.

Law firms typically pass through all expenses. So even if a firm hires you, their client is responsible for paying you. You don’t get paid until the law firm gets paid, which can sometimes take a while.

A few years ago I had to fly around a fair amount. That was fun for a while but it got old. I haven’t had to travel for work since the pandemic and I’m OK with that.

The post Expert witness experiences first appeared on John D. Cook.

A linear differential equation can be viewed as a polynomial in the differential operator *D* applied to the function we’re solving for. More on this idea here. So it makes sense that a technique analogous to the technique used for “depressing” a polynomial could work similarly for differential equations.

In the differential equation post mentioned above, we started with the equation

*u*″ + *au*′ + *bu* = 0

and reduced it to

*v*″ + (*b* − *a*²/4) *v* = 0

using the change of variable

*u* = *v* exp(−*ax*/2).

So where did this change of variables come from? How might we generalize it to higher-order differential equations?

In the post on depressing a polynomial, we started with a polynomial

*p*(*x*) = *ax*^{n} + *bx*^{n-1} + …

and used the change of variables

*x* = *t* − *b*/*na*

to eliminate the *x*^{n-1} term. Let’s do something analogous for differential equations.

Let *P* be an *n*th degree polynomial, say

*P*(*D*) = *D*^{n} + *a*_{n-1}*D*^{n-1} + … + *a*_{1}*D* + *a*_{0},

and consider the differential equation

*P*(*D*) *u* = 0.

We can turn this into a differential equation

*Q*(*D*) *v* = 0

where the polynomial

*Q*(*D*) = *P*(*D* + λ)

has no term involving *D*^{n-1}, by substituting *u* = exp(λ*x*) *v* and solving

*a*_{n-1} + *n*λ = 0

which leads to

λ = −*a*_{n-1}/*n*,

generalizing the result above for second order ODEs.
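As a numerical sanity check (the example coefficients are mine): with the second order equation *u*″ + 2*u*′ + 5*u* = 0, the depressed equation is *v*″ + 4*v* = 0, solved by cos 2*x*, so *u* = *e*^{−x} cos 2*x* should satisfy the original equation. Finite differences confirm it.

```python
import numpy as np

a, b = 2.0, 5.0
x = np.linspace(0, 3, 10001)
h = x[1] - x[0]

v = np.cos(np.sqrt(b - a**2/4) * x)   # solves the depressed equation v'' + (b - a²/4) v = 0
u = np.exp(-a*x/2) * v                # candidate solution of u'' + a u' + b u = 0

up  = (u[2:] - u[:-2]) / (2*h)                     # central difference for u'
upp = (u[2:] - 2*u[1:-1] + u[:-2]) / h**2          # central difference for u''
residual = np.max(np.abs(upp + a*up + b*u[1:-1]))  # should be near zero
```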

The post Eliminating terms from higher-order differential equations first appeared on John D. Cook.

We will use big-O notation *O*(*x*^{k}) to mean terms involving *x* to powers no higher than *k*. This is slightly unusual, because typically big-O notation is used when some variable is tending to a limit, and we’re not taking limits here.

Let’s start with an *n*th degree polynomial

*p*(*x*) = *ax*^{n} + *bx*^{n-1} + *O*(*x*^{n-2}).

Here *a* is not zero, or else we wouldn’t have an *n*th degree polynomial.

The following calculation shows that the change of variables

*x* = *t* − *b*/*na*

results in an *n*th degree polynomial in *t* with no term involving *t*^{n-1}:

*a*(*t* − *b*/*na*)^{n} + *b*(*t* − *b*/*na*)^{n-1} + *O*(*t*^{n-2}) = *at*^{n} − *bt*^{n-1} + *bt*^{n-1} + *O*(*t*^{n-2}) = *at*^{n} + *O*(*t*^{n-2}).

This approach works over real or complex numbers. It even works over finite fields, provided you can divide by *na*.
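Here’s a short implementation of the change of variables (the coefficient conventions are mine). It substitutes *x* = *t* − *b*/*na* by Horner’s method and the *t*^{n-1} coefficient of the result vanishes.

```python
import numpy as np

def depress(p):
    # p lists coefficients from highest to lowest degree; p[0] is the leading coefficient
    n = len(p) - 1
    s = -p[1] / (n * p[0])            # the shift: x = t + s with s = -b/(na)
    q = np.zeros(1)
    for coef in p:                    # Horner's method in the shifted variable
        q = np.convolve(q, [1.0, s])  # multiply the running result by (t + s)
        q[-1] += coef
    return q[1:]                      # drop the leading zero left by the seed

q = depress([2.0, 6.0, 5.0, 1.0])     # 2x³ + 6x² + 5x + 1  →  2t³ - t
```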

I’ve mentioned a couple times that the Weierstrass form of an elliptic curve

*y*² = *x*³ + *ax* + *b*

is the most general except when working over a field of characteristic 2 or 3. The technique above breaks down because 3*a* may not be invertible in a field of characteristic 2 or 3.

The previous post showed how to reduce a general cubic equation to one in the form

*x*³ + *cx* + *d* = 0

which is called a “depressed cubic.” In a nutshell, you divide by the leading coefficient then do a simple change of variables that removes the quadratic term.

Now what? This post will give a motivated but completely ahistorical approach for removing the linear term *cx*.

Suppose we don’t know how to solve cubic equations. What do we know how to solve? Quadratic equations. So a natural question to ask is how we might find a quadratic equation that has the same roots as our cubic equation. Well, how can you tell in general whether two polynomials have a common root? Resultants.

This is the point where we completely violate historical order. Tartaglia discovered a general solution to depressed cubic equations in the 16th century [1], but Sylvester introduced the resultant in the 19th century. Resultants were a great idea, but not a rabbit out of a hat. It’s not far-fetched that some sort of determinant could tell you whether two polynomials have a common factor since this is analogous to two sets of vectors having overlapping spans. I found the idea of using resultants in this context in [2].

In 1683, Tschirnhaus published the transform that in modern terminology amounts to finding a polynomial *T*(*x*, *y*) that has zero resultant with a depressed cubic.

Tschirnhaus assumed his polynomial *T* has the form

*T*(*x*, *y*) = *x*² + *ax* + 2*c*/3 + *y*.

Let’s take the resultant of our cubic and Tschirnhaus’ quadratic using Mathematica.

Resultant[x^3 + c x + d, x^2 + a x + 2 c/3 + y, x]

This gives us

*y*³ + (*a*²*c* + 3*ad* − *c*²/3)*y* + 2*a*²*c*²/3 − *a*³*d* + *acd* + 2*c*³/27 + *d*²

which is a cubic equation in *y*. If the coefficient of *y* were zero, then we could solve the cubic equation for *y* by simply taking a cube root. But we can make that happen by our choice of *a*, i.e. we pick *a* to solve the quadratic equation

*ca*² + 3*da* − *c*²/3 = 0.

So we solve this equation for *a*, plug either root for *a* into the expression for the resultant, then solve for *y*. Then we take that value of *y* and find where Tschirnhaus’ polynomial is zero by solving the quadratic equation

*x*² + *ax* + 2*c*/3 + *y* = 0.

We solved for a value of *y* that makes the resultant zero, so our original polynomial and Tschirnhaus’ polynomial have a common root. So one of the roots of the equation above is a root of our original cubic equation.
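Here’s the whole procedure in a few lines of Python; the constant term of the resultant is hard-coded from the Mathematica output above, and the variable names are mine. The code picks whichever root of Tschirnhaus’ quadratic actually satisfies the cubic.

```python
import numpy as np

def solve_depressed_cubic(c, d):
    # Find a root of x³ + cx + d = 0 via Tschirnhaus' transformation (assumes c ≠ 0).
    c, d = complex(c), complex(d)
    # choose a so the resultant, a cubic in y, has no linear term
    a = np.roots([c, 3*d, -c**2/3])[0]
    # the resultant is then y³ + K; a cube root of -K makes it vanish
    K = 2*a**2*c**2/3 - a**3*d + a*c*d + 2*c**3/27 + d**2
    y = (-K)**(1/3)
    # the cubic and Tschirnhaus' quadratic x² + ax + 2c/3 + y now share a root
    roots = np.roots([1, a, 2*c/3 + y])
    return min(roots, key=lambda x: abs(x**3 + c*x + d))

x = solve_depressed_cubic(-7, 6)      # x³ - 7x + 6 has roots 1, 2, -3
```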

[1] In this blog post, we first reduced the general cubic to the depressed form, then solved the depressed form. This isn’t the historical order. Tartaglia came up with a general solution to the depressed cubic equation, but was not able to solve equations containing a quadratic term.

[2] Victor Adamchik and David Jeffrey. Polynomial Transformations of Tschirnhaus, Bring and Jerrard. ACM SIGSAM Bulletin, Vol 37, No. 3, September 2003.

The post How to solve a cubic equation first appeared on John D. Cook.