John D. Cook: Applied Mathematics Consulting (https://www.johndcook.com/blog)

More fun with quatrefoils

In a comment to my previous post on quatrefoils, Jan Van lint suggested a different equation for quatrefoils:

r = a + |cos(2θ)|

Here are some examples of how these curves look for varying values of a.
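Here's a minimal matplotlib sketch of how you might generate plots like these yourself (my own code, not from the original post; the values of a are just examples):

    import numpy as np
    import matplotlib.pyplot as plt

    theta = np.linspace(0, 2*np.pi, 1000)
    for a in [0.5, 1, 2, 4]:                  # example values of a
        r = a + np.abs(np.cos(2*theta))
        plt.polar(theta, r, label=f"a = {a}")
    plt.legend()
    plt.show()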

As a increases, the curves get rounder. We can quantify this by looking at the angle between the tangents on either side of the cusps. By symmetry, we can pick any one of the four cusps, so we’ll work with the one at θ = π/4 for convenience.

The slopes of the tangent lines are the left and right derivatives

\frac{dy}{dx} = \frac{r'(\theta)\sin\theta + r(\theta)\cos\theta}{r'(\theta)\cos\theta - r(\theta)\sin\theta}

Now the derivative of

a + |cos(2θ)|

with respect to θ at θ = π/4 is 2 from one side and -2 from the other.

Since sine and cosine are equal at π/4, they cancel out in the ratio above, and so the two derivatives, the slopes of the two tangent lines, are (2+a)/(2-a) and (2-a)/(2+a). The slopes are reciprocals of each other, which is what we’d expect since the quatrefoils are symmetric about the line θ = π/4.

The angles of the two tangent lines are the inverse tangents of the slopes, and so the angle between the two tangent lines is

\arctan\left(\frac{2+a}{2-a}\right) - \arctan\left(\frac{2-a}{2+a}\right)

(adding π when a > 2 so that the angle comes out positive).

Note that as a goes to zero, so does the angle between the tangent lines.

Here’s a plot of the angle as a function of a.

You could start with a desired angle and solve the equation above numerically for the value of a that gives the angle. From the graph above, it looks like if we wanted the curves to intersect at 90° we should pick a around 2. In fact, we should pick a exactly equal to 2. There the slopes are (2+2)/(2-2) = ∞ and (2-2)/(2+2) = 0, i.e. one tangent line is perfectly vertical and the other is perfectly horizontal.
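Here's one way to carry out that numerical solution in Python (a sketch based on the tangent slopes derived above; the 90° target and the bracket [0.01, 10] are just examples):

    from math import atan2, pi
    from scipy.optimize import brentq

    def cusp_angle(a):
        # angles of the two tangent lines, whose slopes are (2+a)/(2-a) and (2-a)/(2+a);
        # atan2 handles the vertical tangent at a = 2
        phi1 = atan2(2 + a, 2 - a)
        phi2 = atan2(2 - a, 2 + a)
        return phi1 - phi2

    a = brentq(lambda t: cusp_angle(t) - pi/2, 0.01, 10)
    print(a)  # should be very close to 2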

The word problem

Most people have heard of word problems, but not as many have heard of the word problem. If you’re imagining that the word problem is some superlatively awful word problem, I can assure you it’s not. It’s both simpler and weirder than that.

The word problem is essentially about whether you can always apply algebraic rules in an automated way. The reason it is called the word problem is that you start with a description of your algebraic system in terms of symbols (“letters”) and concatenations of symbols (“words”) subject to certain rules, also called relations.

The word problem for groups

For example, you can describe a group by saying it contains a and b, and it satisfies the relations

a² = b²

and

a⁻¹ba = b⁻¹.

A couple things are implicit here. We’ve said this is a group, and since every element in a group has an inverse, we’ve implied that a⁻¹ and b⁻¹ are in the group as well. Also from the definition of a group comes the assumption that multiplication is associative, that there’s an identity element, and that inverses work like they’re supposed to.

In the example above, you could derive everything about the group from the information given. In particular, someone could give you two words—strings made up of a, b, a⁻¹, and b⁻¹—and you could determine whether they are equal by applying the rules. But in general, this is not possible for groups.

Undecidable

The bad news is that in general this isn’t possible. In computer science terminology, the word problem is undecidable. There is no algorithm that can tell whether two words are equal given a list of relations, at least not in general. There are special cases where the word problem is solvable, but a general algorithm is not possible.

The word problem for semigroups

I presented the word problem above in the context of groups, but you could look at the word problem in more general contexts, such as semigroups. A semigroup is closed under some associative binary operation, and that’s it. There need not be any inverses or even an identity element.

Here’s a concrete example of a semigroup whose word problem has been proven to be undecidable. As before we have two symbols, a and b. And because we are in a semigroup, not a group, there are no inverses. Our semigroup consists of all finite sequences made out of a‘s and b‘s, subject to these five relations.

aba2b2 = b2a2ba

a2bab2a = b2a3ba

aba3b2 = ab2aba2

b3a2b2a2ba = b3a2b2a4

a4b2a2ba = b2a4

Source: Term Rewriting and All That
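To make the setup concrete, here is a small Python sketch of my own (not from the source) that writes the relations out as strings of a's and b's and does a brute-force search for a chain of rewrites connecting two words. Because the word problem for this semigroup is undecidable, no fixed bound on the search depth can turn this into a decision procedure; it can only confirm equality when it happens to find a chain.

    relations = [
        ("abaabb",      "bbaaba"),       # aba^2b^2       = b^2a^2ba
        ("aababba",     "bbaaaba"),      # a^2bab^2a      = b^2a^3ba
        ("abaaabb",     "abbabaa"),      # aba^3b^2       = ab^2aba^2
        ("bbbaabbaaba", "bbbaabbaaaa"),  # b^3a^2b^2a^2ba = b^3a^2b^2a^4
        ("aaaabbaaba",  "bbaaaa"),       # a^4b^2a^2ba    = b^2a^4
    ]

    def rewrites(word):
        # all words obtained from `word` by applying one relation, in either direction
        for lhs, rhs in relations:
            for left, right in [(lhs, rhs), (rhs, lhs)]:
                start = 0
                while (i := word.find(left, start)) != -1:
                    yield word[:i] + right + word[i + len(left):]
                    start = i + 1

    def maybe_equal(w1, w2, max_steps=3):
        # breadth-first search: is w2 reachable from w1 in at most max_steps rewrites?
        frontier, seen = {w1}, {w1}
        for _ in range(max_steps):
            frontier = {v for w in frontier for v in rewrites(w)} - seen
            if w2 in frontier:
                return True
            seen |= frontier
        return w2 in seen

For example, maybe_equal("abaabb", "bbaaba") returns True after a single application of the first relation.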

Experience

When I first saw groups presented this way, as symbols and relations, I got my hopes up that a large swath of group theory could be automated. A few minutes later my naive hopes were dashed. So in my mind I thought “Well, then this is hopeless.”

But that is not true. Sometimes the word problem is solvable. It’s like many other impossibility theorems. There’s no fifth degree analog of the quadratic formula in general, but there are fifth degree polynomials whose roots can be found in closed form. There’s no program that can tell whether any arbitrary program will halt, but that doesn’t mean you can’t tell whether some programs halt.

It didn’t occur to me at the time that it would be worthwhile to explore the boundaries, learning which word problems can or cannot be solved. It also didn’t occur to me that I would run into things like the word problem in practical applications, such as simplifying symbolic expressions and optimizing their evaluation. Undecidable problems lurk everywhere, but you can often step around them.

Real-time analytics

There’s an ancient saying “Whom the gods would destroy they first make mad.” (Mad as in crazy, not mad as in angry.) I wrote a variation of this on Twitter:

Whom the gods would destroy, they first give real-time analytics.

Having more up-to-date information is only valuable up to a point. Past that point, you’re more likely to be distracted by noise. The closer you look at anything, the more irregularities you see, and the more likely you are to over-steer [1].

I don’t mean to imply that the noise isn’t real. (More on that here.) But there’s a temptation to pay more attention to the small variations you don’t understand than the larger trends you believe you do understand.

I became aware of this effect when simulating Bayesian clinical trial designs. The more often you check your stopping rule, the more often you will stop [2]. You want to monitor a trial often enough to shut it down, or at least pause it, if things change for the worse. But monitoring too often can cause you to stop when you don’t want to.

Flatter than glass

A long time ago I wrote about the graph below.

The graph looks awfully jagged, until you look at the vertical scale. The curve represents the numerical difference between two functions that are exactly equal in theory. As I explain in that post, the curve is literally smoother than glass, and certainly flatter than a pancake.

Notes

[1] See The Logic of Failure for a discussion of how over-steering is a common factor in disasters such as the Chernobyl nuclear failure.

[2] Bayesians are loath to talk about things like α-spending, but when you’re looking at stopping frequencies, frequentist phenomena pop up.

Naive modeling

In his book The Algorithm Design Manual, Steven Skiena has several sections called “War Stories” where he talks about his experience designing algorithms for clients.

Here’s an excerpt of a story about finding the best airline ticket prices.

“Look,” I said at the start of the first meeting. “This can’t be so hard. Consider a graph … The path/fare can be found with Dijkstra’s shortest path algorithm. Problem solved!” I announced, waving my hand with a flourish.

The assembled cast of the meeting nodded thoughtfully, then burst out laughing.

Skiena had greatly underestimated the complexity of the problem, but he learned, and was able to deliver a useful solution.

This reminds me of a story about a calculus professor who wrote a letter to a company that sold canned food explaining how they could use less metal for the same volume by changing the dimensions of their can. Someone wrote back thanking him for his suggestion and listing reasons why the optimization problem was far more complicated than he had imagined. If anybody has a link to that story, please let me know.
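For what it's worth, the calculation the professor presumably had in mind is the classic calculus exercise: a cylindrical can of volume V with radius r and height h has surface area A = 2πr² + 2πrh, and substituting h = V/(πr²) gives

A(r) = 2\pi r^2 + \frac{2V}{r}, \qquad A'(r) = 4\pi r - \frac{2V}{r^2} = 0 \implies V = 2\pi r^3 \implies h = 2r

so the area is minimized when the height equals the diameter.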

Related post: Bring out your equations!

Opening Windows files from bash and eshell

I often work in a sort of amphibious environment, using Unix software on Windows. As you can well imagine, this causes headaches. But I’ve found such headaches are generally more manageable than the headaches from alternatives I’ve tried.

On the Windows command line, you can type the name of a file and Windows will open the file with the default application associated with its file extension. For example, typing foo.docx and pressing Enter will open the file by that name using Microsoft Word, assuming that is your default application for .docx files.

Unix shells don’t work that way. The first thing you type at the command prompt must be a command, and foo.docx is not a command. The Windows command line generally works this way too, but it makes an exception for files with recognized extensions; the command is inferred from the extension and the file name is an argument to that command.

WSL bash

When you’re running bash on Windows, via WSL (Windows Subsystem for Linux), you can run the Windows utility start which will open a file according to its extension. For example,

    cmd.exe /C start foo.pdf

will open the file foo.pdf with your default PDF viewer.

You can also use start to launch applications without opening a particular file. For example, you could launch Word from bash with

    cmd.exe /C start winword.exe

Emacs eshell

Eshell is a shell written in Emacs Lisp. If you’re running Windows and you do not have access to WSL but you do have Emacs, you can run eshell inside Emacs for a Unix-like environment.

If you try running

    start foo.pdf

that will probably not work because eshell does not use the Windows PATH environment variable.

I got around this by creating a Windows batch file named mystart.bat and putting it in my path. The batch file simply calls start with its argument:

    start %1

Now I can open foo.pdf from eshell with

    mystart foo.pdf

The solution above for bash

    cmd.exe /C start foo.pdf

also works from eshell.

(I just realized I said two contradictory things: that eshell does not use your path, and that it found a batch file in my path. I don’t know why the latter works. I keep my batch files in c:/bin, which is a Unix-like location, and maybe eshell looks there, not because it’s in my Windows path, but because it’s in what it would expect to be my path based on Unix conventions. I’ve searched the eshell documentation, and I don’t see how to tell what it uses for a path.)

More shell posts

Generating all primitive Pythagorean triples with linear algebra

A Pythagorean triple is a set of positive integers that can be the lengths of sides of a right triangle, i.e. numbers a, b, and c such that

a² + b² = c².

A primitive Pythagorean triple (PPT) is a Pythagorean triple whose elements are relatively prime. For example, (50, 120, 130) is a Pythagorean triple, but it’s not primitive because all the entries are divisible by 10. But (5, 12, 13) is a primitive Pythagorean triple.

A method of generating all PPTs has been known since the time of Euclid, but I recently ran across a different approach to generating all PPTs [1].

Let’s standardize things a little by assuming our triples have the form (a, b, c) where a is odd, b is even, and c is the hypotenuse [2]. In every PPT one of the sides is even and one is odd, so we will assume the odd side is listed first.

It turns out that all PPTs can be found by multiplying the column vector [3, 4, 5] repeatedly by matrices M0, M1, or M2. In [1], Romik uses the sequence of matrix multiplications needed to create a PPT as a trinary number associated with the PPT.

The three matrices are given as follows.

\begin{align*} M_0 &= \begin{bmatrix} \phantom{-}1 & \phantom{-}2 & \phantom{-}2 \\ \phantom{-}2 & \phantom{-}1 & \phantom{-}2 \\ \phantom{-}2 & \phantom{-}2 & \phantom{-}3 \end{bmatrix} \\ M_1 &= \begin{bmatrix} -1 & \phantom{-}2 & \phantom{-}2 \\ -2 & \phantom{-}1 & \phantom{-}2 \\ -2 & \phantom{-}2 & \phantom{-}3 \end{bmatrix} \\ M_2 &= \begin{bmatrix} \phantom{-}1 & -2 & \phantom{-}2 \\ \phantom{-}2 & -1 & \phantom{-}2 \\ \phantom{-}2 & -2 & \phantom{-}3 \end{bmatrix} \end{align*}

Note that all three matrices have the same entries, though with different signs. If you number the columns starting at 1 (as mathematicians commonly do and computer scientists may not) then Mk is the matrix whose kth column is negative. There is no 0th column, so M0 is the matrix with no negative columns. The numbering I’ve used here differs from that used in [1].

For example, the primitive Pythagorean triple [5, 12, 13] is formed by multiplying [3, 4, 5] on the left by M2. The PPT [117, 44, 125] is formed by multiplying [3, 4, 5] by
M1 M1 M2.

\begin{bmatrix} 117 \\ 44 \\ 125 \end{bmatrix} = M_1 M_1 M_2 \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}
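Here's a short Python sketch (mine, following the description above) that generates PPTs by repeatedly multiplying (3, 4, 5) by the three matrices on the left, checking that each result really is a Pythagorean triple:

    import numpy as np

    M = [
        np.array([[ 1,  2, 2], [ 2,  1, 2], [ 2,  2, 3]]),  # M0
        np.array([[-1,  2, 2], [-2,  1, 2], [-2,  2, 3]]),  # M1
        np.array([[ 1, -2, 2], [ 2, -1, 2], [ 2, -2, 3]]),  # M2
    ]

    def triples(depth):
        # all PPTs reachable from (3, 4, 5) in at most `depth` multiplications
        level = [np.array([3, 4, 5])]
        yield level[0]
        for _ in range(depth):
            level = [m @ t for m in M for t in level]
            yield from level

    for a, b, c in triples(2):
        assert a*a + b*b == c*c
        print(a, b, c)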

More Pythagorean posts

[1] The dynamics of Pythagorean triples by Dan Romik

[2] Either a is odd and b is even or vice versa, so we let a be the odd one.

If a and b were both even, c would be even, and the triple would not be primitive. If a and b were both odd, c² would be divisible by 2 but not by 4, and so it couldn’t be a square.

Playing around with a rational rose

A “rose” in mathematics is typically a curve with polar equation

r = cos(kθ)

where k is a positive integer. If k is odd, the resulting graph has k “petals” and if k is even, the plot has 2k petals.

Sometimes the term rose is generalized to the case of non-integer k. This is the sense in which I’m using the phrase “rational rose.” I’m not referring to an awful piece of software by that name [1]. This post will look at a particular rose with k = 2/3.

My previous post looked at

r = cos(2θ/3)

and gave the plot below.

Plot of r = |cos(2θ/3)|

Unlike the case where k is an integer, the petals overlap.

In this post I’d like to look at two things:

  1. The curvature in the figure above, and
  2. Differences between polar plots in Python and Mathematica

Curvature

The graph above has radius 1 since cosine ranges from -1 to 1. The curve is made of arcs that are approximately circular, with the radius of these approximating circles being roughly 1/2, sometimes bigger and sometimes smaller. So we would expect the curvature to oscillate roughly around 2. (The curvature of a circle of radius r is 1/r.)

The curvature can be computed in Mathematica as follows.

    x[t_] := Cos[2 t/3] Cos[t] (* assuming the curve r = cos(2θ/3), so x = r cos θ *)
    y[t_] := Cos[2 t/3] Sin[t] (* and y = r sin θ *)
    numerator = D[x[t], {t, 1}] D[y[t], {t, 2}] - 
                D[x[t], {t, 2}] D[y[t], {t, 1}]
    denominator = (D[x[t], t]^2 + D[y[t], t]^2)^(3/2)
    Simplify[numerator / denominator]

This produces

\kappa = \frac{3 \sqrt{2} \left(5 \cos \left(\dfrac{4 \theta}{3}\right)+21\right)}{\left(5 \cos \left(\dfrac{4 \theta}{3}\right)+13\right)^{3/2}}

A plot shows that the curvature does indeed oscillate roughly around 2.

The minimum curvature is 13/9, which the curve takes on at polar coordinate (1, 0), as well as at other points. That means that the curve starts out like a circle of radius 9/13 ≈ 0.7.

The maximum curvature is 3 and occurs at the origin. There the curve is approximately a circle of radius 1/3.

Matplotlib vs Mathematica

To make the plot we’ve been focusing on, I plotted

r = cos(2θ/3)

in Mathematica, but in matplotlib I had to plot

r = |cos(2θ/3)|.

In both cases, θ runs from 0 to 8π. To highlight the differences in the way the two applications make polar plots, let’s plot over 0 to 2π with both.

Mathematica produces what you might expect.

    PolarPlot[Cos[2 t/3], {t, 0, 2 Pi}]

Mathematica plot of Cos[2 theta/3]

Matplotlib produces something very different. It handles negative r values by moving the point r = 0 to a circle in the middle of the plot. Notice the r-axis labels at about 22° running from -1 to 1.

    from numpy import linspace, cos, pi
    import matplotlib.pyplot as plt
    theta = linspace(0, 2*pi, 1000)
    plt.polar(theta, cos(2*theta/3))

Python plot of cos( 2 theta / 3 )

Note also that in Mathematica, the first argument to PolarPlot is r(θ) and the second is the limits on θ. In matplotlib, the first argument is θ and the second argument is r(θ).

Note that in this particular example, taking the absolute value of the function being plotted was enough to make matplotlib act like I expected. That only happened to be true when plotting over the entire range 0 to 8π. In general you have to do more work than this. If we insert absolute value in the plot above, still plotting from 0 to 2π, we do not reproduce the Mathematica plot.

    plt.polar(theta, abs(cos(2*theta/3)))

More polar coordinate posts

[1] Rational Rose was horribly buggy when I used it in the 1990s. Maybe it’s not so buggy now. But I imagine I still wouldn’t like the UML-laden style of software development it was built around.

Quatrefoils

I was reading The 99% Invisible City this evening, and there was a section on quatrefoils. Here’s an example of a quatrefoil from Wikipedia.

quatrefoil

There’s no single shape known as a quatrefoil. It’s a family of shapes that look something like the figure above.

I wondered how you might write a fairly simple mathematical equation to draw a quatrefoil. Some quatrefoils are just squares with semicircles glued on their edges. That’s no fun.

Here’s a polar equation I came up with that looks like a quatrefoil, if you ignore the interior lines.

This is the plot of r = cos(2θ/3).

Plot of r = |cos(2θ/3)|

Update: Based on a suggestion in the comments, I’ve written another post on quatrefoils using an equation that has a parameter to control the shape.

Kronecker sum

I’m working on a project these days where I’ve used four different kinds of matrix product, which made me wonder if there’s another kind of product out there that I could find some use for.

In the process of looking around for other matrix products, I ran across the Kronecker sum. I’ve seen Kronecker products many times, but I’d never heard of Kronecker sums.

The Kronecker sum is defined in terms of the Kronecker product, so if you’re not familiar with the latter, you can find a definition and examples here. Essentially, you multiply each scalar element of the first matrix by the second matrix as a block matrix.

The Kronecker product of an m × n matrix A and a p × q matrix B is an mp × nq matrix K = A ⊗ B. You could think of K as an m × n matrix whose entries are p × q blocks.

So, what is the Kronecker sum? It is defined for two square matrices, an n × n matrix A and an m × m matrix B. The sizes of the two matrices need not match, but the matrices do need to be square.  The Kronecker sum of A and B is

A ⊕ B = A ⊗ Im + In ⊗ B

where Im and In are identity matrices of size m and n respectively.

Does this make sense dimensionally? The left side of the (ordinary) matrix addition is nm × nm, and so is the right side, so the addition makes sense.

However, the Kronecker sum is not commutative, and usually things called “sums” are commutative. Products are not always commutative, but it goes against convention to call a non-commutative operation a sum. Still, the Kronecker sum is kinda like a sum, so it’s not a bad name.

I don’t have any application in mind (yet) for the Kronecker sum, but presumably it was defined for a good reason, and maybe I’ll run into an application, maybe even on the project alluded to at the beginning.

There are several identities involving Kronecker sums, and here’s one I found interesting:

exp( A ) ⊗ exp( B ) = exp( A ⊕ B ).

If you haven’t seen the exponential of a matrix before, basically you stick your matrix into the power series for the exponential function.

Examples

First, let’s define a couple matrices A and B.

\begin{align*} A &= \left( \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right) \\ B &= \left( \begin{array}{ccc} 1 & 0 & 1 \\ 1 & 2 & 0 \\ 2 & 0 & 3 \\ \end{array} \right) \end{align*}

We can compute the Kronecker sums

S = A ⊕ B

and

T = B ⊕ A

with Mathematica to show they are different.

    A = {{1, 2}, {3, 4}}
    B = {{1, 0, 1}, {1, 2, 0}, {2, 0, 3}}
    S = KroneckerProduct[A, IdentityMatrix[3]] + 
        KroneckerProduct[IdentityMatrix[2], B]
    T = KroneckerProduct[B, IdentityMatrix[2]] + 
        KroneckerProduct[IdentityMatrix[3], A]

This shows

\begin{align*} A \oplus B &= \left( \begin{array}{cccccc} 2 & 0 & 1 & 2 & 0 & 0 \\ 1 & 3 & 0 & 0 & 2 & 0 \\ 2 & 0 & 4 & 0 & 0 & 2 \\ 3 & 0 & 0 & 5 & 0 & 1 \\ 0 & 3 & 0 & 1 & 6 & 0 \\ 0 & 0 & 3 & 2 & 0 & 7 \\ \end{array} \right) \\ B \oplus A &= \left( \begin{array}{cccccc} 2 & 2 & 0 & 0 & 1 & 0 \\ 3 & 5 & 0 & 0 & 0 & 1 \\ 1 & 0 & 3 & 2 & 0 & 0 \\ 0 & 1 & 3 & 6 & 0 & 0 \\ 2 & 0 & 0 & 0 & 4 & 2 \\ 0 & 2 & 0 & 0 & 3 & 7 \\ \end{array} \right) \end{align*}

and so the two matrices are not equal.

We can compute the matrix exponentials of A and B with the Mathematica function MatrixExp to see that

\begin{align*} \exp(A) &= \left( \begin{array}{cc} 2.71828 & 7.38906 \\ 20.0855 & 54.5982 \\ \end{array} \right) \\ \exp(B) &= \left( \begin{array}{ccc} 2.71828 & 1. & 2.71828 \\ 2.71828 & 7.38906 & 1. \\ 7.38906 & 1. & 20.0855 \\ \end{array} \right) \end{align*}

(I actually used MatrixExp[N[A]] and similarly for B so Mathematica would compute the exponentials numerically rather than symbolically. The latter takes forever and it’s hard to read the result.)

Now we have

\begin{align*} \exp(A) \otimes \exp(B) &= \left( \begin{array}{cccccc} 512.255 & 0. & 606.948 & 736.673 & 0. & 872.852 \\ 361.881 & 384.002 & 245.067 & 520.421 & 552.233 & 352.431 \\ 1213.9 & 0. & 1726.15 & 1745.7 & 0. & 2482.38 \\ 1105.01 & 0. & 1309.28 & 1617.26 & 0. & 1916.22 \\ 780.631 & 828.349 & 528.646 & 1142.51 & 1212.35 & 773.713 \\ 2618.55 & 0. & 3723.56 & 3832.45 & 0. & 5449.71 \\ \end{array} \right) \\ \exp(A \oplus B) &= \left( \begin{array}{cccccc} 512.255 & 0. & 606.948 & 736.673 & 0. & 872.852 \\ 361.881 & 384.002 & 245.067 & 520.421 & 552.233 & 352.431 \\ 1213.9 & 0. & 1726.15 & 1745.7 & 0. & 2482.38 \\ 1105.01 & 0. & 1309.28 & 1617.26 & 0. & 1916.22 \\ 780.631 & 828.349 & 528.646 & 1142.51 & 1212.35 & 773.713 \\ 2618.55 & 0. & 3723.56 & 3832.45 & 0. & 5449.71 \\ \end{array} \right) \end{align*}

and so the two matrices are equal.
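The same check can be done numerically in Python (a sketch using NumPy and SciPy, with the same A and B as above):

    import numpy as np
    from scipy.linalg import expm

    def kronecker_sum(A, B):
        # A is n x n, B is m x m; returns A ⊗ I_m + I_n ⊗ B
        n, m = A.shape[0], B.shape[0]
        return np.kron(A, np.eye(m)) + np.kron(np.eye(n), B)

    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[1., 0., 1.], [1., 2., 0.], [2., 0., 3.]])

    print(np.allclose(np.kron(expm(A), expm(B)), expm(kronecker_sum(A, B))))  # True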

Even though the identity

exp( A ) ⊗ exp( B ) = exp( A ⊕ B )

may look symmetrical, it’s not. The matrices on the left do not commute in general. And not only are A ⊕ B and B ⊕ A different in general, their exponentials are also different. For example

\exp(B\oplus A) = \left( \begin{array}{cccccc} 512.255 & 736.673 & 0. & 0. & 606.948 & 872.852 \\ 1105.01 & 1617.26 & 0. & 0. & 1309.28 & 1916.22 \\ 361.881 & 520.421 & 384.002 & 552.233 & 245.067 & 352.431 \\ 780.631 & 1142.51 & 828.349 & 1212.35 & 528.646 & 773.713 \\ 1213.9 & 1745.7 & 0. & 0. & 1726.15 & 2482.38 \\ 2618.55 & 3832.45 & 0. & 0. & 3723.56 & 5449.71 \\ \end{array} \right)

Related posts

Nonlinear mod 5

This post is a follow-on to the previous post on perfectly nonlinear functions. In that post we defined a way to measure the degree of nonlinearity of a function between two Abelian groups. We looked at functions that take sequences of four bits to a single bit. In formal terms, our groups were GF(2⁴) and GF(2).

In this post we’ll start out by looking at integers mod 5 to show how the content of the previous post takes a different form in different groups. Sometimes the simplicity of working in binary makes things harder to see. For example, we noted in the previous post that the distinction between addition and subtraction didn’t matter. It matters in general!

Let B be the group of integers mod 5 and A the group of pairs of integers mod 5. That is, A is the direct product B × B. We will compute the uniformity of several functions from A to B. (Recall that less uniformity means more nonlinearity.) Then we’ll conclude by looking at what happens if we work modulo other integers.

Python code

Here’s the code we’ll use to compute uniformity.

    from itertools import product
    from numpy import array

    m = 5
    r = range(m)

    def deriv(f, x, a):
        # the "derivative" of f at x in the direction a
        return (f(x + a) - f(x))%m

    def delta(f, a, b):
        # the set of x in A where the derivative in direction a equals b
        return {x for x in product(r, r) if deriv(f, array(x), a) == b}

    def uniformity(f):
        u = 0
        for a in product(r, r):
            if a == (0, 0):
                continue
            for b in r:
                t = len(delta(f, array(a), b))
                if t > u:
                    u = t
        return u

We didn’t call attention to it last time, but the function f(x + a) – f(x) is called a derivative of f, and that’s why we named the function above deriv.

Here are a few functions whose uniformity we’d like to compute.

    # (u, v) -> u^2 + v^2
    def F2(x): return sum(x**2)%m

    # (u, v) -> u^3 + v^3
    def F3(x): return sum(x**3)%m

    # (u, v) -> u^2 + v
    def G(x):  return (x[0]**2 + x[1])%m

    # (u, v) -> uv
    def H(x):  return (x[0]*x[1])%m

Now let’s look at their uniformity values.

    print( uniformity(F2) )
    print( uniformity(F3) )
    print( uniformity(G) )
    print( uniformity(H) )

Results mod 5

This prints out 5, 10, 25, and 5. This says that the functions F2 and H are the most nonlinear, in fact “perfectly nonlinear.” The function G, although nonlinear, has the same uniformity as a linear function.

The function F3 may be the most interesting, having intermediate uniformity. In some sense the sum of cubes is closer to being a linear function than the sum of squares is.

Other moduli

We can easily look at other groups by simply changing the value of m at the top of the code.

If we set the modulus to 7, we get analogous results. The uniformities of the four functions are 7, 14, 49, and 7. They’re ranked exactly as they were when the modulus was 5.

But that’s not always the case for other moduli as you can see in the table below. Working mod 8, for example, the sum of squares is more uniform than the sum of cubes, the opposite of the case mod 5 or mod 7.

    |----+-----+-----+-----+----|
    |  m |  F2 |  F3 |   G |  H |
    |----+-----+-----+-----+----|
    |  2 |   4 |   4 |   4 |  2 |
    |  3 |   3 |   9 |   9 |  3 |
    |  4 |  16 |   8 |  16 |  8 |
    |  5 |   5 |  10 |  25 |  5 |
    |  6 |  36 |  36 |  36 | 18 |
    |  7 |   7 |  14 |  49 |  7 |
    |  8 |  64 |  32 |  64 | 32 |
    |  9 |  27 |  81 |  81 | 27 |
    | 10 | 100 | 100 | 100 | 50 |
    | 11 |  11 |  22 | 121 | 11 |
    | 12 | 144 | 144 | 144 | 72 |
    |----+-----+-----+-----+----|

In every case, G is the most uniform function and H is the least. Also, G is strictly more uniform than H. But there are many different ways F2 and F3 can fit between G and H.

Perfectly nonlinear functions

The other day I heard someone suggest that a good cocktail party definition of cryptography is puzzle making. Cryptographers create puzzles that are easy to solve given the key, but ideally impossible without the key.

Linearity is very useful in solving puzzles, and so a puzzle maker would like to create functions that are as far from linear as possible. That is why cryptographers are interested in perfectly nonlinear functions (PNs) and almost perfect nonlinear functions (APNs), which we will explore here.

The functions we are most interested in map a set of bits to a set of bits. To formalize things, we will look at maps between two finite Abelian groups A and B. If we look at lists of bits with addition defined as component-wise XOR [1], we have an Abelian group, but the ideas below apply to other kinds of Abelian groups as well.

We will measure the degree of nonlinearity of a function F from A to B by the size of the sets

\delta(a, b) = \{x \mid F(a + x) - F(x) = b\}

where a and x are in A and b is in B [2].

If F is a linear function, F(a + x) = F(a) + F(x), and so δ(a, b) simplifies to

\delta(a, b) = \{x \mid F(a) = b\}

which equals all of the domain A, if F(a) = b. So for linear functions, the set δ(a, b) is potentially as large as possible, i.e. all of A. Perfectly nonlinear functions will be functions where the sets δ(a, b) are as small as possible, and almost perfectly nonlinear functions will be functions where δ(a, b) is as small as possible in a given context.

The uniformity of a function F is defined by

\underset{a \ne 0}{\max_{a \in A, \, b\in B}} \,\, |\delta(a,b)|

The restriction a ≠ 0 is necessary since otherwise the uniformity of every function would be the same.

For linear functions, the uniformity is as large as possible, i.e. |A|. As we will show below, a nonlinear function can have maximum uniformity as well, but some nonlinear functions have smaller uniformity. So some nonlinear functions are more nonlinear than others.

We will compute the uniformity of a couple functions, mapping 4 bits to 1 bit, with the following Python code.

    A = range(16)
    B = range(2)
    nonZeroA = range(1, 16)

    def delta(f, a, b):
        # the set of x where the "derivative" f(x + a) - f(x) equals b; addition here is XOR
        return {x for x in A if f(x^a) ^ f(x) == b}

    def uniformity(f):
        # the largest |delta(a, b)| over nonzero a
        return max(len(delta(f, a, b)) for a in nonZeroA for b in B)

We could look at binary [3] functions with domains and ranges of different sizes by changing the definitions of A, B, and nonZeroA.

Note that in our groups, addition is component-wise XOR, denoted by ^ in the code above, and that subtraction is the same as addition since XOR is its own inverse. Of course in general addition would be defined differently and subtraction would not be the same as addition. When working over a binary field, things are simpler, sometimes confusingly simpler.

We want to compute the uniformity of two functions. Given bits x0, x1, x2, and x3, we define

F(x0, x1, x2, x3) = x0 x1

and

G(x0, x1, x2, x3) = x0 x1 + x2 x3.

We can implement these in Python as follows.

    def F(x):
        x = [c == '1' for c in format(x, "04b")]
        return x[0] and x[1]

    def G(x):
        x = [c == '1' for c in format(x, "04b")]
        return (x[0] and x[1]) ^ (x[2] and x[3])

Next we compute the uniformity of the two functions.

    print( uniformity(F) )
    print( uniformity(G) )

This shows that the uniformity of F is 16 and the uniformity of G is 8. Remember that functions with smaller uniformity are considered more nonlinear. Both functions F and G are nonlinear, but G is more nonlinear in the sense defined here.

To see that F is nonlinear, let x = (1, 0, 0, 0) and y = (0, 1, 0, 0).

1 = F(x + y) ≠ F(x) + F(y) = 0.

It turns out that 8 is the smallest possible uniformity for a function from 4 bits to 1 bit, and the function G is said to be perfectly nonlinear.

I haven’t defined perfect nonlinearity or almost perfect nonlinearity precisely, only saying that they have something to do with maximum nonlinearity (minimum uniformity) in some context. I may say more about that context in a future post.

Related posts

[1] XOR stands for exclusive or. For Boolean variables p and q, p XOR q is true if either p is true or q is true, but not both. The component-wise XOR of two binary numbers sets each bit of the result to the XOR of the corresponding input bits. For example, 1001 XOR 1010 = 0011.

[2] This definition comes from Perfect nonlinear functions and cryptography by Céline Blondeau and Kaisa Nyberg.

[3] The next post replaces binary numbers with base 5 and other bases.

Python one-liner to print Roman numerals

Here’s an amusing little Python program to convert integers to Roman numerals:

    def roman(n): print(chr(0x215F + n))

It only works if n is between 1 and 12. That’s because Unicode contains characters for the Roman numerals I through XII.

Here are the characters it produces:

Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ

You may not be able to see these, depending on what fonts you have installed. I can see them in my browser, but when I ran the code above in a terminal window I only saw a missing glyph placeholder.

If you can be sure your reader can see the characters, say in print rather than on the web, the single-character Roman numerals look nice, and they make it clear that they’re to be interpreted as Roman numerals.
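If you want the function to enforce the 1-through-12 restriction, and to return the character rather than print it, here's a slightly more defensive sketch:

    def roman(n):
        # Unicode has single characters for the Roman numerals I through XII,
        # starting at U+2160, so this only works for 1 <= n <= 12.
        if not 1 <= n <= 12:
            raise ValueError("n must be between 1 and 12")
        return chr(0x215F + n)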

Here’s a screenshot of the symbols.

I II III IV V VI VII VIII IX X XI XII

 

Bidirectional text

This post will take a look at simple bidirectional text, such as a bit of English inside an Arabic document, or a few words of Hebrew inside a French document. If you want to explore the subject in all its complexity, see Unicode Standard Annex 9.

You may not need to do anything special to display bidirectional text. For example, when I typed the following sentence, I just typed the letters in logical order.

The first letter of the Hebrew alphabet is אלף.

For the last word, I typed א, then ל, then ף. When I entered the ל, the editor put it on the left side of the א, and when I entered ף the editor put it to the left of the ל. The characters are stored in memory in the same sequence that I typed them, though they are displayed in the order appropriate for each language.

You can change the default display ordering of characters by inserting control characters. For example, I typed

The [U+202E]quick brown fox[U+202C] jumped.

and the text displays [1] as

The ‮quick brown fox‬ jumped.

The Unicode character U+202E, known as RLO for “right-to-left override,” tells the browser to display the following letters from right-to-left. Then the character U+202C, known as PDF for “pop directional formatting,” exits that mode, returning to left-to-right [2]. If we copy the first sentence into a text file and open it with a hex editor we can see the control characters, circled in red.

hex editor screen shot

I saved the file in UTF-16 encoding to make the characters easy to see: each quartet of hex characters represents a Unicode character. UTF-8 encoding is more common and more compact.
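If you want to produce bidirectional text programmatically, here's a minimal Python sketch that builds the example sentence above by inserting the control characters directly:

    RLO = "\u202E"  # U+202E, right-to-left override
    PDF = "\u202C"  # U+202C, pop directional formatting
    s = "The " + RLO + "quick brown fox" + PDF + " jumped."
    print(s)
    print(s.encode("utf-16"))  # the control characters show up in the raw bytes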

If for some reason you wanted to force Hebrew to display from left-to-right, you could insert U+202D, known as LRO for “left-to-right override.” The character to exit this mode is PDF, U+202C, as before.

Here’s a bit of Hebrew written left-to-right:

Written left-to-right: ‭אלף.

And here’s what it looks like in a hex editor:

another hex editor screen shot

Related posts

[1] This should look the same as

The xof nworb kciuq jumped.

though in this footnote I typed the letters in the order they appear: xof …

If for some reason the text in the body of the post displays in normal order, not as in this note, then something has gone wrong with your browser’s rendering.

[2] So in addition to Portable Document Format and Probability Density Function, PDF can stand for Pop Directional Formatting. Here “pop” is being used in the computer science sense of popping a stack.

Understanding statistical error

A simple linear regression model has the form

y = μ + βx + ε.

This means that the output variable y is a linear function of the input variable x, plus some error term ε that is randomly distributed.

There’s a common misunderstanding over whose error the error term is. A naive view is that the world really is linear, that

y = μ + βx

is some underlying Platonic reality, and that the only reason that we don’t measure exactly that linear part is that we as observers have made some sort of error, that the fault is in the real world rather than in the model.

No, reality is what it is, and it’s our model that is in error. Some aspect of reality may indeed have a structure that is approximately linear (over some range, under some conditions), but when we truncate reality to only that linear approximation, we introduce some error. This error may be tolerable—and thankfully it often is—but the error is ours, not the world’s.

Why a little knowledge is a dangerous thing

Alexander Pope famously said

A little learning is a dangerous thing;
Drink deep, or taste not the Pierian spring:
There shallow draughts intoxicate the brain,
And drinking largely sobers us again.

I’ve been thinking lately about why a little knowledge is often a dangerous thing, and here’s what I’ve come to.

Any complex system has many causes acting on it. Some of these are going to be more legible than others. Here I’m using “legible” in a way similar to how James Scott uses the term. As Venkatesh Rao summarizes it,

A system is legible if it is comprehensible to a calculative-rational observer looking to optimize the system from the point of view of narrow utilitarian concerns and eliminate other phenomenology. It is illegible if it serves many functions and purposes in complex ways, such that no single participant can easily comprehend the whole. The terms were coined by James Scott in Seeing Like a State.

People who have a little knowledge of a subject are only aware of some of the major causes that are acting, and probably they are aware of the most legible causes. They have an unbalanced view because they are aware of the forces pushing in one direction but not aware of other forces pushing in other directions.

A naive view may be unaware of a pair of causes in tension, and may thus have a somewhat balanced perspective. And an expert may be aware of both causes. But someone who knows about one cause but not yet about the other is unbalanced.

Examples

When I first started working at MD Anderson Cancer Center, I read a book on cancer called One Renegade Cell. After reading the first few chapters, I wondered why we’re not all dead. It’s easy to see how cancer can develop from one bad cell division and kill you a few weeks later. It’s not as easy to understand why that doesn’t usually happen. The spreading of cancer is more legible than natural defenses against cancer.

I was recently on the phone with a client who had learned enough about data deidentification to become worried. I explained that there were also reasons to not be as worried, but that they’re more complicated, less legible.

What to do

Theories are naturally biased toward causes that are amenable to theory, toward legible causes. Practical experience and empirical data tend to balance out theory by providing some insight into less legible causes.

A little knowledge is dangerous not so much because it is partial but because it is biased; it’s often partial in a particular way, such as theory lacking experience. If you spiral in on knowledge in a more balanced manner, with a combination of theory and experience, you might not be as dangerous along the way.

When theory and reality differ, the fault lies in the theory. More on that in my next post. Theory necessarily leaves out complications, and that’s what makes it useful. The art is knowing which complications can be safely ignored under which circumstances.

Related posts

Expected value of X and 1/X

Yesterday I blogged about an exercise in the book The Cauchy-Schwarz Master Class. This post is about another exercise from that book, exercise 5.8, which is to prove Kantorovich’s inequality.

Assume

0 < m \leq x_1 \leq x_2 \leq \cdots \leq x_n \leq M < \infty

and

p_1 + p_2 + \cdots + p_n = 1

for non-negative numbers pi.

Then

\left(\sum_{i=1}^n p_i x_i \right) \left(\sum_{i=1}^n p_i \frac{1}{x_i} \right) \leq \frac{\mu^2}{\gamma^2}

where

\mu = \frac{m+M}{2}

is the arithmetic mean of m and M and

\gamma = \sqrt{mM}

is the geometric mean of m and M.

In words, the weighted average of the x‘s times the weighted average of their reciprocals is bounded by the square of the ratio of the arithmetic and geometric means of the bounds m and M.

Probability interpretation

I did a quick search on Kantorovich’s inequality, and apparently it first came up in linear programming, Kantorovich’s area of focus. But when I see it, I immediately think expectations of random variables. Maybe Kantorovich was also thinking about random variables, in the context of linear programming.

The left side of Kantorovich’s inequality is the product of the expected value of a discrete random variable X and the expected value of 1/X.

To put it another way, it’s a relationship between E[1/X] and 1/E[X],

\text{E}\left(\frac{1}{X} \right ) \leq \frac{\mu^2}{\gamma^2} \frac{1}{\text{E}(X)}

which I imagine is how it is used in practice.

I don’t recall seeing this inequality used, but it could have gone by in a blur and I didn’t pay attention. But now that I’ve thought about it, I’m more likely to notice if I see it again.

Python example

Here’s a little Python code to play with Kantorovich’s inequality, assuming the random values are uniformly distributed on [0, 1].

    from numpy import random

    x = random.random(6)
    m = min(x)
    M = max(x)
    am = 0.5*(m+M)
    gm = (m*M)**0.5
    prod = x.mean() * (1/x).mean()
    bound = (am/gm)**2
    print(prod, bound)

This returned 1.2021 for the product and 1.3717 for the bound.

If we put the code above inside a loop we can plot the product and its bound to get an idea how tight the bound is typically. (The bound is perfectly tight if all the x’s are equal.) Here’s what we get.
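Here's a sketch of that loop (my reconstruction of what's described above, assuming the plot shows each product on the horizontal axis, its bound on the vertical axis, and the identity line dotted):

    import matplotlib.pyplot as plt
    from numpy import random, linspace

    prods, bounds = [], []
    for _ in range(1000):
        x = random.random(6)
        m, M = min(x), max(x)
        prods.append(x.mean() * (1/x).mean())
        bounds.append((0.5*(m + M) / (m*M)**0.5)**2)

    plt.scatter(prods, bounds, s=5)
    t = linspace(min(prods), max(prods), 2)
    plt.plot(t, t, "k:")  # dotted identity line; dots above it satisfy the inequality
    plt.show()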

All the dots are above the dotted line, so we haven’t found an exception to our inequality.

(I didn’t think that Kantorovich had made a mistake. If he had, someone would have noticed by now. But it’s worth testing a theorem you know to be true, in order to test that your understanding of the theorem is correct.)

More inequalities

The baseball inequality

There’s a theorem that’s often used and assumed to be true but rarely stated explicitly. I’m going to call it “the baseball inequality” for reasons I’ll get to shortly.

Suppose you have two lists of k positive numbers each:

n_1, n_2, n_3, \ldots, n_k

and

d_1, d_2, d_3, \ldots, d_k

Then

\min_{1 \leq i \leq k} \frac{n_i}{d_i} \leq \frac{n_1 + n_2 + n_3 + \cdots + n_k}{d_1 + d_2 + d_3 + \cdots + d_k} \leq \max_{1 \leq i \leq k} \frac{n_i}{d_i}

This says, for example, that the batting average of a baseball team is somewhere between the best individual batting average and the worst individual batting average.

The only place I can recall seeing this inequality stated is in The Cauchy-Schwarz Master Class by Michael Steele. He states the inequality in exercise 5.1 and gives it the batting average interpretation. (Update: This is known as the “mediant inequality.” Thanks to Tom in the comments for letting me know. So the thing in the middle is called the “mediant” of the fractions.)

Note that this is not the same as saying the average of a list of numbers is between the smallest and largest numbers in the list, though that’s true. The batting average of a team as a whole is not the same as the average of the individual batting averages on that team. It might happen to be, but in general it is not.
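Here's a toy numerical check in Python; the hit and at-bat numbers are made up purely for illustration:

    hits    = [27, 85, 60, 14]    # hypothetical hits per player
    at_bats = [90, 300, 210, 70]  # hypothetical at-bats per player

    averages = [h/d for h, d in zip(hits, at_bats)]
    team_avg = sum(hits) / sum(at_bats)
    assert min(averages) <= team_avg <= max(averages)
    print(min(averages), team_avg, max(averages))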

I’ll give a quick proof of the baseball inequality. I’ll only prove the first of the two inequalities. That is, I’ll prove that the minimum fraction is no greater than the ratio of the sums of numerators and denominators. Proving that the latter is no greater than the maximum fraction is completely analogous.

Also, I’ll only prove the theorem for two numerators and two denominators. Once you have proved the inequality for two numerators and denominators, you can bootstrap that to prove the inequality for three numerators and three denominators, and continue this process for any number of numbers on top and bottom.

So we start by assuming

\frac{a}{b} \leq \frac{c}{d}

Then we have

\begin{align*} \frac{a}{b} &= \frac{a\left(1 + \dfrac{d}{b} \right )}{b\left(1 + \dfrac{d}{b} \right )} \\ &= \frac{a + \dfrac{a}{b}d}{b + d} \\ &\leq \frac{a + \dfrac{c}{d}d}{b+d} \\ &= \frac{a + c}{b+d} \end{align*}

More inequality posts

Solving for the catenary scale parameter

A catenary with scale a is the graph of the function

f(x; a) = a cosh(x/a) – a.

The x and the a are separated by a semicolon rather than a comma to imply that we think of x as the variable and a as a parameter.

This graph passes through the origin, i.e. for any a, f(0, a) = 0. To find the scaling parameter a we need to specify the value of f at one more point and solve f(x, a) = y. This is the equation I alluded to recently as not having a closed-form solution.

Without loss of generality, we can assume x = 1. Why is that?

Define

g(a) = f(1; a).

Then

f(x‘; a‘) = y‘

if and only if

g(a‘/x‘) = y‘/x‘.

So we will assume x = 1 for now and focus on solving for a value of a such that g(a) = y. But we will include Python code shortly that goes back to f, i.e. does not assume x = 1.

The Taylor series for g looks like

g(a) = 1/(2a) + 1/(24a³) + …

and so for large a,

g(a) ≈ 1/(2a).

Here are a couple plots showing how good this approximation is, even for a not that large. First, a plot of g and its approximation.

And here’s a plot of the relative error in approximating g(a) by 1/(2a).

This means that for small y,

g(1/(2y)) ≈ y.

It’s fortunate that we have a convenient approximation when y is small, because in practice y is usually small: catenaries are usually wider than deep, or at least not much deeper than wide.

Since the terms in the Taylor series that we discarded are all positive, we also have a bound

g(1/(2y)) > y.

If we want to solve for a numerically, 1/(2y) makes a good starting guess, and it also makes a left bracket for root-finding methods that require a bracket around the root.

Here’s Python code to solve f(x, a) = y for a, given x and y.

    from numpy import cosh
    from scipy.optimize import root

    def g(a):
        # g(a) = f(1; a) = a cosh(1/a) - a
        return a*cosh(1/a) - a

    def solve_g(y):
        # Solve g(a) = y for a, starting from the lower bound 1/(2y).
        assert(y > 0)
        lower = 0.5/y
        return root(lambda a: g(a) - y, lower).x

    def solve_catenary(x, y):
        "Solve for a such that a cosh(x/a) - a == y."
        return x*solve_g(y/abs(x))
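
Here’s a quick example of using the code above. The numbers are made up for illustration: we look for the catenary through the origin and the point (2, 0.5).

    a = solve_catenary(2, 0.5)
    print(a)                  # the fitted scale parameter
    print(a*cosh(2/a) - a)    # should print approximately 0.5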

Now that we can solve g(a) = y numerically, we can go back and see how far 1/(2y) is from the actual solution for varying values of y.

Recall that the second plot above showed that the relative error in approximating g(a) by 1/(2a) is large when a is small (and thus y is big). But the plot immediately above shows that nevertheless, 1/(2y) is a good guess at a solution to g(a) = y, never off by more than 0.175.

Recall also that we said 1/(2y) is a lower bound on the solution, and could be used as a left bracket in a root-finding method that requires a bracket. The plot above suggests that 0.18 + 1/(2y) would work as a right bracket.

More catenary posts

The post Solving for the catenary scale parameter first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/28/catenary-scale/feed/ 1
Alphabets and Unicode https://www.johndcook.com/blog/2020/09/27/alphabets-and-unicode/ https://www.johndcook.com/blog/2020/09/27/alphabets-and-unicode/#respond Sun, 27 Sep 2020 19:36:00 +0000 https://www.johndcook.com/blog/?p=61670 ASCII codes may seem arbitrary when you’re looking at decimal values, but they make more sense in hex [1]. For example, the ASCII value for 0 is 48. Why isn’t it zero, or at least a number that ends in zero? Well it is, in hex: 0x30. And the codes are in consecutive order, so […]

The post Alphabets and Unicode first appeared on John D. Cook.

]]>
ASCII codes may seem arbitrary when you’re looking at decimal values, but they make more sense in hex [1]. For example, the ASCII value for 0 is 48. Why isn’t it zero, or at least a number that ends in zero? Well it is, in hex: 0x30. And the codes are in consecutive order, so the ASCII value of a digit d is d + 0x30.

There are also patterns in ASCII codes for letters, and this post focuses on these patterns and their analogies in the Unicode values assigned to other alphabets.

Latin

Letters have a similar pattern to digits in ASCII. A is 0x41 and a is 0x61. The upper case and lower case codes are 32 (0x20) apart. Consecutive letters have consecutive ASCII codes, so the nth letter of the alphabet is 0x40 + n in capital form and 0x60 + n in lower case form.
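
Here’s a small Python sketch of these patterns.

    # Digits start at 0x30, capital letters at 0x41, lower case letters at 0x61.
    assert chr(0x30 + 7) == '7'
    assert chr(0x40 + 5) == 'E'          # 5th letter, capital
    assert chr(0x60 + 5) == 'e'          # 5th letter, lower case
    assert ord('a') - ord('A') == 0x20   # the two cases differ by 32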

Unicode absorbed the first 128 ASCII values for backward compatibility. And some of the patterns in the Latin alphabet carry over to other alphabets. Older encodings for other languages were imported into Unicode similar to the way ASCII was, but with an offset. For example, the Unicode values for Cyrillic letters are essentially those from ISO 8859-5 with an offset of 0x360.

Greek

For example, Greek upper case and lower case letters are also 0x20 apart. Capital alpha is U+0391, and lower case alpha is U+03B1 [2]. As with Latin, capital letters come first. Unicode values are consecutive, so the nth letter of the Greek alphabet is 0x391 + n, in capital form, and 0x3B0 + n in lower case form.

There’s a wrinkle, however. The rule above only holds for n from 1 to 17, because there are two versions of the 18th letter, sigma. Greek has two versions of lower case sigma, ς (U+03C2) at the end of words and σ (U+03C3) everywhere else, but only one upper case sigma Σ. The Unicode value U+03A2 is unassigned, so that the pattern of capitals and lower case letters being separated by 0x20 will continue after sigma.
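
Here’s a small Python sketch checking the Greek pattern, including the gap at U+03A2.

    import unicodedata

    assert ord('α') - ord('Α') == 0x20   # alpha: U+03B1 and U+0391
    assert ord('σ') - ord('Σ') == 0x20   # sigma: U+03C3 and U+03A3

    # U+03A2 is unassigned, which preserves the 0x20 offset after sigma.
    try:
        unicodedata.name('\u03a2')
    except ValueError:
        print("U+03A2 has no assigned character")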

Letters as numerals

The Greeks associated numerical values with letters: Α (alpha) = 1, Β (beta) = 2, Γ (gamma) = 3, etc. That means the numerical value associated with a letter is its Unicode value minus 0x390. That works for the numbers 1 through 10.

But then starting with the 10th letter, Κ (kappa), the letters start counting by 10s: Λ = 20, etc. So for the letters Κ (kappa) through Ρ (rho), the numerical value is 10(U – 0x399) where U is the Unicode value. The letters count by 100s starting with Ρ (rho), and then the gap at Σ complicates things.

Russian

Russian uses the Cyrillic alphabet, so perhaps I should say the “Cyrillic” alphabet, just as I started with the “Latin” alphabet, not the English alphabet. But several languages use the Cyrillic alphabet, and some may use it differently than Russian, so I’ll say “Russian” to avoid possibly saying something that’s not true.

As with Latin and Greek, the Unicode values for Russian letters are consecutive, and code points for capital letters and lower case letters differ by 32 (0x20). But the Russian alphabet has 33 letters, so something’s got to give.

The quirk is the 7th letter, Ё (yo). The capital letters in the Russian alphabet start with U+0410 and are consecutive up to U+042F. But there’s an interruption in the sequence with Ё. As the 7th letter, you would expect it to have Unicode value U+0416, but that’s the code point for the 8th letter, Ж. Yo has Unicode value U+0401. And while you can find the lower case value of the rest of the letters in the Russian alphabet by adding 32 (0x20), the lower case yo has value U+0451.

Hebrew

Hebrew doesn’t have upper and lower case letters, so that pattern can’t carry over. Unicode does assign consecutive values to consecutive letters, but only if you count final forms as separate letters, and list them before their ordinary forms. The first letter of the Hebrew alphabet has Unicode value U+05D0, so the nth letter has Unicode value 0x5CF + n. That holds for n up to 10.

The first 10 letters of the Hebrew alphabet have only one form. But the 11th letter, kaf, has a final form and a non-final form. Final forms are listed first, so 0x5CF + 11  = 0x5DA goes to final kaf, ך, and 0x5DB goes to (non-final) kaf, כ.

Hebrew has a way of associating numerical values to letters, very similar to the one described above for Greek. For the first 10 letters, the associated numerical value is the Unicode value minus 0x5CF, but then final forms complicate things.

Related posts

[1] Hex is short for hexadecimal, i.e. base 16. The 0x in front of a number indicates that it’s a hexadecimal number.

[2] It’s standard to refer to Unicode values in the format U+xxxx where xxxx is a hexadecimal number. So U+03B1 has numerical value 0x3B1, or 945 in decimal.

The post Alphabets and Unicode first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/27/alphabets-and-unicode/feed/ 0
How much does it matter if the measuring tape sags? https://www.johndcook.com/blog/2020/09/26/sagging-tape-measure/ https://www.johndcook.com/blog/2020/09/26/sagging-tape-measure/#comments Sat, 26 Sep 2020 18:35:13 +0000 https://www.johndcook.com/blog/?p=61635 There are a couple ways in which a measurement might not be straight. Yesterday I wrote a blog post about not measuring straight toward your target. You’d like to measure from (0, 0) to (x, 0), but something is in the way, and so you measure from (0, 0) to (x, y), where y is […]

The post How much does it matter if the measuring tape sags? first appeared on John D. Cook.

]]>

There are a couple ways in which a measurement might not be straight. Yesterday I wrote a blog post about not measuring straight toward your target. You’d like to measure from (0, 0) to (x, 0), but something is in the way, and so you measure from (0, 0) to (x, y), where y is small relative to x. We assumed the tape measure was straight, but didn’t aim straight at the target.

In this post we will consider measuring from (0, y) to (x, y), but with the tape measure sagging to (x/2, 0) in the middle. We end exactly at our target, but the tape bends. As before, we’ll assume y is small relative to x. About how large will our error be as a function of y?

My first thought was to assume our tape measure takes the shape of a catenary, because that’s the shape of a hanging cable. But the resulting calculations are too complicated. The calculations depend on solving for a scaling factor, and that calculation cannot be done in closed form.

Since we are assuming the amount of sag y is small relative to the horizontal distance x, a parabola will do just as well as a catenary. (More on how well a parabola approximates a catenary here.)

The equation of the parabola passing through our three specified points, as a function of t, is

4y(t – x/2)² / x².

Even with our simplifying assumption that the tape bends like a parabola, the arclength calculation is still a little tedious, but if I’ve done things correctly the result turns out to be

x + 8y² / 3x + higher order terms

So the error is on the order of (8/3) y²/x, whereas in the earlier post the error was on the order of (1/2) y²/x. If y is small relative to x, the error is still small, but roughly five times larger than before.

Going back to our example of measuring a 10 ft (120 inch) room, we want to measure from (0, 0) to (120, 0). If we measure with a straight line from (0, 0) to (120, 4) instead, there will be an error of about 1/15 of an inch. If instead we measure from (0, 4) to (120, 4) with a tape that sags like a parabola touching (60, 0), then the error will be closer to a third of an inch. But if we could pull tight enough to limit the sag to 1 inch, the measurement error would be more like 1/45 of an inch.
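
Here’s a quick numerical check of the example above, a sketch using SciPy to integrate the arc length of the parabola directly.

    from numpy import sqrt
    from scipy.integrate import quad

    x, y = 120, 4   # span and sag in inches

    # Derivative of the parabola 4y(t - x/2)^2 / x^2 with respect to t
    dp = lambda t: 8*y*(t - x/2)/x**2
    arc, _ = quad(lambda t: sqrt(1 + dp(t)**2), 0, x)

    print(arc - x)         # measurement error, about 0.35 inches
    print(8*y**2/(3*x))    # the approximation above, about 0.36 inches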

Related posts

I inadvertently ended up writing three blog posts in a row related to measuring tapes. Here are the other two:

The post How much does it matter if the measuring tape sags? first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/26/sagging-tape-measure/feed/ 4
From tape measures to tensors https://www.johndcook.com/blog/2020/09/26/tape-measures-and-tensors/ https://www.johndcook.com/blog/2020/09/26/tape-measures-and-tensors/#comments Sat, 26 Sep 2020 14:50:42 +0000 https://www.johndcook.com/blog/?p=61615 This post will start with a motivating example, looking at measuring a room in inches and in feet. Then we will segue into a discussion of contravariance and covariance in the simplest setting. Then we will discuss contravariant and covariant tensors more generally. Using a tape measure In my previous post, I explained why it […]

The post From tape measures to tensors first appeared on John D. Cook.

]]>

This post will start with a motivating example, looking at measuring a room in inches and in feet. Then we will segue into a discussion of contravariance and covariance in the simplest setting. Then we will discuss contravariant and covariant tensors more generally.

Using a tape measure

In my previous post, I explained why it doesn’t matter if a tape measure is perfectly straight when measuring a long distance. In a nutshell, if you want to measure x, but instead you measure the hypotenuse of a triangle with sides x and y, where y is much smaller than x, the difference is approximately y²/2x. The error is nowhere near as big as y.

In that post I gave the example of measuring a wall that is 10 feet long, and measuring to a point 4 inches up the adjacent wall rather than measuring to the corner. The error is about 1/15 of an inch.

Now suppose we’re off by more, measuring 12 inches up the other wall. Now that we have an even foot, we can switch over to feet and work with smaller, simpler numbers. Now we have x = 10 and y = 1. So the error is approximately 1/20.

Before, we were working in inches. We had x = 120, y = 4, and error 1/15. Does that mean our error is now smaller? That can’t be. If the short leg of our triangle is longer, 12 inches rather than 4 inches, our error should go up, not down.

Of course the resolution is that our error was 1/15 of an inch in the first example, and 1/20 of a foot in the second example. If we were to redo our second example in inches, we’d get error 12²/240 = 12/20, i.e. we’d convert 1/20 of a foot to 12/20 of an inch.

Change of units and contravariance

Now here’s where tensors come in. Notice that when we use a larger unit of measurement, a foot instead of an inch, we get a smaller numerical value for error. Trivial, right? If you first measure a distance in meters, you’ll get larger numbers if you switch to centimeters, but smaller numbers if you switch to light years.

But this simple observation is an example of a deeper pattern. Measurements of this kind are contravariant, meaning that our numerical values change in the opposite direction as our units of measurement.

A velocity vector is contravariant because if you use smaller units of length, you get larger numerical values of velocity, and vice versa. Under a change of units, velocity changes in the opposite direction of the units.

A gradient vector is covariant because if you use smaller units of length, a function will vary less per unit length. Gradients change in the same direction as your units.

The discussion so far has been informal and limited to a very special change of coordinates. What matters is not just the direction of change, i.e. that results change monotonically with units, but that they increase or decrease by exactly the same proportion. And the kinds of coordinate changes we usually have in mind are not changing from inches to feet but rather changing from rectangular coordinates to polar coordinates.

More general and more formal

Suppose you have some function T described by coordinates denoted by x‘s with superscripts. Put bars on top of everything to denote a new representation of T with respect to new coordinates. If T is a contravariant vector we have,

\bar{T}^i =T^r \frac{\partial \bar{x}^i}{\partial x^r}

and if T is a covariant vector we have

\bar{T}_i =T_r \frac{\partial x^r}{\partial \bar{x}^i}

In the equations above there is an implicit summation over the repeated index r, using the so-called Einstein summation convention.

The examples at the end of the previous section are the canonical examples: tangent vectors are contravariant and gradients are covariant.

If the xs without bars are measured in inches and the xs with bars are measured in feet, the partial derivative of an x bar with respect to the corresponding x is 1/12, because a unit change in inches causes a change of 1/12 in feet.
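
Here’s a tiny numerical sketch of the inches-to-feet example; the velocity and gradient values are made up for illustration.

    dxbar_dx = 1/12   # d(xbar)/dx: feet per inch
    dx_dxbar = 12     # dx/d(xbar): inches per foot

    velocity_inches = 120                        # tangent vector component, inches per second
    velocity_feet = velocity_inches * dxbar_dx   # contravariant: 10, a smaller number

    gradient_inches = 0.5                        # gradient component, per inch
    gradient_feet = gradient_inches * dx_dxbar   # covariant: 6, a larger number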

Vectors are a special case of tensors, called 1-tensors. Higher order tensors satisfy analogous rules. A 2-tensor is contravariant if

 \bar{T}^{ij} = T^{rs} \frac{\partial\bar{x}^i}{\partial x^r} \frac{\partial\bar{x}^j}{\partial x^s}

and covariant if

\bar{T}_{ij} = T_{rs} \frac{\partial x^r}{\partial\bar{x}^i} \frac{\partial x^s}{\partial \bar{x}^j}

Even more generally you can have tensors of any order, and they can be contravariant in some components and covariant in others.

Backing up

For more on tensors, you may want to read a five-part series of blog posts I wrote starting with What is a tensor?. The word “tensor” is used in several related but different ways. The view of tensors given here, as things that transform a certain way under changes of coordinates, is discussed in the fourth post in that series.

The post From tape measures to tensors first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/26/tape-measures-and-tensors/feed/ 3
It doesn’t matter much if the tape is straight https://www.johndcook.com/blog/2020/09/25/measuring-tape/ https://www.johndcook.com/blog/2020/09/25/measuring-tape/#comments Sat, 26 Sep 2020 01:35:49 +0000 https://www.johndcook.com/blog/?p=61581 Suppose a contractor is measuring the length of a wall. He starts in one corner of the room, and lets out a tape measure heading for the other end of the wall. But something is in the way, so instead of measuring straight to the corner, he measures to a point near the corner on […]

The post It doesn't matter much if the tape is straight first appeared on John D. Cook.

]]>

Suppose a contractor is measuring the length of a wall. He starts in one corner of the room, and lets out a tape measure heading for the other end of the wall. But something is in the way, so instead of measuring straight to the corner, he measures to a point near the corner on the adjacent wall.

If you looked down on the room from a bird’s eye view, the contractor wants to measure the distance from (0, 0) to (x, 0), but instead measures from (0, 0) to (x, y) where y is small relative to x. How much difference does this make?

The measurement error, as a function of y, is given by

√(x² + y²) – x.

Expanding this function in a Taylor series around y = 0 shows that the error is approximately

y²/2x.

So the error is not on the order of y but of y²/x. The latter is much smaller if y is small relative to x.

For example, suppose a room is 10 feet (120 inches) long. If someone were to measure the length of the room by running a tape measure to a point 4 inches up the joining wall, the measurement error would not be anywhere near 4 inches but rather nearly 16/240 = 1/15 of an inch.

Let’s work this example out and see how good the approximation was. The hypotenuse of a right triangle with sides 120 and 4 is

√14416 = 120.066648…

which is very close to

120 + 1/15 = 120.06666…
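
Here’s the same comparison as a quick Python check.

    from math import sqrt

    x, y = 120, 4
    exact = sqrt(x**2 + y**2) - x   # true measurement error
    approx = y**2/(2*x)             # Taylor approximation
    print(exact, approx)            # 0.06665 and 0.06667, roughly 1/15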

The fact that the measurement wasn’t exactly corner-to-corner would likely not be the largest source of measurement error.

Update: What if the measuring tape sags in the middle?

The post It doesn't matter much if the tape is straight first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/25/measuring-tape/feed/ 1
At the next prime, turn left https://www.johndcook.com/blog/2020/09/24/gaussian_integer_walk/ https://www.johndcook.com/blog/2020/09/24/gaussian_integer_walk/#comments Fri, 25 Sep 2020 02:25:15 +0000 https://www.johndcook.com/blog/?p=61552 The previous post mentioned a Math Overflow question about unexpected mathematical images, and reproduced one that looks like field grass. This post reproduces another set of images from that post. Start anywhere in the complex plane with integer coordinates and walk west one unit at a time until you run into Gaussian prime [1]. Then […]

The post At the next prime, turn left first appeared on John D. Cook.

]]>
The previous post mentioned a Math Overflow question about unexpected mathematical images, and reproduced one that looks like field grass. This post reproduces another set of images from that post.

Start anywhere in the complex plane with integer coordinates and walk east one unit at a time until you run into a Gaussian prime [1]. Then turn left (counterclockwise) 90° and keep taking unit steps. Apparently this process will often (always?) return you to your starting point.

Different starting points lead to different patterns. Here’s an example given in the post, starting at 3 + 5i.

starting at 3 + 5i

Here’s a more complex walk starting at 27 + 30i.

starting at 27 + 30i

I tried starting at 127 + 131i and got a simple, uninteresting image. I tried again starting at 127 + 130i and got something much more complicated. I didn’t time it, but it took several minutes to plot.

starting at 127 + 130i

Here’s the code that made the plots. (Note that Python uses j rather than i for imaginary unit.)

from sympy import isprime
import matplotlib.pyplot as plt

def isgaussprime(z: complex):
    a, b = int(z.real), int(z.imag)
    if a*b != 0:
        return isprime(a**2 + b**2)
    else:
        c = abs(a+b)
        return isprime(c) and c % 4 == 3

def connect(z1: complex, z2: complex):
    plt.plot([z1.real, z2.real], [z1.imag, z2.imag], 'b')
    
start = 127 + 130j
#start = 3 + 5j
step = 1
z = start
next = None

while next != start:
    next = z + step
    connect(z, next)
    if isgaussprime(next):
        step *= 1j  # turn left: rotate the step 90° counterclockwise
    z = next

plt.axes().set_aspect(1)    
plt.show()

Related posts

[1] If a and b are integers, then a + bi is called a Gaussian integer. A Gaussian integer is a Gaussian prime if (1) both a and b are non-zero and a² + b² is prime, or (2) one of a or b is zero, and the absolute value of the non-zero part is a prime congruent to 3 mod 4.

Why is this definition so complicated? It’s actually a theorem. There’s a natural generalization of what it means to be prime in a commutative ring, and it works out that an element of the Gaussian integers is prime if and only if the above criteria hold.

In general, a non-zero, non-unit element p of a commutative ring R is prime if whenever p divides a product ab, p must either divide a or divide b.

The post At the next prime, turn left first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/24/gaussian_integer_walk/feed/ 3
Simple equations whose plot looks like field grass https://www.johndcook.com/blog/2020/09/24/fieldgrass/ https://www.johndcook.com/blog/2020/09/24/fieldgrass/#respond Thu, 24 Sep 2020 23:35:40 +0000 https://www.johndcook.com/blog/?p=61542 Math Overflow has an interesting question about unexpected mathematical images. Here’s a response from Payam Seraji that was easy to code up. Here’s the code that produced the image. from numpy import * import matplotlib.pyplot as plt t = linspace(0, 39*pi/2, 1000) x = t*cos(t)**3 y = 9*t*sqrt(abs(cos(t))) + t*sin(0.2*t)*cos(4*t) plt.plot(x, y, c="green") plt.axes().set_aspect(0.3) plt.axis('off')

The post Simple equations whose plot looks like field grass first appeared on John D. Cook.

]]>
Math Overflow has an interesting question about unexpected mathematical images. Here’s a response from Payam Seraji that was easy to code up.

Here’s the code that produced the image.

from numpy import *
import matplotlib.pyplot as plt

t = linspace(0, 39*pi/2, 1000)
x = t*cos(t)**3
y = 9*t*sqrt(abs(cos(t))) + t*sin(0.2*t)*cos(4*t)
plt.plot(x, y, c="green")
plt.axes().set_aspect(0.3)
plt.axis('off')

The post Simple equations whose plot looks like field grass first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/24/fieldgrass/feed/ 0
Constructing bilinear transformations https://www.johndcook.com/blog/2020/09/23/bilinear-transformations/ https://www.johndcook.com/blog/2020/09/23/bilinear-transformations/#respond Thu, 24 Sep 2020 01:14:22 +0000 https://www.johndcook.com/blog/?p=61495 The previous post was a visual introduction to bilinear transformations, a.k.a. Möbius transformations or fractional linear transformations. This post is a short follow-up focused more on calculation. A bilinear transformation f has the form where ad – bc ≠ 0. Inverse The inverse of f is given by The transformation f is defined everywhere except […]

The post Constructing bilinear transformations first appeared on John D. Cook.

]]>
The previous post was a visual introduction to bilinear transformations, a.k.a. Möbius transformations or fractional linear transformations. This post is a short follow-up focused more on calculation.

A bilinear transformation f has the form

f(z) = \frac{az + b}{cz + d}

where adbc ≠ 0.

Inverse

The inverse of f is given by

 g(w) = \frac{dw - b}{-cw + a}

The transformation f is defined everywhere except at z = –d/c, and its inverse is defined everywhere except at w = a/c.

So f takes the complex plane minus one point to the complex plane minus one point. Or an elegant way of thinking about it is to think of f and g as functions on a sphere by adding a point at infinity. Then we say

\begin{align*} f(-d/c) &= \infty \\ g(\infty) &= -d/c \\ f(\infty) &= a/c \\ g(a/c) &= \infty \end{align*}

Determining by three points

Bilinear transformations have three degrees of freedom. That is, you can pick three values in the domain and specify three places for them to go in the range. The unique bilinear transform sending z1, z2, and z3 to w1, w2, and w3 is given by

\frac{(w - w_2)(w_3 - w_1)}{(w - w_1)(w_3 - w_2)} = \frac{(z - z_2)(z_3 - z_1)}{(z - z_1)(z_3 - z_2)}

Plug in your constants and solve for w = f(z).
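
If you’d rather do this step in Python, here’s a sketch using SymPy. The function name bilinear_from_points is just something I made up for illustration.

    from sympy import symbols, simplify, solve, I, Rational

    z, w = symbols('z w')

    def bilinear_from_points(z1, z2, z3, w1, w2, w3):
        # Cross-ratio equation determining the unique bilinear transformation
        eq = ((w - w2)*(w3 - w1)*(z - z1)*(z3 - z2)
              - (z - z2)*(z3 - z1)*(w - w1)*(w3 - w2))
        return simplify(solve(eq, w)[0])   # the equation is linear in w

    # The three points used in the smiley face example below
    f = bilinear_from_points(0, Rational(-2, 5) + I/5, Rational(2, 5) + I/5,
                             0, Rational(-2, 5), Rational(1, 2) + 3*I/10)

This should recover the same transformation found below, though the four parameters may come out scaled by a common factor, which doesn’t change the transformation.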

Example

For example, let’s look at the smiley face example from the previous post.

We’ll pick three points on the face and three places for them to go.

Let’s say we want the center of the face to stay put, mapping 0 to 0. Next let’s pick two places for the centers of the eyes to go. These are at ±0.4 + 0.2i. Say we want the left eye to go down a little to -0.4 and the right eye to go up and over a little to 0.5 + 0.3i.

I used Mathematica to solve for the parameters.

    {z1, z2, z3} = {0, -2/5 + I/5, 2/5 + I/5}
    {w1, w2, w3} = {0, -2/5, 1/2 + 3 I/10}
    Solve[(w - w2) (w3 - w1)/((w - w1) (w3 - w2)) == 
          (z - z2) (z3 - z1)/((z - z1) (z3 - z2)), w]

This says the parameters are, in Python notation,

    a = -72 - 16j
    b = 0
    c = 30 - 35j
    d = -75

Using the code from the previous post we can verify that this transformation does what we designed it to do.

    print(mobius(0, a, b, c, d))
    print(mobius(-0.4 + 0.2j, a, b, c, d))
    print(mobius(0.4 + 0.2j, a, b, c, d))

and we can take a look at the result.

Related posts

The post Constructing bilinear transformations first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/23/bilinear-transformations/feed/ 0
Circles to Circles https://www.johndcook.com/blog/2020/09/23/circles-to-circles/ https://www.johndcook.com/blog/2020/09/23/circles-to-circles/#comments Wed, 23 Sep 2020 13:21:26 +0000 https://www.johndcook.com/blog/?p=61455 This post expands on something I said in passing yesterday. I said in the body of the post that … the image of a circle in the complex plane under a Möbius transformation is another circle. and added in a footnote that For this to always be true, you have to include a line as […]

The post Circles to Circles first appeared on John D. Cook.

]]>
This post expands on something I said in passing yesterday. I said in the body of the post that

… the image of a circle in the complex plane under a Möbius transformation is another circle.

and added in a footnote that

For this to always be true, you have to include a line as a special case of a circle, a circle of infinite radius if you like.

This post will illustrate these statements with Python code and plots. First, some code for drawing circles and other curves in the complex plane.

    from numpy import exp, pi, linspace
    import matplotlib.pyplot as plt

    θ = linspace(0, 2*pi, 200)

    def circle(radius, center):
        return center + radius*exp(1j*θ)

    def plot_curves(curves):
        for c in curves:
            plt.plot(c.real, c.imag)
        plt.axes().set_aspect(1)
        plt.show()
        plt.close()

Next, code for Möbius transformations, and the particular Möbius transformation we’ll use in our plots.

    def mobius(z, a, b, c, d):
        return (a*z + b)/(c*z + d)

    def m(curve):
        return mobius(curve, 1, 2, 3, 4)

Now we’ll plot three circles and their images under the Möbius transformation

m(z) = (z + 2)/(3z + 4)

with the following code.

    circles = [circle(1, 0), circle(2, 0), circle(2, 2)]
    plot_curves(circles)
    plot_curves([m(c) for c in circles])

This produces

and

Notice that the first circle, in blue, started out as the smallest circle and was contained inside the second circle, in orange. But in the image, the blue circle became the largest, and is no longer inside the orange circle. That is because our Möbius transformation has a singularity at -4/3, and things get turned inside-out around that point.

Next we’ll look at an example of lines being mapped to lines.

    line = linspace(-100, 100, 600)
    curves = [line, 1j*line - 4/3]
    plot_curves(curves)
    plot_curves([m(c) for c in curves])

This produces

and

These lines are mapped to lines because they both pass through the singularity at -4/3. The real axis, in blue, is mapped to itself. The line -4/3 + iy is shifted over to have real part 1/3.

Finally, let’s look at lines being mapped to circles. Since the inverse of a Möbius transformation is another Möbius transformation, this example also shows that circles can be mapped to lines.

    
    lines = [1j*line - 4, 1j*line + 4, line - 4j, line + 4j]
    plot_curves(lines)
    plot_curves([m(c) for c in lines])

This produces

and

Note that the circles don’t quite close. That’s because my line only runs from -100 to 100, not -∞ to ∞. The gap in the circles is at 1/3, because that’s the limit of our transformation (z + 2)/(3z + 4) as z goes to ±∞.

Smiley faces

To illustrate things further, I’d like to look at a smiley face and what happens to it under different Möbius transformations.

Here’s the code to draw the original face.

    dash = linspace(0.60, 0.90, 20)
    smile = 0.3*exp(1j*2*pi*dash) - 0.2j
    left_eye  = circle(0.1, -0.4+.2j)
    right_eye = circle(0.1,  0.4+.2j)
    face = [circle(1, 0), left_eye, smile, right_eye]

Next, let’s subject this face to the Möbius transformation with parameters (1, 0, 1, 3). The singularity is at -3, outside the face and fairly far away.

Next we’ll use parameters (1, 0, 1, -1+1j), which put the singularity at 1 – i, closer to the face, and hence more distortion.

Now we use parameters (1, 0, 3, 1), putting the singularity at -1/3, inside the face.

Finally, we look at parameters (1, 0, 1, 0.4-0.2j), putting the singularity inside the left eye.

The next post explains how to pick the parameters of a Möbius transformation to make points go where you want.

The post Circles to Circles first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/23/circles-to-circles/feed/ 2
Simultaneous projects https://www.johndcook.com/blog/2020/09/22/simultaneous-projects/ https://www.johndcook.com/blog/2020/09/22/simultaneous-projects/#comments Wed, 23 Sep 2020 01:06:32 +0000 https://www.johndcook.com/blog/?p=61424 I said something to my wife this evening to the effect that it’s best for employees to have one or at most two projects at a time. Two is good because you can switch off when you’re tired of one project or if you’re waiting on input. But with three or more projects you spend […]

The post Simultaneous projects first appeared on John D. Cook.

]]>
I said something to my wife this evening to the effect that it’s best for employees to have one or at most two projects at a time. Two is good because you can switch off when you’re tired of one project or if you’re waiting on input. But with three or more projects you spend a lot of time task switching.

She said “But …” and I immediately knew what she was thinking. I have a lot more than two projects going on. In fact, I would have to look at my project tracker to know exactly how many projects I have going on right now. How does this reconcile with my statement that two projects is optimal?

Unless you’re doing staff augmentation contracting, consulting work is substantially different from salaried work. For one thing, projects tend to be smaller and better defined.

Also consultants, at least in my experience, spend a lot of time waiting on clients, especially when the clients are lawyers. So you take on more work than you could handle if everyone wanted your attention at once. At least you work up to that if you can. You balance the risk of being overwhelmed against the risk of not having enough work to do.

Working for several clients in a single day is exhausting, but that’s usually not necessary. My ideal is to do work for one or two clients each day, even if I have a lot of clients who are somewhere between initial proposal and final invoice.

The post Simultaneous projects first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/22/simultaneous-projects/feed/ 1
Schwarzian derivative https://www.johndcook.com/blog/2020/09/22/schwarzian-derivative/ https://www.johndcook.com/blog/2020/09/22/schwarzian-derivative/#respond Tue, 22 Sep 2020 14:08:15 +0000 https://www.johndcook.com/blog/?p=61389 There are many ways the basic derivative can be generalized: partial derivatives, directional derivatives, covariant derivatives, etc. These all reduce to the basic derivative under the right circumstances. The Schwarzian derivative is not like that. It’s not a generalization of the familiar derivative but rather a differential operator analogous to a derivative. The Schwarzian derivative […]

The post Schwarzian derivative first appeared on John D. Cook.

]]>
There are many ways the basic derivative can be generalized: partial derivatives, directional derivatives, covariant derivatives, etc. These all reduce to the basic derivative under the right circumstances.

The Schwarzian derivative is not like that. It’s not a generalization of the familiar derivative but rather a differential operator analogous to a derivative. The Schwarzian derivative of a function f is defined [1] as

S(f) = \left(\frac{f''}{f'}\right)' - \frac{1}{2} \left(\frac{f''}{f'}\right)^2

To understand the motivation behind such an arbitrary-looking definition, we need to first look at functions of the form

g(z) = \frac{az + b}{cz + d}

called Möbius transformations, or more descriptively, fractional linear transformations [2]. These transformations are very important in complex analysis and have a lot of interesting properties. For example, the image of a circle in the complex plane under a Möbius transformation is another circle [3].

Möbius transformations are to the Schwarzian derivative roughly what constants are to the ordinary derivative. A function is a Möbius transformation if and only if its Schwarzian derivative is zero.

Since the Schwarzian derivative is defined in terms of ordinary derivatives, adding a constant to a function doesn’t change its Schwarzian derivative. Furthermore, the Schwarzian derivative is defined in terms of the ratio of ordinary derivatives, so multiplying a function by a constant doesn’t change its Schwarzian derivative either.

Even more generally, applying a Möbius transformation to a function doesn’t change its Schwarzian derivative. That is, for a fractional linear transformation like g(z) above

S(g \circ f) = S(f)

for any function f. So you can pull Möbius transformations out of a Schwarzian derivative sorta like the way you can pull constants out of an ordinary derivative. The difference though is that instead of the Möbius transformation moving to the outside, it simply disappears.

You can think of the Schwarzian derivative as measuring how well a function can be approximated by a Möbius transformation. Schwarzian derivatives come up frequently in applications of complex analysis, such as conformal mapping.

More kinds of derivatives

[1] The Schwarzian derivative of a constant function is defined to be zero.

[2] Möbius transformations require adbc to not equal zero.

[3] For this to always be true, you have to include a line as a special case of a circle, a circle of infinite radius if you like. If you don’t like that definition, then you can rephrase the statement above as saying Möbius transformations map circles and lines to circles and lines.

The post Schwarzian derivative first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/22/schwarzian-derivative/feed/ 0
Searching Greek and Hebrew with regular expressions https://www.johndcook.com/blog/2020/09/20/unicode-regex/ https://www.johndcook.com/blog/2020/09/20/unicode-regex/#comments Sun, 20 Sep 2020 21:13:38 +0000 https://www.johndcook.com/blog/?p=61311 According to the Python Cookbook, “Mixing Unicode and regular expressions is often a good way to make your head explode.” It is thus with fear and trembling that I dip my toe into using Unicode with Greek and Hebrew. I heard recently that there are anomalies in the Hebrew Bible where the final form of […]

The post Searching Greek and Hebrew with regular expressions first appeared on John D. Cook.

]]>
According to the Python Cookbook, “Mixing Unicode and regular expressions is often a good way to make your head explode.” It is thus with fear and trembling that I dip my toe into using Unicode with Greek and Hebrew.

I heard recently that there are anomalies in the Hebrew Bible where the final form of a letter is deliberately used in the middle of a word. That made me think about searching for such anomalies with regular expressions. I’ll come back to that shortly, but I’ll start by looking at Greek where things are a little simpler.

Greek

Only one letter in Greek has a variant form at the end of a word. Sigma is written ς at the end of a word and σ everywhere else. The following Python code shows we can search for sigmas and final sigmas in the first sentence from Matthew’s gospel.

    import re

    matthew = "Κατάλογος του γένους του Ιησού Χριστού, γιου του Δαυείδ, γιου του Αβραάμ"
    matches = re.findall(r"\w+ς\b", matthew)
    print("Words ending in sigma:", matches)

    matches = re.findall(r"\w*σ\w+", matthew)
    print("Words with non-final sigmas:", matches)

This prints

    Words ending in sigma: ['Κατάλογος', 'γένους']
    Words with non-final sigmas: ['Ιησού', 'Χριστού']

This shows that we can use non-ASCII characters in regular expressions, at least if they’re Greek letters, and that the metacharacters \w (word character) and \b (word boundary) work as we would hope.

Hebrew

Now for the motivating example. Isaiah 9:6 contains the word “לםרבה” with the second letter (second from right) being the final form of Mem, ם, sometimes called “closed Mem” because the form of the letter ordinarily used in the middle of a word, מ, is not a closed curve [1].

Here’s code to show we could find this anomaly with regular expressions.

    # There are five Hebrew letters with special final forms.
    finalforms = "\u05da\u05dd\u05df\u05e3\u05e5" # ךםןףץ

    lemarbeh = "\u05dc\u05dd\u05e8\u05d1\u05d4" # לםרבה
    m = re.search(r"\w*[" + finalforms + r"]\w+", lemarbeh)
    if m:
        print("Anomaly found:", m.group(0))

As far as I know, the instance discussed above is the only one where a final letter appears in the middle of a word. And as far as I know, there are no instances of a non-final form being used at the end of a word. For the next example, we will put non-final forms at the end of words so we can find them.

We’ll need a list of non-final forms. Notice that the Unicode value for each non-final form is 1 greater than the corresponding final form. It’s surprising that the final forms come before the non-final forms in numerical (Unicode) order, but that’s how it is. The same is true in Greek: final sigma is U+03c2 and sigma is U+03c3.

    finalforms = "\u05da\u05dd\u05df\u05e3\u05e5" # ךםןףץ
    nonfinal   = "\u05db\u05de\u05e0\u05e4\u05e6" # כמנפצ

We’ll start by taking the first line of Genesis

    genesis = "בראשית ברא אלהים את השמים ואת הארץ"

and introduce errors by replacing each final form with its corresponding non-final form. The code uses a somewhat obscure feature of Python and is analogous to the shell utility tr.

    genesis_wrong = genesis.translate(str.maketrans(finalforms, nonfinal))

(The string method translate does what you would expect, except that it takes a translation table rather than a pair of strings as its argument. The maketrans method creates a translation table from two strings.)

Now we can find our errors.

    anomalies = re.findall(r"\w+[" + nonfinal + r"]\b", genesis_wrong)
    print("Anomalies:", anomalies)

This produced

    Anomalies: ['אלהימ', 'השמימ', 'הארצ']

Note that Python printed the letters left-to-right.

Vowel points

The text above did not contain vowel points. If the text does contain vowel points, these are treated as letters between the consonants.

For example, the regular expression “בר” will not match against “בְּרֵאשִׁית” because the regex engine sees two characters between ב and ר, the dot (“dagesh”) inside ב and the two dots (“sheva”) underneath ב. But the regular expression “ב..ר” does match against “בְּרֵאשִׁית”.

See footnote [1].

Python

I started this post with a warning from the Python Cookbook that mixing regular expressions and Unicode can cause your head to explode. So far my head is intact, but I’ve only tested the waters. I’m sure there are dragons in the deeper end.

I should include the rest of the book’s warning. I used the default library re that ships with Python, but the authors recommend if you’re serious about using regular expressions with Unicode,

… you should consider installing the third-party regex library which provides full support for Unicode case folding, as well as a variety of other interesting features, including approximate matching.

Related posts

[1] When I refer to “בר” as a regular expression, I mean the character sequence ב followed by ר, which your browser will display from right to left because it detects it as characters from a language written from right to left. Written in terms of Unicode code points this would be “\u05d1\u05e8”, the code point for ב followed by the code point for ר.

Similarly, the second regular expression could be written “\u05d1..\u05e8”. The two periods in the middle are just two instances of the regular expression symbol that matches any character. It’s a coincidence that dots (periods) are being used to match dots (the dagesh and the sheva).

The post Searching Greek and Hebrew with regular expressions first appeared on John D. Cook.

]]>
https://www.johndcook.com/blog/2020/09/20/unicode-regex/feed/ 1
Descartes and Toolz https://www.johndcook.com/blog/2020/09/20/descartes-and-toolz/ https://www.johndcook.com/blog/2020/09/20/descartes-and-toolz/#comments Sun, 20 Sep 2020 15:22:47 +0000 https://www.johndcook.com/blog/?p=61298 I was looking recently at the Python module toolz, a collection of convenience functions. A lot of these functions don’t do that much. They don’t save you much code, but they do make your code more readable by making it more declarative. You may not realize need them until you see them. For example, there […]

The post Descartes and Toolz first appeared on John D. Cook.

]]>
I was looking recently at the Python module toolz, a collection of convenience functions. A lot of these functions don’t do that much. They don’t save you much code, but they do make your code more readable by making it more declarative. You may not realize you need them until you see them.

For example, there is a function partitionby that breaks up a sequence at the points where a given function’s value changes. I’m pretty sure that function would have improved some code I’ve written recently, making it more declarative than procedural, but I can’t remember what that was.

Although I can’t think of my previous example, I can think of a new one, and that is Descartes’ rule of signs.

Given a polynomial p(x), read the non-zero coefficients in order and keep note of how many times they change sign, either from positive to negative or vice versa. Call that number n. Then the number of positive roots of p(x) either equals n or n minus a positive even number.

For example, suppose

p(x) = 4x⁴ + 3.1x³ – x² – 2x + 6.

The coefficients are 4, 3.1, -1, -2, and 6. The list of coefficients changes signs twice: from positive to negative, and from negative to positive. Here’s a first pass at how you might have Python split the coefficients to look for sign changes.

    from toolz import partitionby

    coefficients = [4, 3.1, -1, -2, 6]
    parts = partitionby(lambda x: x > 0, coefficients)
    print([p for p in parts])

This prints

    [(4, 3.1), (-1, -2), (6,)]

The first argument to partitionby is an anonymous function that tests whether its argument is positive. When this function changes value, we have a sign alternation. There are three groups of consecutive coefficients that have the same sign, so there are two places where the signs change. So our polynomial either has two positive roots or no positive roots. (It turns out there are no positive roots.)

The code above isn’t quite right though, because Descartes said to only look at non-zero coefficients. If we change our anonymous function to

    lambda x: x >= 0

that will work for zeros in the middle of positive coefficients, but it will give a false positive for zeros in the middle of negative coefficients. We can fix the code with a list comprehension. The following example works correctly.

    coefficients = [4, 0, 3.1, -1, 0, -2, 6]
    nonzero = [c for c in coefficients if c != 0]
    parts = partitionby(lambda x: x > 0, nonzero)
    print([p for p in parts])

If our coefficients were in a NumPy array rather than a list, we could remove the zeros more succinctly.

    from numpy import array

    c = array(coefficients)
    parts = partitionby(lambda x: x > 0, c[c != 0])

The function partitionby returns an iterator rather than a list. That’s why we don’t just print parts above. Instead we print [p for p in parts] which makes a list. In applications, it’s often more efficient to have an iterator than a list, generating items if and when they are needed. If you don’t need all the items, you don’t have to generate them all. And even if you do need all the items, you could save memory by not keeping them all in memory at once. I’ll ignore such efficiencies here.

We don’t need the partitions per se, we just need to know how many there are. The example that escapes my mind would have been a better illustration if it needed to do more with each portion than just count it. We could count the number of sign alternations for Descartes’ rule as follows.

   len([p for p in parts]) - 1
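
Here’s one way to wrap all of this up into a reusable function; the name sign_changes is mine, chosen just for this sketch.

    from toolz import partitionby

    def sign_changes(coefficients):
        "Count sign alternations among the non-zero coefficients."
        nonzero = [c for c in coefficients if c != 0]
        return len(list(partitionby(lambda x: x > 0, nonzero))) - 1

    print(sign_changes([4, 3.1, -1, -2, 6]))        # 2
    print(sign_changes([4, 0, 3.1, -1, 0, -2, 6]))  # 2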

Related posts