Stability of a superegg

Three weeks ago I wrote about supereggs, a shape popularized by Piet Hein.

Brass superegg by Piet Hein

One aspect of supereggs that I did not address is their stability. Looking at the photo above, you could imagine that if you gave the object a slight nudge it would not fall over. Your intuition would be right: supereggs are stable.

More interesting than asking whether supereggs are stable is to ask how stable they are. An object is stable if for some ε > 0, a perturbation of size less than ε will not cause the object to fall over.

All supereggs are stable, but the degree of stability decreases as the ratio of height to width increases: the taller the superegg, the easier it is to knock over.

How can we quantify stability? An object is stable if its center of curvature, measured at the point of contact with a horizontal surface, is above its center of mass.

For a sphere, the center of curvature is exactly the center of mass. If we modify the sphere to make it slightly flatter at the point of contact, the center of curvature will move above the center of mass and the modified sphere will be stable. On the other hand, if we made the sphere slightly more curved at the point of contact, the center of curvature would move below the center of mass and the object would be unstable.

The center of curvature is the center of the sphere that best approximates a surface at a point. If our object is a sphere, the hypothetical sphere defining the curvature is the object itself. For an object flatter on bottom than a sphere, the sphere defining center of curvature is larger than the object. The flatter the object, the larger the approximating sphere.

The superegg has zero curvature at the bottom, and so its center of curvature is at infinity. But if you push the superegg slightly, the part touching the horizontal surface is no longer the exact center, the curvature is slightly positive, and so the radius of curvature is finite. The center of mass also moves up slightly as the superegg rocks to the side.

So the center of curvature moves down and the center of mass moves up. At what point do they cross and the object becomes unstable?

Solution outline

The equation of the surface of the superegg is

\left(\sqrt{x^2 + y^2}\right)^p + |z/h|^p = 1

where p > 2 (a common choice is p = 5/2) and h > 1 (a common choice is h = 4/3).
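The claim above that the superegg has zero curvature at the bottom can be checked symbolically. The sketch below uses SymPy with the common choices p = 5/2 and h = 4/3, solving the profile curve for z in the xz plane and taking the limit of the curvature at the point of contact:

```python
import sympy as sp

x = sp.symbols('x', positive=True)
p, h = sp.Rational(5, 2), sp.Rational(4, 3)  # the common choices above

# Lower half of the profile curve in the xz plane: x**p + |z/h|**p = 1
z = -h * (1 - x**p) ** (1 / p)

# Curvature of the graph z(x)
kappa = sp.diff(z, x, 2) / (1 + sp.diff(z, x) ** 2) ** sp.Rational(3, 2)

# At the point of contact x = 0 the curvature vanishes, so the radius of
# curvature is infinite and the center of curvature is at infinity.
c0 = sp.limit(kappa, x, 0, '+')
print(c0)  # 0
```

Any p > 2 gives the same limit; at p = 2 the profile is an ellipse and the curvature at the bottom is positive.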

If we tilt the superegg so that its axis of symmetry now makes a small angle θ with the z-axis, we have to answer several questions.

  1. Where does the center of mass go?
  2. What is the new point of contact with the horizontal surface?
  3. Where is the center of curvature?
  4. Is the center of curvature above or below the center of mass?

It’s easier to imagine lifting the superegg up from the surface, rotating it in the air, then lowering it to the surface. That way you don’t have to describe how the superegg rocks.

The superegg is radially symmetric about the vertical axis, so without loss of generality you can imagine the problem is limited to the xz plane.

It’s messy to work out the details, but in principle you could work out how much the superegg can be perturbed and return to its original position. You know a priori that the result will be a decreasing function of h and p.


[1] Photo by Malene Thyssen, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.

Best-of-five versus Best-of-seven

Suppose that when Team X and Team Y play, the probability that X will win a single game is p and the probability that Y will win is q = 1 − p.

What is the probability that X will win the majority of a series of N games for some odd number N?

We know intuitively that the team more likely to win each game is more likely to win the series. But how much more likely?

We also know intuitively that the more games in the series, the more likely the better team will win. But how much more likely?

The probability of X winning a series of N games is

\sum_{n > N-n} \binom{N}{n} p^n q^{N-n}

It doesn’t matter that maybe not all games are played: some games may not be played precisely because it makes no difference whether they are played. For example, if one team wins the first three in a best-of-five series, the last two games are not played, but for our purposes we may imagine that they were played.

Let’s plot the probability of X winning a best-of-five series and a best-of-seven series.

The better team is more likely to win a series than a game. The probability of the better team winning a single game would be a diagonal line running from (0, 0) to (1, 1). The curves for a series are below this line to the left of 0.5 and above this line to the right of 0.5. But the difference between a best-of-five and a best-of-seven series is small.

Here’s another plot looking at the difference in probabilities, probability of winning best-of-seven minus probability of winning best-of-five.

The maximum difference is between 3% and 4%.
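A quick numerical check of this claim, using the sum above (a sketch; the grid over p is an arbitrary choice):

```python
from math import comb

def series_win_prob(p, N):
    """Probability of winning a majority of N games (N odd), each won with probability p."""
    return sum(comb(N, n) * p**n * (1 - p)**(N - n)
               for n in range(N // 2 + 1, N + 1))

# Largest gap between best-of-seven and best-of-five win probabilities
ps = [i / 1000 for i in range(1001)]
max_gap = max(series_win_prob(p, 7) - series_win_prob(p, 5) for p in ps)
print(max_gap)  # between 0.03 and 0.04
```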

This assumes the probability of winning a game is constant and that games are independent. This is not exactly the case in, for example, the World Series in which human factors make things more complicated.


The 19th rule of HIPAA Safe Harbor

The HIPAA Safe Harbor provision says that data can be considered deidentified if 18 kinds of data are removed or reported at low resolution. At the end of the list of 18 items, there is an extra category, sometimes informally called the 19th rule:

The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.

So if you otherwise meet the letter of the Safe Harbor provision, but you know (or should know) that the data can still be used to identify people represented in the data, then Safe Harbor does not apply.

The Department of Health and Human Services guidance document gives four examples of “when a covered entity would fail to meet the ‘actual knowledge’ provision.” The first concerns a medical record that would reveal someone’s identity by revealing their profession.

Revealing that someone is a plumber would probably not be a privacy risk, but in the HHS example someone’s occupation was listed as a former state university president. If you know what state this person is in, that greatly narrows down the list of possibilities. One more detail, such as age, might be enough to uniquely identify this person.

Free text fields, such as physician notes, could easily contain this kind of information. Software that removes obvious names won’t catch this kind of privacy leak.

Not only are intentional free text fields a problem, so are unintentional free text fields. For example, a database field labeled CASENOTES is probably intended to contain free text. But other text fields, particularly if they are wider than necessary to contain the anticipated data, could contain identifiable information.

If you have data that does not fall under the Safe Harbor provision, or if you are not sure the Safe Harbor rules are enough to ensure that the data are actually deidentified, let’s talk.


This post is not legal advice. My clients are often lawyers, but I am not a lawyer.

Bluesky

I saw a comment from Christos Argyropoulos on Twitter implying that there’s a good scientific community on Bluesky, so I went there and looked around a little bit. I have an account, but I haven’t done much with it. I was surprised that a fair number of people had followed me on Bluesky even though I had only posted twice. I posted a couple of links this evening, doubling my total activity on Bluesky.

I don’t know what I’ll do with my Bluesky or Mastodon accounts. I certainly will not try to replicate what I built on Twitter. So far I’m more of a reader than a writer on Bluesky and Mastodon. Bluesky is not a science-focused social network, but I may use it for that, only following science-oriented accounts there. We’ll see.

You can always find me here, whether or not you can find me on Bluesky, Mastodon, or Twitter. You can subscribe to this site to get notifications of new posts via RSS or email, and I also have a monthly newsletter where I post blog highlights.

Portable sed -i across MacOS and Linux

The -i flag to ask sed to edit a file in place works differently on Linux and MacOS. If you want to create a backup of your file before you edit it, say with the extension .bak, then on Linux you would run

    sed -i.bak 's/foo/bar/' myfile

but for the version of sed that ships with MacOS you would write

    sed -i '.bak' 's/foo/bar/' myfile

Note that this changes how sed interprets its arguments, whether you want a backup file or not. You must specify a backup file extension, but you could specify the extension as '', which effectively means don’t make a backup.

The difference between how the two versions of sed handle their arguments is a minor nuisance when working interactively at the command line, but a script calling sed -i will not work the same on Mac and Linux.

I put the line

    alias sed=gsed

in my .zshrc file so that when I type sed the shell will actually run the GNU version of sed, which handles -i as I expect.

But this does not work in bash scripts. I tried putting the alias in my .bashrc and .bash_profile files, but that doesn’t work. In scripts bash ignores aliases, no matter what config file you put them in. Here’s the relevant line from the bash man page:

Aliases are not expanded when the shell is not interactive, unless the expand_aliases shell option is set using shopt.

So the solution is to put the magic incantation

    shopt -s expand_aliases

at the top of your script.

Here’s how I wrote a script using sed -i that works the same way on MacOS and Linux.

    #!/usr/bin/env bash
    shopt -s expand_aliases

    if [[ "$OSTYPE" == "darwin"* ]]; then
        alias sed=gsed
    fi

    sed -i ...

This may seem a little heavy-handed, changing the program that I’m using just to fix a problem with the arguments. I could, for example, have the script insert '' after -i when running on MacOS.

But I’m not changing the program I use. Quite the opposite. The code above saves me from having to use a different program. It ensures that I’m running the GNU version of sed on both platforms. There are other differences between the two versions of sed. I don’t know what they are offhand, but they could lead to frustrating bugs in the future, and the code above fixes these bugs before they occur.

Johnson circle theorem

Draw three circles of radius r that intersect at a single point. Then draw a triangle connecting the remaining three points of intersection.

(Each pair of circles intersects in two points, one of which is the point where all three circles intersect, so there are three other intersection points.)

Then the circumcircle of the triangle, the circle through the three vertices, also has radius r.
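Here is a numerical check of the theorem (a sketch; the three center angles are arbitrary). Placing the common point at the origin puts each center at distance r from it, and the second intersection point of circles i and j turns out to be the vector sum of their centers:

```python
import numpy as np

r = 2.0
# Centers of three circles of radius r through the origin, at arbitrary angles
theta = np.array([0.5, 2.1, 4.0])
C = r * np.column_stack([np.cos(theta), np.sin(theta)])

# Second intersection of circles i and j: C[i] + C[j] is at distance r
# from both centers, just as the origin is.
P = np.array([C[0] + C[1], C[1] + C[2], C[2] + C[0]])

# Circumradius from the side lengths and area: R = a*b*c / (4*area)
a = np.linalg.norm(P[1] - P[2])
b = np.linalg.norm(P[2] - P[0])
c = np.linalg.norm(P[0] - P[1])
u, v = P[1] - P[0], P[2] - P[0]
area = 0.5 * abs(u[0] * v[1] - u[1] * v[0])
R = a * b * c / (4 * area)
print(R)  # ≈ 2.0, the common radius r
```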

I’ve seen this theorem referred to as Johnson’s theorem, as well as the Johnson–Tzitzeica or Tzitzeica–Johnson theorem. Apparently Roger Johnson and George Tzitzeica (Gheorghe Țițeica) both proved the same theorem around the same time. Johnson’s publication [1] dates to 1916.

It’s remarkable that a theorem in Euclidean geometry this easy to state was discovered 2200 years after Euclid. Johnson says in [1]

Singularly enough, this remarkable theorem appears to be new. A rather cursory search in several of the treatises on modern elementary geometry fails to disclose it, and the author has not yet found any person to whom it was known. On the other hand, the figure is so simple … that it seems almost out of the question that the fact can have escaped detection. Even if geometers have overlooked it, someone must have noticed it in casually drawing circles. But if this were the case, it seems like a theorem of sufficient interest to receive some prominence in the literature, and therefore ought to be well known.


[1] Roger Johnson. A circle theorem. The American Mathematical Monthly, May, 1916, Vol. 23, No. 5, pp. 161-162.

Newton line

Let Q be a convex quadrilateral with at most two parallel sides. Draw the two diagonals then draw a line through their midpoints. This line is called the Newton line.

(The requirement that at most two sides are parallel ensures that the midpoints are distinct and so there is a unique line joining them.)

In the figure above, the diagonals are blue, their midpoints are indicated by black dots, and the red line joining them is the Newton line.

Now join the midpoints of the sides. These are drawn with dotted gray lines above. Then the intersection of these two lines lies on the Newton line.

Now suppose further that our quadrilateral is a tangential quadrilateral, i.e. that all four sides are tangent to a circle C. Then the center of C also lies on the Newton line.

In the image above, it appears that the lines joining the midpoints of the sides also intersect at the center of the circle. That’s not true in general, and it’s not true in the example above, though you’d have to zoom in to see it. But it is true that the intersection of these lines and the center of the circle both lie on the Newton line.
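A numerical check that the center of the inscribed circle lies on the Newton line (a sketch; the four tangent angles are arbitrary):

```python
import numpy as np

def tangent_intersection(t1, t2):
    # The tangent line to the unit circle at angle t is x cos t + y sin t = 1.
    # Intersect the tangents at angles t1 and t2.
    A = np.array([[np.cos(t1), np.sin(t1)], [np.cos(t2), np.sin(t2)]])
    return np.linalg.solve(A, np.ones(2))

# Four tangent lines to the unit circle give a tangential quadrilateral
angles = [0.3, 1.4, 2.9, 4.8]
V = [tangent_intersection(angles[i], angles[(i + 1) % 4]) for i in range(4)]

# Midpoints of the two diagonals determine the Newton line
M1 = (V[0] + V[2]) / 2
M2 = (V[1] + V[3]) / 2

# The circle's center (the origin) lies on that line: the 2D cross product
# of M2 - M1 with the vector from M1 to the origin should vanish.
d = M2 - M1
cross = d[0] * (0 - M1[1]) - d[1] * (0 - M1[0])
print(abs(cross))  # ≈ 0
```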


Homework problems are rigged

This post is a follow-on to a discussion that started on Twitter yesterday. This tweet must have resonated with a lot of people because it’s had over 250,000 views so far.

You almost have to study advanced math to solve basic math problems. Sometimes a high school student can solve a real world problem that only requires high school math, but usually not.

There are many reasons for this. For one thing, formulating problems is a higher-level skill than solving them. Homework problems have been formulated for you. They have also been rigged to avoid complications. This is true at all levels, from elementary school to graduate school.

A college student tutoring a high school student might notice that homework problems have been crafted to always have whole number solutions. The college student might not realize how his own homework problems have been rigged analogously. Calculus homework problems won’t avoid fractions, but they still avoid problems that don’t have tidy solutions [1].

When I taught calculus, I looked around for homework problems that were realistic applications, had closed-form solutions, and could be worked in a reasonable amount of time. There aren’t many. And the few problems that approximately satisfy these three criteria will be duplicated across many textbooks. I remember, for example, finding a problem involving calculating the mass of a star that I thought was good exercise. Then as I looked through a stack of calculus texts I saw that the same homework problem was in most if not all the textbooks.

But it doesn’t stop there. In graduate school, homework problems are still crafted to avoid difficulties. When you see a problem like this one it’s not obvious that the problem has been rigged because the solution is complicated. It may seem that you’re able to solve the problem because of the power of the techniques used, but that’s not the whole story. Tweak any of the coefficients and things may go from complicated to impossible.

It takes advanced math to solve basic math problems that haven’t been rigged, or to know how to do your own rigging. By doing your own rigging, I mean looking for justifiable ways to change the problem you need to solve, i.e. to make good approximations.

For example, a freshman physics class will derive the equation of a pendulum as

y″ + sin(y) = 0

but then approximate sin(y) as y, changing the equation to

y″ + y = 0.

That makes things much easier, but is it justifiable? Why is that OK? When is that OK, because it’s not always.

The approximations made in a freshman physics class cannot be critiqued using freshman physics. Working with the un-rigged problem, i.e. keeping the sin(y) term, and understanding when you don’t have to, are both beyond the scope of a freshman course.
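To make this concrete, one can compare the two equations numerically. The sketch below (using SciPy’s solve_ivp; the initial angles are illustrative) integrates both from rest and measures how far the solutions drift apart:

```python
import numpy as np
from scipy.integrate import solve_ivp

def pendulum(t, u, linear):
    # y'' + sin(y) = 0, or the linearized y'' + y = 0, as a first-order system
    y, v = u
    return [v, -y if linear else -np.sin(y)]

t_eval = np.linspace(0, 10, 500)
errs = {}
for y0 in (0.1, 1.5):  # illustrative initial angles, in radians
    lin, full = (solve_ivp(pendulum, (0, 10), [y0, 0.0], t_eval=t_eval,
                           args=(flag,), rtol=1e-10, atol=1e-12).y[0]
                 for flag in (True, False))
    errs[y0] = float(np.max(np.abs(lin - full)))

print(errs)  # tiny error for the small angle, order-one error for the large
```

For a small initial angle the two solutions stay close; for a large one they drift apart, mostly because the true period grows with amplitude.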

Why can we ignore friction in problem 5 but not in problem 12? Why can we ignore the mass of the pulley in problem 14 but not in problem 21? These are questions that come up in a freshman class, but they’re not freshman-level questions.

***

[1] This can be misleading. Students often say “My answer is complicated; I must have made a mistake.” This is a false statement about mathematics, but it’s a true statement about pedagogy. Problems that haven’t been rigged to have simple solutions often have complicated solutions. But since homework problems are usually rigged, it is true that a complicated result is reason to suspect an error.

Python code for means

The last couple of articles have looked at various kinds of means. The Python code for four of these means is trivial:

gm  = lambda a, b: (a*b)**0.5
am  = lambda a, b: (a + b)/2
hm  = lambda a, b: 2*a*b/(a+b)
chm = lambda a, b: (a**2 + b**2)/(a + b)

But the arithmetic-geometric mean (AGM) is not trivial:

from numpy import pi
from scipy.special import ellipk

agm = lambda a, b: 0.25*pi*(a + b)/ellipk((a - b)**2/(a + b)**2) 

The arithmetic-geometric mean is defined by iterating the arithmetic and geometric means and taking the limit. This iteration converges very quickly, and so writing code that directly implements the definition is efficient.
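For comparison, here is a sketch of the while-loop approach, iterating the definition directly (the tolerance is an arbitrary choice):

```python
import math

def agm(a, b, tol=1e-15):
    """Arithmetic-geometric mean by iterating the arithmetic and geometric means."""
    while abs(a - b) > tol * max(a, b):
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

print(agm(1, 2))  # ≈ 1.4567910
```

Each iteration roughly doubles the number of correct digits, so a handful of passes suffice for double precision.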

But the AGM can also be computed via a special function K, the “complete elliptic integral of the first kind,” which makes the code above more compact. This is conceptually nice because we can think of the AGM as a simple function, not an iterative process.

But how is K evaluated? In some sense it doesn’t matter: it’s encapsulated in the SciPy library. But someone has to write SciPy. I haven’t looked at the SciPy source code, but usually K is calculated numerically using the AGM because, as we said above, the AGM converges very quickly.

Bell curve meme: How to calculate the AGM? The left and right tails say to use a while loop. The middle says to evaluate a complete elliptic integral of the first kind.

This fits the pattern of a bell curve meme: the novice and expert approaches are the same, but for different reasons. The novice uses an iterative approach because that directly implements the definition. The expert knows about the elliptic integral, but also knows that the iterative approach suggested by the definition is remarkably efficient and eliminates the need to import a library.

Although it’s easy to implement the AGM with a while loop, the code above does have some advantages. For one thing, it pushes the responsibility for validation and exception handling onto the library. On the other hand, the code is easy to get wrong because there are two conventions on how to parameterize K and you have to be sure to use the same one your library uses.