One of these days I’m going to figure this out

If something is outside your grasp, it’s hard to know just how far outside it is.

Many times I’ve intended to sit down and understand something thoroughly, and I’ve put it off for years. Maybe it’s a programming language that I just use a few features of, or a book I keep seeing references to. Maybe it’s a theorem that keeps coming up in applications. It’s something I understand enough to get by, but I feel like I’m missing something.

I’ll eventually block off some time to dive into whatever it is and get to the bottom of things. Then, in a fraction of the time I’ve allocated, I do get to the bottom and find out that I wasn’t that far away. It feels like swimming in water that’s just over your head. Your feet don’t touch bottom, and you don’t try to touch bottom because you don’t know how far down it is, when in fact it’s only inches away.

A few years ago I wrote about John Conway’s experience along these lines. He made a schedule for the time he’d spend each week working on an open problem in group theory, and then he solved it the first day. More on his story here. I suspect that having allocated a large amount of time to the problem put him in a mindset where he didn’t need a large amount of time.

I’ve written about this before in the context of simplicity and stress reduction: a little simplicity goes a long way. Making something just a little bit simpler can make an enormous difference. Maybe you only reduce the objective complexity by 10%, but you feel like you’ve reduced it by 50%. Just as you can’t tell how far away you are from understanding something when you’re almost there, you also can’t tell how complicated something really is when you’re overwhelmed. If you can simplify things enough to go from being overwhelmed to not being overwhelmed, that makes all the difference.

Typesetting zodiac symbols in LaTeX

Typesetting zodiac symbols in LaTeX is admittedly an unusual thing to do. LaTeX is mostly used for scientific publication, and zodiac symbols are commonly associated with astrology. But occasionally zodiac symbols are used in more respectable contexts.

The wasysym package for LaTeX includes miscellaneous symbols, including zodiac symbols. Here are the symbols, their LaTeX commands, and their corresponding Unicode code points.

The only surprise here is that the command for Capricorn is based on the Latin form of the name: \capricornus.

Each zodiac sign is used to denote a 30° region of the sky. Since the Unicode symbols are consecutive, you can compute the code point of a symbol from the longitude angle θ in degrees:

9800 + \left\lfloor \frac{\theta}{30} \right\rfloor

Here 9800 is the decimal form of 0x2648, and the half brackets are the floor symbol, i.e. round down to the nearest integer.
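
Here’s a quick Python sketch of that calculation (mine, not part of the post). The % 360 just keeps the angle in range, and chr turns the code point into the symbol itself.

def zodiac_symbol(theta):
    # 9800 (0x2648, Aries) plus the index of the 30-degree sector
    return chr(9800 + int(theta % 360) // 30)

print(zodiac_symbol(0))    # Aries
print(zodiac_symbol(123))  # Leo, since 123 degrees falls in the 120-150 sector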

Here’s the LaTeX code that produced the table.

\documentclass{article}
\usepackage{wasysym}
\begin{document}

\begin{table}
\begin{tabular}{lll}
\aries       & \verb|\aries       | & U+2648 \\
\taurus      & \verb|\taurus      | & U+2649 \\
\gemini      & \verb|\gemini      | & U+264A \\
\cancer      & \verb|\cancer      | & U+264B \\
\leo         & \verb|\leo         | & U+264C \\
\virgo       & \verb|\virgo       | & U+264D \\
\libra       & \verb|\libra       | & U+264E \\
\scorpio     & \verb|\scorpio     | & U+264F \\
\sagittarius & \verb|\sagittarius | & U+2650 \\
\capricornus & \verb|\capricornus | & U+2651 \\
\aquarius    & \verb|\aquarius    | & U+2652 \\
\pisces      & \verb|\pisces      | & U+2653 \\
\end{tabular}
\end{table}
\end{document}

By the way, you can use the Unicode values in HTML by replacing U+ with &#x and adding a semicolon on the end.
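
For example, here’s that substitution as a one-liner in Python (the function name is mine):

def to_html_entity(codepoint):
    # "U+2648" -> "&#x2648;", which an HTML page renders as the Aries symbol
    return codepoint.replace("U+", "&#x") + ";"

print(to_html_entity("U+2648"))  # &#x2648;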

Airline flight number parity

I read in Wikipedia this morning that there’s a pattern to the parity of flight numbers.

Among airline flight numbers, even numbers typically identify eastbound or northbound flights, and odd numbers typically identify westbound or southbound flights.

I never noticed this. I could see how it might be a useful convention. It would mean that your outbound and return flight numbers would have opposite parity. That might help keep them straight.

I have a trip coming up, so I wanted to see if the rule holds for my flights. But I’m traveling northwest, so the rule couldn’t apply: the north component would say the number should be even, and the west component would say the number should be odd.

Next I looked back on some past trips. When I flew from Houston to Boston a few weeks ago, headed north and east, my flight number was odd, contradicting the rule. But on the way back I flew west from Boston to Chicago, and south from Chicago to Houston. Both of these flight numbers were also odd, in accordance with the rule.

Looking back over all the flights I could see in my account, there was no pattern relating parity to flight direction. Most of my flights have been on United because Houston is a United hub. The few flights I could find that weren’t on United didn’t seem to follow a parity rule either.

When I first saw the parity rule, I assumed it would have numerous exceptions. It says “typically” and I thought that might mean the rule holds 80% of the time. But just based on my experience, it doesn’t seem to hold at all.

The source that the Wikipedia article cites is a book about Southwest Airlines. Maybe Southwest Airlines follows this rule. I checked a few flights on their web site, and it’s not clear that they follow the rule either. Maybe Southwest used to follow the parity rule.

Incidentally, there is a parity rule for numbering US Interstate highways analogous to the supposed rule for flight numbers. North-south highways are generally odd numbered, and east-west highways are generally even numbered. There are ambiguities and exceptions, but the rule generally holds. It holds more often than the flight rule.

Testing Rupert Miller’s suspicion

I was reading Rupert Miller’s book Beyond ANOVA when I ran across this line:

I never use the Kolmogorov-Smirnov test (or one of its cousins) or the χ² test as a preliminary test of normality. … I have a feeling they are more likely to detect irregularities in the middle of the distribution than in the tails.

Rupert wrote these words in 1986, when it would have been difficult to test his hunch. Now it’s easy, so I wrote up a little simulation to see whether his feeling was justified. I’m sure this has been done before, but it’s easy now (it would not have been in 1986) and I wanted to do it myself.

I’ll compare the Kolmogorov-Smirnov test, a popular test for goodness-of-fit, with the Shapiro-Wilk test that Miller preferred. I’ll run each test 10,000 times on non-normal data and count how often each test produces a p-value less than 0.05.

To produce departures from normality in the tails, I’ll look at samples from a Student t distribution. This distribution has one parameter, the number of degrees of freedom. The fewer degrees of freedom, the thicker the tails and so the further from normality in the tails.

Then I’ll look at a mixture of a normal and uniform distribution. This will have thin tails like a normal distribution, but will be flatter in the middle.

If Miller was right, we should expect the Shapiro-Wilk test to be more sensitive for fat-tailed t distributions, and the K-S test to be more sensitive for mixtures.

First we import some library functions we’ll need and define our two random sample generators.

from numpy import where
from scipy.stats import norm, uniform, t, kstest, shapiro

def mixture(p, size=100):
    # With probability p, draw from a uniform(0, 1) distribution;
    # otherwise draw from a standard normal distribution.
    u = uniform.rvs(size=size)
    v = uniform.rvs(size=size)
    n = norm.rvs(size=size)
    x = where(u < p, v, n)
    return x

def fat_tail(df, size=100):
    # Student t samples; fewer degrees of freedom means fatter tails.
    return t.rvs(df, size=size)

Next is the heart of the code. It takes in a sample generator and compares the two tests, Kolmogorov-Smirnov and Shapiro-Wilk, on 10,000 samples of 100 points each. It returns the proportion of the time each test detected the anomaly at the 0.05 level.

def test(generator, parameter):

    ks_count = 0
    sw_count = 0

    N = 10_000
    for _ in range(N):
        x = generator(parameter, 100)

        # Kolmogorov-Smirnov test against the standard normal distribution
        stat, p = kstest(x, "norm")
        if p < 0.05:
            ks_count += 1

        # Shapiro-Wilk test of normality
        stat, p = shapiro(x)
        if p < 0.05:
            sw_count += 1

    return (ks_count/N, sw_count/N)

Finally, we call the test runner with a variety of distributions.

for df in [100, 10, 5, 2]:
    print(test(fat_tail, df))

for p in [0.05, 0.10, 0.15, 0.20]:
    print(test(mixture, p))

Note that the t distribution with 100 degrees of freedom is essentially normal, at least as far as a sample of 100 points can tell, and so we should expect both tests to report a lack of fit around 5% of the time since we’re using 0.05 as our cutoff.

Here’s what we get for the fat-tailed samples.

(0.0483, 0.0554)
(0.0565, 0.2277)
(0.1207, 0.8799)
(0.8718, 1.0000)   

So with 100 degrees of freedom, we do indeed reject the null hypothesis of normality about 5% of the time. As the degrees of freedom decrease, and the fatness of the tails increases, both tests reject the null hypothesis of normality more often. However, in each case the Shapiro-Wilk test picks up on the non-normality more often than the K-S test, about four times as often with 10 degrees of freedom and about seven times as often with 5 degrees of freedom. So Miller was right about the tails.

Now for the middle. Here’s what we get for mixture distributions.

(0.0731, 0.0677)
(0.1258, 0.1051)
(0.2471, 0.1876)
(0.4067, 0.3041)

We would expect both goodness of fit tests to increase their rejection rates as the mixture probability goes up, i.e. as we sample from the uniform distribution more often. And that is what we see. But the K-S test outperforms the S-W test each time: both tests have rejection rates that increase with the mixture probability, but the rejection rates increase faster for the K-S test. Miller wins again.

Why would anyone do that?

There are tools that I’ve used occasionally for many years that I’ve just started to appreciate lately. “Oh, that’s why they did that.”

When you see something that looks poorly designed, don’t just exclaim “Why would anyone do that?!” but ask sincerely “Why would someone do that?” There’s probably a good reason, or at least there was a good reason at the time.

If something has lasted a long time and has a lot of users, the designers must have done something right. Their choices made sense in some context. Maybe it wasn’t designed for you and your needs. Maybe you’re not using it as intended. Maybe the design made sense at the time but the world has since changed.

Technologists are bad at conveying motivation. We document, or reverse engineer, the what, but the why is much harder to come by. It’s not something people like to talk about.

Related post: Beethoven, Beatles, and Beyoncé: more on the Lindy effect

Predicted distribution of Mersenne primes

Mersenne primes are prime numbers of the form 2^p – 1. It turns out that if 2^p – 1 is prime, then so is p; the requirement that p be prime is a theorem, not part of the definition.

So far 51 Mersenne primes have been discovered [1]. Maybe that’s all there are, but it is conjectured that there are an infinite number of Mersenne primes. In fact, it has been conjectured that as x increases, the number of primes p ≤ x such that 2^p – 1 is prime is asymptotically

e^γ log x / log 2

where γ is the Euler-Mascheroni constant. For a heuristic derivation of this conjecture, see Conjecture 3.20 in Not Always Buried Deep.

How does the actual number of Mersenne primes compare to the number predicted by the conjecture? We’ll construct a plot below using Python. Note that the conjecture is asymptotic, and so it could make poor predictions for now and still be true for much larger numbers. But it appears to make fairly good predictions over the range where we have discovered Mersenne primes.
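
As a single point check before plotting (a quick calculation of my own, not part of the code below), here’s the predicted count at x = 10^8, just beyond the largest known exponent:

import numpy as np

# e^gamma * log2(10^8): predicted number of exponents p < 10^8 with 2^p - 1 prime
print(np.exp(np.euler_gamma) * np.log2(1e8))  # about 47, versus 51 known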

Predicted vs actual Mersenne primes

import numpy as np
import matplotlib.pyplot as plt

# p's for which 2^p - 1 is prime.
# See https://oeis.org/A000043
ps = [2, 3, 5, ... , 82589933]

# x has 200 points from 10^1 to 10^8 
# spaced evenly on a logarithmic scale
x = np.logspace(1, 8, 200)

# number of p's less than x such that 2^p - 1 is prime
actual = [np.searchsorted(ps, t) for t in x]

exp_gamma = np.exp(0.5772156649)
predicted = [exp_gamma*np.log2(t) for t in x]

plt.plot(x, actual)
plt.plot(x, predicted, "--")

plt.xscale("log")
plt.xlabel("p")
plt.ylabel(r"Mersenne primes $\leq 2^p-1$")
plt.legend(["actual", "predicted"])

[1] Fifty-one Mersenne primes have been verified. But these may not be the 51 smallest Mersenne primes: it has not yet been verified that there are no undiscovered Mersenne primes between the 47th and 51st known ones. The plot in this post assumes the known Mersenne primes are consecutive, and so it is speculative toward the right end.

Collatz conjecture skepticism

The Collatz conjecture asks whether the following procedure always terminates at 1. Take any positive integer n. If it’s odd, multiply it by 3 and add 1. Otherwise, divide it by 2. For obvious reasons the Collatz conjecture is also known as the 3n + 1 conjecture. It has been computationally verified that the Collatz conjecture is true for n less than 2^60.
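
Here’s a minimal Python sketch of the procedure (mine, not part of the post), counting the steps needed to reach 1:

def collatz_steps(n):
    # Apply the 3n + 1 procedure until reaching 1 and count the steps.
    steps = 0
    while n != 1:
        n = 3*n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print([collatz_steps(n) for n in [6, 7, 27]])  # [8, 16, 111]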

Alex Kontorovich posted a long thread on Twitter last week explaining why he thinks the Collatz conjecture might be false. His post came a few days after Tao’s paper was posted giving new partial results toward proving the Collatz conjecture is true.

Tao cites earlier work by Kontorovich. The two authors don’t contradict each other. Both are interested in the Collatz problem and have produced very technical papers studying the problem from different angles. See Kontorovich’s papers from 2006 and 2009.

Kontorovich’s 2009 paper looks at both the 3n + 1 problem and the 5n + 1 problem, the latter simply replacing the “3” in the Collatz conjecture with a “5”. The 5n + 1 problem has cycles, such as {13, 66, 33, 166, 83, 416, 208, 104, 52, 26}. It is conjectured that the sequence produced by the 5n + 1 problem diverges for almost all inputs.
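
As a quick check on that cycle (my own sketch, not from the paper), iterate the 5n + 1 map starting at 13:

def orbit(n, steps):
    # Iterate the 5n + 1 map: 5n + 1 if n is odd, n/2 if n is even.
    seq = [n]
    for _ in range(steps):
        n = 5*n + 1 if n % 2 else n // 2
        seq.append(n)
    return seq

print(orbit(13, 10))  # [13, 66, 33, 166, 83, 416, 208, 104, 52, 26, 13]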

String interpolation in Python and R

One of the things I liked about Perl was string interpolation. If you use a variable name in a string, the variable will expand to its value. For example, if you have a variable $x which equals 42, then the string

    "The answer is $x."

will expand to “The answer is 42.” Perl requires variables to start with sigils, like the $ in front of scalar variables. Sigils are widely considered to be ugly, but they have their benefits. Here, for example, $x is clearly a variable name, whereas x would not be.

You can do something similar to Perl’s string interpolation in Python with so-called f-strings. If you put an f in front of an opening quotation mark, an expression in braces will be replaced with its value.

    >>> x = 42
    >>> f"The answer is {x}."
    'The answer is 42.'

You could also say

    >>> f"The answer is {6*7}."

for example. The f-string is just a string; it’s only printed because we’re working from the Python REPL.

The glue package for R lets you do something very similar to Python’s f-strings.

    > library(glue)
    > x <- 42
    > glue("The answer is {x}.")
    The answer is 42.
    > glue("The answer is {6*7}.")
    The answer is 42.

As with f-strings, glue returns a string. It doesn’t print the string, though the string is displayed because we’re working from the REPL, the R REPL in this case.

Detecting typos with the four color theorem

In my previous post on VIN numbers, I commented that if a check sum has to be one of 11 characters, it cannot detect all possible changes to a string from an alphabet of 33 characters. The number of possible check sum characters must be at least as large as the number of possible characters in the string.

Now suppose you wanted to create a check sum for text typed on a computer keyboard. You want to detect any change where a single key was wrongly typed by using an adjacent key.

You don’t need many characters for the check sum because you’re not trying to detect arbitrary changes, such as typing H for A on a QWERTY keyboard. You’re only trying to detect, for example, if someone typed Q, W, S, or Z for A. In fact you would only need one of five characters for the check sum.

Here’s how to construct the check sum. Think of the keys of the keyboard as a map, say by drawing boundaries through the spaces between the keys. By the four color theorem, you can assign the numbers 0, 1, 2, and 3 to the keys so that no two adjacent keys have the same number. Concatenate these digits and interpret the result as a base 4 number. Then take the remainder when that number is divided by 5. That’s your check sum. As proved here, this will detect any typo that hits an adjacent key. It will also detect transpositions of adjacent keys.

Note that this doesn’t assume anything about the particular keyboard. You could have any number of keys, and the keys could have any shape. You could even define “adjacent” in some non-geometrical way as long as your adjacency graph is planar.
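
Here’s a toy illustration in Python. The five-key keyboard fragment, its adjacency list, and the coloring below are all made up for the example; the point is that substituting an adjacent key always changes the mod 5 check sum: the two keys have different colors d and d′, so the number changes by (d′ − d)·4^k, and since 4 ≡ −1 (mod 5) and 0 < |d′ − d| ≤ 3, that change is never a multiple of 5.

# Made-up five-key keyboard fragment with a proper coloring:
# no two adjacent keys share a number.
color = {"Q": 0, "W": 1, "A": 2, "S": 0, "Z": 1}
adjacent = {"Q": "WA", "W": "QAS", "A": "QWSZ", "S": "WAZ", "Z": "AS"}

def check_sum(text):
    # Read the keys' colors as base 4 digits and reduce mod 5.
    n = 0
    for key in text:
        n = 4*n + color[key]
    return n % 5

# Every single-key typo that hits an adjacent key changes the check sum.
word = "WASQ"
for i, key in enumerate(word):
    for neighbor in adjacent[key]:
        typo = word[:i] + neighbor + word[i+1:]
        assert check_sum(typo) != check_sum(word)
print("all adjacent-key typos detected for", word)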