Uncategorized

Humming St. Christopher

The other day I woke up with a song in my head I hadn’t heard in a long time, the hymn Beneath the Cross of Jesus. The name of the tune is St. Christopher.

When I thought about the tune, I realized it has some fairly sophisticated harmony. My memory of the hymns I grew up with was that they were harmonically simple, mostly built around three chords: I, IV, V. But this hymn has a lot going on.

I imagine a lot of things that I remember as being simple weren’t. I was simple, and my world was richer than I realized.

***

You can find the sheet music for the hymn here. I’ll write out the chord progressions for the first two lines.

    I     idim | I            | V7   ii7 V7  | I   III | 
    vi   iidim | vi VI7 ii vi | II   VIII7♭5 | III     |

If you’re not familiar with music theory, just appreciate that there are a lot more symbols up there than I, IV, and V.

The second line effectively modulates into a new key, the relative minor of the original key, and I’m not sure how to describe what’s going on at the end of the second line.

Related posts

New ICD-10 diagnosis code weirdness

In October 2020 there were 619 new codes added to the list of ICD-10 diagnosis codes. Some of these make sense, such as four codes added for COVID:

  • U07.1 COVID-19
  • Z11.52 Encounter for screening for COVID-19
  • Z20.822 Contact with and (suspected) exposure to COVID-19
  • Z86.16 Personal history of COVID-19

There are also 24 new codes related to fentanyl. I could see that.

But as I’ve written before some ICD-10 codes are bizarrely specific.

There are only so many ICD-10 codes, around 55,000 at the moment, and so all these hyper-specific codes mean there are fewer codes to devote to refining more common diagnoses.

In the new codes, 4 are related to COVID-19 and 170 are related to pedestrian injuries while standing on an electric scooter or other “micro-mobility pedestrian conveyance.” There are 170 shades of “I hurt myself on a scooter.”

Thanks to the new codes, medical practices can now distinguish, for example, between “Pedestrian on standing electric scooter injured in collision with pedal cycle in nontraffic accident” (V01.031) and “Pedestrian on other standing micro-mobility pedestrian conveyance injured in collision with pedal cycle in nontraffic accident” (V01.038).

Of all the codes introduced in 2020, 27% describe variations on a scooter injury.

More diagnosis code posts

Hashing phone numbers

A cryptographic hash is also known as a one-way function because given an input x, one can quickly compute the hash h(x), but it is extremely time-consuming to try to recover x if you only know h(x).

Even if the hashing algorithm is considered “broken,” it may take an enormous effort to break it. Google demonstrated that they could break a SHA-1 hash, but they used a GPU-century of compute power to do so. Attacks have become more efficient since then, but it still takes many orders of magnitude less work to compute a hash than to attempt to invert it. [1]

However, if you know that the hash value comes from a small set of possible inputs, brute force can discover which one. I wrote about this a couple years ago in the post Hashing names does not protect privacy. You could, for example, easily create a table of the hashes of all nine-digit social security numbers.

I often explain this to clients who have been told that hashed data is “encrypted.” This is subtle, because the data is encrypted, in a way, but not in the way they think.

A paper came out a few weeks ago that hashed 118 billion phone numbers.

The limited amount of possible mobile phone numbers combined with the rapid increase in affordable storage capacity makes it feasible to create key-value databases of phone numbers indexed by their hashes and then to perform constant-time lookups for each given hash value. We demonstrate this by using a high-performance cluster to create an in-memory database of all 118 billion possible mobile phone numbers from [reference] (i.e., mobile phone numbers allowed by Google’s libphonenumber and the WhatsApp registration API) paired with their SHA-1 hashes.

The authors were able to use this data base to query

10% of US mobile phone numbers for WhatsApp and 100% for Signal. For Telegram we find that its API exposes a wide range of sensitive information, even about numbers not registered with the service.

Related posts

[1] Hash functions are not invertible, even in theory, in the sense of a unique x leading to a hash value h(x). Suppose you’re computing a 256-bit hash on files that are one kilobyte (8192 bits). If you’re mapping a space of 28192 possible files into a space of 2256 possible hash values, the mapping cannot be one-to-one. However, if you know the inputs are not random bits but German prose, and you find a file of German prose that has a matching hash value, you’ve almost certainly recovered the file that led to the hash value.

The time-traveling professor

Suppose you were a contemporary professor sent back in time. You need to continue your academic career, but you can’t let anyone know that you’re from the future. You can’t take anything material with you but you retain your memories.

At first thought you might think you could become a superstar, like the musician in the movie Yesterday who takes credit for songs by The Beatles. But on second thought, maybe not.

Maybe you’re a physicist and you go back in time before relativity. If you go back far enough, people will not be familiar with the problems that relativity solves, and your paper on relativity might not be accepted for publication. And you can’t just post it on arXiv.

If you know about technology that will be developed, but you don’t know how to get there using what’s available in the past, it might not do you much good. Someone with a better knowledge of the time’s technology might have the advantage.

If you’re a mathematician and you go back in time 50 years, could you scoop Andrew Wiles on the proof of Fermat’s Last Theorem? Almost certainly not. You have the slight advantage of knowing the theorem is true, whereas your colleagues only strongly suspect that it’s true. But what about the proof?

If I were in that position, I might think “There’s something about a Seaborn group? No, that’s the Python library. Selmer group? That sounds right, but maybe I’m confusing it with the musical instrument maker. What is this Selmer group anyway, and how does it let you prove FLT?”

Could Andrew Wiles himself reproduce the proof of FTL without access to anything written after 1971? Maybe, but I’m not sure.

In your area of specialization, you might be able to remember enough details of proven results to have a big advantage over your new peers. But your specialization might be in an area that hasn’t been invented yet, and you might know next to nothing about areas of research that are currently in vogue. Maybe you’re a whiz at homological algebra, but that might not be very useful in a time when everyone is publishing papers on differential equations and special functions.

There are a few areas where a time traveler could make a big splash, areas that could easily have been developed much sooner but weren’t. For example, the idea of taking the output of a function and sticking it back in, over and over, is really simple. But nobody looked into it deeply until around 50 years ago.

Then came the Mandelbrot set, Feigenbaum’s constants, period three implies chaos, and all that. A lack of computer hardware would be frustrating, but not insurmountable. Because computers have shown you what phenomena to look for, you could go back and reproduce them by hand.

In most areas, I suspect knowledge of the future wouldn’t be an enormous advantage. It would obviously be some advantage. Knowing which way a conjecture was settled tells you whether to try to prove or disprove it. In the language of microprocessors, it gives you good branch prediction. And it would help to have patterns in your head that have turned out to be useful, even if you couldn’t directly appeal to these patterns without blowing your cover. But I suspect it might make you something like 50% more productive, not enough to turn an average researcher into a superstar.

Gell-Mann amnesia and its opposite

Michael Crichton coined the term Gell-Mann Amnesia effect to describe forgetting how unreliable a source is in one area when you trust it in another area. In Crichton’s words:

Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray [Gell-Mann]’s case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the “wet streets cause rain” stories. Paper’s full of them.

In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know.

I think about the Gell-Mann Amnesia effect when I read news stories that totally botch science or statistics. Most of the time when I read a news story that touches on something I happen to know about, it’s at best misleading and at worst just plain wrong.

Yesterday I had the opposite experience. I was trying out a new podcast, not one focused on science or statistics, that was mostly correct when it touched on statistical matters that I’ve looked into. They didn’t bat 1000, but they did better than popular news sites. That increased my estimate of how likely the podcast is to be accurate about other matters.

By the way, why is the effect named after the Nobel Prize-winning physicist Murray Gell-Mann? Crichton explained

I refer to it by this name because I once discussed it with Murray Gell-Mann, and by dropping a famous name I imply greater importance to myself, and to the effect, than it would otherwise have.

Accessing characters by name

snowman U+2503

You can sometimes make code more readable by using names for characters rather than the characters themselves or their code points. Python and Perl both, for example, let you refer to a character by using its standard Unicode name inside \N{}.

For instance, \N{SNOWMAN} refers to Unicode character U+2603, shown at the top of the post. It’s also kinda hard to read ☃, and not many people would read \u2603 and immediately think “Ah yes, U+2603, the snowman.”

A few days ago I wrote about how to get one-liners to work on Windows and Linux. Shells and programming languages have different ways to quoting and escaping special characters, and sometimes these ways interfere with each other.

I said that one way to get around problems with literal quotes inside a quoted string is to use character codes for quotes. This may be overkill, but it works. For example,

    perl -e 'print qq{\x27hello\x27\n}'

and

    python -c "print('\x27hello\x27\n')"

both print 'hello', including the single quotes.

One problem with this is that you may not remember that U+0027 is a single quote. And even if you have that code point memorized [2], someone else reading your code might not.

The official Unicode name for a single quote is APOSTROPHE. So the Python one-liner above could be written

    python -c "print('\N{APOSTROPHE}hello\N{APOSTROPHE}\n')"

This is kinda artificial in a one-liner because such tiny programs optimize for brevity rather than readability. But in an ordinary program rather than on the command line, using character names could make code easier to read.

So how do you find out the name of a Unicode character? The names are standard, independent of any programming language, so you can look them up in any Unicode reference.

A programming language that lets you use Unicode names probably also has a way to let you look up Unicode names. For example, in Python you can use unicodedata.name.

    >>> from unicodedata import name
    >>> name('π')
    'GREEK SMALL LETTER PI'
    >>> name("\u05d0") # א
    >>> 'HEBREW LETTER ALEF'

In Perl you could write

    use charnames q{ :full };
    print charnames::viacode(0x22b4); # ⊴

which prints “NORMAL SUBGROUP OF OR EQUAL TO” illustrating that Unicode names can be quite long.

Related posts

[1] How this renders varies greatly from platform to platform. Here are some examples.

Windows with Firefox:

iPad with Firefox:

iPad with Inoreader:

[2] Who memorizes Unicode code points?! Well, I’ve memorized a few landmarks. For example, I memorized where the first letters of the Latin, Greek, and Hebrew alphabets are, so in a pinch I can figure out the rest of the letters.

Pythagorean dates

The numbers that make up today’s date—12, 16, and 20—form a Pythagorean triple. That is, 12² + 16² = 20².

There won’t be any such dates next year. You could confirm this by brute force since there are only 365 days in 2021, but I hope to do something a little more interesting here.

The Mathematica function

    PowersRepresentations[n, k, p]

lists the ways to represent a number n as the sum of k pth powers. Calling PowersRepresentations[21^2, 2, 2] shows that there is no non-trivial way to write 21² as the sum of two powers; the only way is 0² + 21². You might try using the whole year rather than the last two digits, but there’s no non-trivial way to write 2021 as the sum of two powers either.

The paragraph above says that no Pythagorean triangle can have hypotenuse 21 or 2021. But could there be a Pythagorean date next year with 21 on one of the sides?

If the components of a date correspond to sides of a Pythagorean triangle, and one of the sides is 21, then the day must be on the hypotenuse because all the month numbers are less than 21, and that day must be greater than 21.

Let’s look for Pythagorean triangles with a side equal to 22, 23, …, 31 and see whether any of them have a 21 on the side.

The Mathematica command

    For[d = 22, d <= 31, d++, Print[PowersRepresentations[d*d, 2, 2]]]

returns

    {{0,22}}
    {{0,23}}
    {{0,24}}
    {{0,25},{7,24},{15,20}}
    {{0,26},{10,24}}
    {{0,27}}
    {{0,28}}
    {{0,29},{20,21}}
    {{0,30},{18,24}}
    {{0,31}}

This tells us that the only numbers from 22 to 31 that can be on the side of a Pythagorean triangle are 25, 26, 29, and 30.

The number 21 does appear in the list: (20, 21, 29) is a Pythagorean triple. But if the 29th of a month were a Pythagorean date next year, it would have to be the 20th month. Since we only have 12 months, there won’t be any Pythagorean dates next year.

The number 22, 23, and 24 can’t be written as sums of squares in a non-trivial way, so if there’s a Pythagorean date in the next few years it will have to be with the day of the month on the side. We can see from the output above that (7, 24, 25) is a Pythagorean triple, so the next Pythagorean date is 7/25/24, and the next after that is 7/24/25. And the next after that will be 10/24/26.

Incidentally, there is an algorithm for counting how many ways a number can be written as the sum of k squares, and it is implemented in Mathematica as Squares[k, n]. If you run SquaresR[2, 441] it will tell you that there are four ways to write 441 as the sum of two squares. But all four of these are trivial: (0, 21), (21, 0), (0, -21), and (-21, 0). Because every square can be written as the sum of two squares analogously, a return value of 4 means there are no non-trivial solutions.

I think I’ll pass

The other day I saw an article about some math test and thought “I bet I’d blow that away now.”

Anyone who has spent a career using some skill ought to blow away an exam intended for people who have been learning that skill for a semester.

However, after thinking about it more, I’m pretty sure I’d pass the test in question, but I’m not at all sure I’d ace it. Academic exams often test unimportant material that is in the short term memory of both the instructor and the students.

From Timbuktu to …

When I was in middle school, I remember a question that read

It is a long way from ________ to ________.

I made up two locations that were far apart but my answer was graded as wrong.

My teacher was looking for a direct quote from a photo caption in our textbook that said it was a long way from Timbuktu to some place I can’t remember.

That stuck in my mind as the canonical example of a question that doesn’t test subject matter knowledge but tests the incidental minutia of the course itself [1]. A geography professor would stand no better chance of giving the expected answer than I did.

The three reasons …

Almost any time you see a question asking for “the 3 reasons” for something or “the 5 consequences” of this or that, it’s likely a Timbuktu question. In open-world contexts [2], I’m suspicious whenever I see “the” followed by a specific number.

In some contexts you can make exhaustive lists—it makes sense to talk about the 3 branches of the US government or the 5 Platonic solids, but it doesn’t make sense to talk about the 4 causes of World War I. Surely historians could come up with more than 4 causes, and there’s probably no consensus regarding what the 4 most important causes are.

There’s a phrase teaching to the test for when the goal is not to teach the subject per se but to prepare the students to pass a standardized test related to the subject. The phenomena discussed here is sort of the opposite, testing to the teaching.

When you ask students for the 4 causes of WWI, you’re asking for the 4 causes given in lecture or the 4 causes in the text book. You’re not testing knowledge of WWI per se but knowledge of the course materials.

Related posts

[1] Now that I’m in middle age rather than middle school, I could say that the real question was not geography but psychology. The task was to reverse-engineer from an ambiguous question what someone was thinking. That is an extremely valuable skill, but not one I possessed in middle school.

[2] A closed world is one in which the rules are explicitly known, finite, and exhaustive. Chess is a closed world. Sales is not. Academia often puts a box around some part of an open world so it can think of it as a closed world.