Why is it defined that way?

There are numerous conventions in mathematics that students continually question.

• Why isn’t 1 a prime number?
• Why is 0! defined to be 1?
• Why is an empty sum 0 and an empty product 1?
• Why can’t you just say 1/0 = ∞?
• Etc.

There are good reasons for the existing conventions, and they usually boil down to this: On the whole, theorems are more simply stated with these conventions than with the alternatives. For example, if you defined 0! to be some other value, say 0, then there would be countless theorems that would have to be amended with language of the form “… except when n is zero, in which case …”
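As a small illustration, here is a sketch in Python, whose standard library happens to follow the same conventions. The `binomial` helper is a hypothetical name introduced for this example; it uses the textbook factorial formula with no special case for k = 0 or k = n.

```python
import math

# Python's standard library follows the usual conventions.
assert math.factorial(0) == 1   # 0! = 1
assert sum([]) == 0             # empty sum
assert math.prod([]) == 1       # empty product

def binomial(n, k):
    """Binomial coefficient n! / (k! (n-k)!), with no exception clause."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Because 0! = 1, the edge cases k = 0 and k = n need no special handling.
print([binomial(4, k) for k in range(5)])
```

If 0! were defined to be 0, the same formula would divide by zero at both ends of the row.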

In short, the existing conventions simplify things more than they complicate them. But that doesn’t mean that everything is simpler under the standard conventions. The next post gives an example along these lines.


I recently needed a word for “multiply by 13” that was parallel to quadruple for “multiply by 4”, so I made up triskaidekaduple by analogy with triskaidekaphobia. That got me to wondering how you make words for multiples higher than four.

The best answer is probably “don’t.” Your chances of being understood drop sharply after quadruple. But despite the warning sign saying “hic sunt dracones” we forge ahead.

Double, triple, and quadruple are based on Latin names for numbers, so we should keep using Latin prefixes. Next would be quintuple, but I expect you would likely be understood if you said pentuple based on Greek penta-.

Next would be sextuple, septuple, and octuple. These terms are understandable, particularly in the context of multiple births: sextuplets, septuplets, and octuplets.

But now we hit a brick wall. The Latin prefix for nine is novem-, and it’s unlikely anyone would understand novemple or anything like that. The Greek prefix ennea– is no better. Enneauple? Enneaduple?

(The Latin prefix novem– is recognizable from November, which is the 11th month, so does that mean novem– stands for 11? No, November really is the ninth month, or at least it was when the year started in March.

The only example I can think of for a word starting with ennea– is the enneagram personality classification system.)

The prefixes after novem– are equally obscure. But if we jump to 13, some people will have heard of triskaidekaphobia. This comes from tris kai deka (three and ten) from Greek. But I would only use triskaidekaduple tongue-in-cheek.

Dunbar’s number and C. S. Lewis

Robin Dunbar proposed that humans are capable of maintaining social relationships with about 150 people. At first this number may seem too small, especially for someone with a thousand “friends” on social media. But if you raise the bar a little on who you consider a friend, 150 may seem too large.

A couple examples given in support of Dunbar’s number are that you might have around 150 people at a funeral, or maybe 300 at a wedding. Of course there’s variance around these numbers. Some people may have a personal Dunbar number of 300, but probably not 3,000.

I suspect there’s something like a conservation law for friendship. We only have so much emotional capacity and time, but we can choose how we concentrate or disperse these limited resources.

I recently started rereading Surprised by Joy, a sort of autobiography of C. S. Lewis. A section I read this morning made me think about Dunbar’s number.

While friendship has been by far the chief source of my happiness, acquaintance or general society has always meant little to me, and I cannot quite understand why a man should wish to know more people than he can make real friends of.

I imagine most of us would do well to focus more on quality than quantity when it comes to friendships (and a great many other things as well).

Morse code palindromes

A palindrome is a word or sentence that remains the same when its characters are reversed. For example, the word “radar” is a palindrome, as is the sentence “Madam, I’m Adam.”

I was thinking today about Morse code palindromes, sequences of Morse code that remain the same when reversed.

This post will look at what it means for a letter or a word to be a palindrome in Morse code, then look at palindrome sentences in Morse code, then finally look at a shell script to find Morse palindromes.

Letters and words

Some individual letters are palindromes in Morse code, such as I (..) and P (.--.).

Some letters change into other letters when their Morse code representation is reversed. For example B (-...) becomes V (...-) and vice versa.

The letters C (-.-.), J (.---), and Z (--..) when reversed are no longer part of the 26-letter Roman alphabet, though the reversed sequences are sometimes used for vowels with umlauts: Ä (.-.-), Ö (---.), and Ü (..--).
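The three cases above (letters that are palindromes, letters that swap with a partner, and letters with no partner) can be checked mechanically. The following is a minimal sketch, assuming the standard International Morse code table for the 26 letters; the names `MORSE`, `LETTER`, and `reverse_letter` are made up for this example.

```python
# International Morse code for the 26 letters of the Roman alphabet.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}
LETTER = {code: letter for letter, code in MORSE.items()}

def reverse_letter(letter):
    """Return the letter whose code is the reverse of this letter's code, or None."""
    return LETTER.get(MORSE[letter][::-1])

# Letters that read the same when their code is reversed.
palindromes = sorted(c for c in MORSE if reverse_letter(c) == c)
# Pairs of distinct letters that turn into each other, listed once each.
swaps = sorted((c, reverse_letter(c)) for c in MORSE
               if reverse_letter(c) is not None and c < reverse_letter(c))
# Letters whose reversed code is not one of the 26 letters.
orphans = sorted(c for c in MORSE if reverse_letter(c) is None)

print("palindromes:", "".join(palindromes))
print("swap pairs: ", swaps)
print("no partner: ", "".join(orphans))
```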

The sequence SOS (... --- ...) is a palindrome in English and in Morse code. But some words are palindromes in Morse code that are not palindromes in English, such as “gnaw,” which is

    --. -. .- .--

in Morse code.

The longest word I’ve found which is a palindrome in Morse code is “footstool.”

    ..-. --- --- - ... - --- --- .-..

Sentences

I wrote some code to search a dictionary and make a list of English words that remain English words when converted to Morse code, reversed, and turned back into text. There aren’t that many, around 240. Then I looked for ways to make sentences out of these words.

For example, “Trevor sees Robert” is a palindrome in Morse code:

    - .-. . ...- --- .-. ... . . ... .-. --- -... . .-. -

If you’d like to try your hand at this, you might find a couple files useful. This file gives a list of words that remain the same when their Morse code is reversed, such as “outdo” (--- ..- - -.. ---) and this file gives a list of transformation pairs, such as “sail” (... .- .. .-..) and “fins” (..-. .. -. ...).

Shell scripting

Conceptually we want to write out words in Morse code, reverse the sequence of dots and dashes, and turn the result back into English text. But we can do this without actually working with Morse code.

We can reverse the letters in the input, then replace each letter with the letter corresponding to reversing its Morse code.

I don’t know of an easy way to reverse a string in a shell script, but I do know how to do it with a Perl one-liner.

    perl -lne 'print scalar reverse'

Next we need to turn around the dots and dashes of individual letters. Most letters stay the same, but there are six pairs of letters to swap:

• (A, N)
• (B, V)
• (D, U)
• (F, L)
• (G, W)
• (Q, Y)

The tr (“translate”) utility was made for this kind of task, replacing all characters in one string with their counterparts in another.

    tr ABDFGQNVULWY NVULWYABDFGQ

Note that tr effectively does all the translations at the same time. For example, it replaces A’s with N’s and N’s with A’s simultaneously. If it simply marched down the two strings, replacing A’s with N’s, then replacing B’s with V’s, etc., it would not do what we want. For example, AN would first become NN and then AA.

Putting these together, the following one-liner proves that “footstool” is a palindrome in Morse code

    echo FOOTSTOOL | perl -lne 'print scalar reverse' |
    tr ABDFGQNVULWY NVULWYABDFGQ


because the output is “FOOTSTOOL”.
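The same two steps, reverse the letters and then swap each letter with its mirror image, can be sketched in Python. This is not part of the original shell approach; `SWAP`, `morse_reverse`, and `is_morse_palindrome` are hypothetical names. Like tr, `str.translate` applies all the substitutions at once. Unlike the one-liner, the sketch explicitly rejects words containing C, J, or Z, since their reversed codes are not letters.

```python
# Each letter in the first string is replaced by the letter in the same
# position of the second string, all substitutions happening simultaneously.
SWAP = str.maketrans("ABDFGQNVULWY", "NVULWYABDFGQ")

def letters_only(text):
    """Upper-case the text and drop spaces and punctuation."""
    return "".join(ch for ch in text.upper() if ch.isalpha())

def morse_reverse(text):
    """Reverse the letters, then swap each letter for its Morse mirror image."""
    return letters_only(text)[::-1].translate(SWAP)

def is_morse_palindrome(text):
    """True if the text's Morse code reads the same backward."""
    letters = letters_only(text)
    if any(ch in "CJZ" for ch in letters):
        return False  # reversed C, J, Z are not in the 26-letter alphabet
    return morse_reverse(letters) == letters

print(morse_reverse("FOOTSTOOL"))
print(morse_reverse("sail"))
print(is_morse_palindrome("Trevor sees Robert"))
```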

Perl has a tr function very much like the shell utility, so we could do more of the work in Perl:

    echo FOOTSTOOL |
    perl -lne "tr /ABDFGQNVULWY/NVULWYABDFGQ/; print scalar reverse"

Update: A comment from Alastair below let me know you can replace the bit of Perl in the first one-liner with a call to tac.

    echo FOOTSTOOL | tac -rs . | tr ABDFGQNVULWY NVULWYABDFGQ

By default tac lists the lines of a file in reverse order. The name comes from reversing “cat”, the name of the command that dumps a file (“concatenates” it to standard output). The extra arguments to tac cause it to change the definition of a line separator to any character, as indicated by the regular expression consisting of a single period. This effectively tells tac to treat every character as a line, so reversing the lines reverses the string.

Using cryptography broken 50 years ago

Old cryptography never dies. After a method is broken, its use declines, but never goes to zero.

And when I say “broken,” I do not mean no longer recommended, but broken to the point of being trivial to decrypt. I recently ran across an anecdote from World War I showing this is nothing new. The Vigenère cipher had been broken decades before the war broke out but was widely used anyway.

In this post I’ll explain what the Vigenère cipher is, give a little history of its rise and fall, and explain how it can be broken.

Vigenère cipher

A simple substitution cipher replaces one letter with another. For example, maybe you replace A with X, B with J, C with B, etc. Simple substitution ciphers are so easy to break that they’re included in pulp puzzle books.

The Vigenère cipher is a step up from simple substitution. It combines a key of length n with the clear text in such a way that you effectively have n different simple substitution ciphers. It’s not hard to break (I’ll explain how below), but it’s less vulnerable than simple substitution. It makes it harder to spot high-frequency letters like E because not all E’s are encrypted the same way.

In its simplest form Vigenère essentially adds a key and a message mod 26. For example, if your message is “Attack the bridge at dawn” and your key is “ossifrage” the encryption would work like this.


clear:  ATTACKTHEBRIDGEATDAWN
key:    OSSIFRAGEOSSIFRAGEOSS
cipher: OLLIHBTNIPJALLVAZHOOF


Starting from the left, O is the 14th letter of the alphabet (counting from 0), and so A is moved ahead 14 places to O. Since the clear text and the key are involved symmetrically, you could say that the letter O is moved ahead zero places since A is the 0th letter of the alphabet.

The next two letters of the clear text and the key happen to be repeated [1]. T and S are the 19th and 18th letters of the alphabet, and 19 + 18 = 37, which is congruent to 11 mod 26. The 11th letter of the alphabet is L.

Here’s a little Python code to carry out the encryption.

    clear = "ATTACKTHEBRIDGEATDAWN"
    key   = "OSSIFRAGE"

    A = ord('A')
    for i in range(len(clear)):
        c = ord(clear[i]) - A            # clear text letter as 0..25
        k = ord(key[i % len(key)]) - A   # key letter, repeating the key as needed
        e = (c + k) % 26                 # add mod 26
        print(chr(e + A), end="")
    print()


A more secure version of Vigenère moves the clear text through scrambled alphabets. The simplified version above is a particularly insecure special case. However, using scrambled alphabets (“polyalphabetic substitution”) doesn’t make the method that much stronger.
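For the simplified additive version, decryption is the same operation with the key subtracted instead of added. Here is a minimal sketch; the function name `vigenere` and its `sign` parameter are made up for this example and assume upper-case text with no spaces.

```python
A = ord("A")

def vigenere(text, key, sign=1):
    """Shift each letter by the matching key letter; sign=-1 undoes the shift."""
    out = []
    for i, ch in enumerate(text):
        k = ord(key[i % len(key)]) - A          # key letter as 0..25
        out.append(chr((ord(ch) - A + sign * k) % 26 + A))
    return "".join(out)

cipher = vigenere("ATTACKTHEBRIDGEATDAWN", "OSSIFRAGE")
print(cipher)
print(vigenere(cipher, "OSSIFRAGE", sign=-1))
```

Python's `%` operator returns a non-negative result for a positive modulus, so subtracting the key needs no extra adjustment.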

World War I

According to [2],

[The Vigenère cipher] was commonly regarded as unbreakable and was widely used up through World War I, even though the Prussian cryptographer Friedrich Wilhelm Kasiski had published a method for breaking it in 1863.

David Kahn [3] gives more detail about Kasiski’s book:

Die Geheimschriften und die Dechiffrir-kunst concentrates on answering the problem that had vexed cryptanalysts for more than 300 years: how to achieve a general solution for polyalphabetic ciphers with repeating keywords. … But the 95-page volume seems to have stirred almost no comment at the time.

So armies were depending on the security of Vigenère over 50 years after it had been broken. This was worse than using DES today, around 50 years after it came out. Weaknesses have been found in DES, and its 56-bit keys are too short to resist a brute force attack from contemporary computers. But DES has not been broken as thoroughly as Vigenère had been broken by the time of WWI.

Breaking Vigenère

So how would you go about attacking Vigenère? At first it seems hard to break. You can’t naively look at letter frequencies. In the example above, the clear text has four A’s, but they are encrypted three different ways.

However, Vigenère with a key of length n is just a set of n different simple substitution ciphers. You can chop the cipher text into n pieces and break each one separately.

How would you know the key length n? You could just try brute force. Try 1, then 2, then 3, etc. For example, to see whether the example above might have a key of length 2, you could analyze the letters in even positions and the letters in odd positions separately. If n isn’t too large (and in practice it often wasn’t large) brute force is efficient.
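Chopping the cipher text into pieces is a one-line slice in Python. The sketch below uses the example cipher text from earlier; `strands` and `unshift` are hypothetical helper names. In the simplified additive version of Vigenère, each strand is a simple shift, so guessing one key letter recovers a whole strand.

```python
A = ord("A")

def strands(cipher, n):
    """Split cipher text into n pieces: positions i, i+n, i+2n, ... for each i."""
    return [cipher[i::n] for i in range(n)]

def unshift(strand, key_letter):
    """Undo a shift by a single key letter."""
    k = ord(key_letter) - A
    return "".join(chr((ord(ch) - A - k) % 26 + A) for ch in strand)

cipher = "OLLIHBTNIPJALLVAZHOOF"

# Candidate key length 2: even positions and odd positions.
print(strands(cipher, 2))

# True key length 9: the first strand was encrypted with the key letter O.
first = strands(cipher, 9)[0]
print(first, "->", unshift(first, "O"))
```

A 21-letter message is far too short for frequency analysis on each strand; a real attack needs much more cipher text.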

A more sophisticated approach would be to try different alignments and see which one results in statistical properties most consistent with English text. (Or French text if you suspect your clear text was in French etc.) This was the motivation for William Friedman developing his index of coincidence.
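The index of coincidence is usually defined as the probability that two letters drawn at random, without replacement, from a text are the same. Here is a minimal sketch under that definition; the function names are made up for this example, and `mean_strand_ic` is one plausible way to score a candidate key length, not necessarily how Friedman applied it.

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two letters drawn without replacement from text match."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

def mean_strand_ic(cipher, n):
    """Average index of coincidence over the n strands for candidate key length n."""
    pieces = [cipher[i::n] for i in range(n)]
    return sum(index_of_coincidence(p) for p in pieces) / n

print(index_of_coincidence("AABB"))
print(mean_strand_ic("ABABABAB", 2))
```

Uniformly random letters give a value near 1/26 ≈ 0.038, while English prose comes out noticeably higher, roughly 0.067. With enough cipher text, the candidate key length whose strands score closest to the English value is the likely one.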

In hindsight this seems fairly obvious, but it was not obvious to anyone for three centuries before Kasiski, nor to many for decades after.

Why study obsolete cryptography?

Classical encryption methods are completely obsolete. The methods described in David Kahn’s book The Codebreakers can now be broken almost instantly. So why study old cryptography?

My interest in the subject was renewed recently because I had use for it in new projects. Not directly, though I have seen obsolete encryption in use, but indirectly. Some of the ideas from classical cryptography are relevant to searching for patterns in data, even though nothing was encrypted.

There are still lessons to learn from classical cryptography. For example, even a subtle lack of randomness in keys can be exploited. This was done during WWII, and flaws in random number generators are still causing security failures today.


[1] This is a little foreshadowing of what can go wrong even with non-repeating keys: If the clear text and the key are English prose, coincidences like this will happen, and they are an exploitable weakness.

[2] James S. Kraft and Lawrence C. Washington. An Introduction to Number Theory with Cryptography, 2nd edition. CRC Press.

[3] David Kahn. The Codebreakers: The Story of Secret Writing. Scribner, 1967.

Alt tags on tweet images

I learned this morning via a comment that Twitter supports alt text descriptions for images. I didn’t think that it did, and said that it didn’t, but someone kindly corrected me.

When I post equations as images on this site, I always include the LaTeX source code in an alt tag. That way someone using a screen reader can determine the content of the equation. It also helps me if I need to go back and change an equation. I’d like to do the same on Twitter.

Unfortunately, it seems support for this feature is inconsistent. Maybe it’s new. The software I use to manage my Twitter accounts apparently doesn’t offer a way to add alt text.

When I use Twitter via its web site, I am able to write alt text but not able to read it. Maybe you have to have accessibility features turned on, which would be unfortunate. People who do not use screen readers occasionally benefit from being able to read photo descriptions. I could imagine, for example, that someone might be curious to see the LaTeX code I used to create an equation image.

I tried a couple experiments, one on my personal account and one on my AlgebraFact account. On the latter, I posted an image of the quadratic formula with the text description

    x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

When I look at the tweet while logged into my AlgebraFact account, I see a little black box in the lower left corner of the image with “ALT” in white letters.

I do not see the same box on an image I posted from my personal account. When I log into my personal account, I no longer see the ALT box on the AlgebraFact tweet but I see one on an image I posted.

Question and request

It appears that with default settings, users cannot see image descriptions. What do you have to do to see the descriptions?

I intend to always put LaTeX source in the alt tag of equation images on this site. If you run across an equation without alt text, please let me know.

This weekend I had to enter an alphabetic passcode on a numeric keypad. The keypad used the same letter-to-digit convention as a phone, but the letters were not printed on the keypad. That made me think about how much better the Major system is.

I wondered what phone keypads would look like if they used the Major memory system, and so I made the image below.

The Major memory system is a way of encoding numbers as words to make them easier to memorize. The system associates a consonant sound with each digit; you’re free to insert any vowels you like. For example, if you wanted to memorize 745, you might encode it as gorilla.

Note that gorilla decodes to 745 and not 7455 because the word has only one L sound, even though it is spelled with two Ls.

One nice feature of the Major system is that if you multiply a number by 10, you can pluralize its mnemonic. For example, a possible mnemonic for 7450 is gorillas.
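Because the system works on sounds rather than spelling, a decoder has to start from a list of consonant sounds, not from letters. The sketch below assumes you supply those sounds by hand; `MAJOR` and `decode` are hypothetical names. The entries for 1, 2, and 6 follow the description in this post, and the rest are the usual Major system assignments.

```python
# Consonant sound -> digit. "zh" stands for the sound in "measure",
# "j" for soft g, and "ng" for the sound at the end of "sing".
MAJOR = {
    "s": 0, "z": 0,
    "t": 1, "d": 1, "th": 1,
    "n": 2, "ng": 2,
    "m": 3,
    "r": 4,
    "l": 5,
    "ch": 6, "sh": 6, "j": 6, "zh": 6,
    "k": 7, "g": 7,
    "f": 8, "v": 8,
    "p": 9, "b": 9,
}

def decode(sounds):
    """Turn a sequence of consonant sounds into a string of digits."""
    return "".join(str(MAJOR[s]) for s in sounds)

print(decode(["g", "r", "l"]))        # gorilla: one L sound, so one 5
print(decode(["g", "r", "l", "z"]))   # gorillas: the plural adds a z sound
```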

The Major system emphasizes sounds because humans remember sounds more easily than symbols. If you have a photographic memory for symbols, just memorize the digits and don’t bother with any mnemonic system.

Some of the sounds associated with digits are not represented by a single letter in English and so the keypad above contains a few IPA (International Phonetic Alphabet) symbols. The number 1 is encoded by any of the sounds “t”, “d”, or “th.” The IPA symbol θ represents the th sound in think and the ð represents the th sound in this. The symbol ŋ as a possible encoding for 2 represents the ng sound in sing. And the number 6 can be encoded as one of several similar sounds: ch, sh, soft g, or soft z.

The conventional phone keypad looks simpler: 2 = A, B, or C, 3 = D, E, or F, etc. It’s the kind of thing James Scott would call “legible,” something that looks simple on paper and warms a bureaucrat’s heart, but doesn’t necessarily work well in practice. The sounds associated with the letters for a given digit have nothing in common, so a number can be represented by dissimilar sounds, and similar sounds do not represent the same number.

Encoding telephone numbers as words is rarely possible using the conventional keypad letters. Phone numbers that do correspond to memorable words are highly valued. Every letter has to correspond to a digit, and it matters how the word is spelled.

The Major system is much more flexible since you’re free to supply vowels as you wish, and you can choose from a wide variety of words that spell a single consonant sound in different ways.

Removing Unicode formatting

Several people responded to my previous post asserting that screen readers would not be able to read text formatted via Unicode variants. Maybe some screen readers can’t handle this, but there’s no reason they couldn’t.

Before I go any further, I’d like to repeat my disclaimer from the previous post:

It’s a dirty hack, and I’d recommend not overdoing it. But it could come in handy occasionally. On the other hand, some people may not see what you intend them to see.

This formatting is gimmicky and there are reasons to only use it sparingly or not at all. But I don’t see why screen readers need to be stumped by it.

In the example below, I format the text “The quick brown fox” by running it through unifont as in the previous post.

If we pipe the output through unidecode then we mostly recover the original text. (I wrote about unidecode here.)

    $ unifont The quick brown fox | unidecode

    Double-Struck: The quick brown fox
    Monospace: The quick brown fox
    Sans-Serif: The quick brown fox
    Sans-Serif Italic: The quick brown fox
    Sans-Serif Bold: The quick brown fox
    Sans-Serif Bold Italic: The quick brown fox
    Script: T he quick brown fox
    Italic: The quick brown fox
    Bold: The quick brown fox
    Bold Italic: The quick brown fox
    Fraktur: T he quick brown fox
    Bold Fraktur: T he quick brown fox


The only problem is that sometimes there’s an extra space after capital letters. I don’t know whether this is a quirk of unifont or unidecode.

This isn’t perfect, but it’s a quick proof of concept that suggests this shouldn’t be a hard thing for a screen reader to do.
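Another way to do something similar, swapping in a different technique from unidecode, is Unicode compatibility normalization. The styled letters are encoded as compatibility variants of the plain letters, so NFKC normalization maps them back. Here is a minimal sketch using Python's standard `unicodedata` module; `strip_formatting` is a made-up name.

```python
import unicodedata

def strip_formatting(text):
    """Map compatibility variants (bold, italic, double-struck, ...) to plain letters."""
    return unicodedata.normalize("NFKC", text)

# "bold" in sans-serif bold letters, followed by a double-struck R (U+211D).
fancy = "\U0001D5EF\U0001D5FC\U0001D5F9\U0001D5F1 and \u211D"
print(strip_formatting(fancy))
```

NFKC also rewrites other compatibility characters, such as superscripts and ligatures, so it may change more than you intend.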

Maybe you don’t want to normalize Unicode characters this way all the time, but you could have some configuration option to only do this for Twitter, or to only do it for characters outside a certain character range.

How to format text in Twitter

Twitter does not directly provide support for formatting text in bold, italic, etc. But it does support Unicode characters [1], and so a hack to get around the formatting limitation is to replace letters with Unicode variants.

For example, you could tweet

How to include bold or italic text in a tweet.

I cheated in the line above, using bold and italic formatting rather than Unicode characters because some readers might not be able to read it.

Here’s a screenshot of the actual Unicode text in Emacs. You can see the text in the footnotes [2].

This is plain text. I have asked for the details on the ‘b’ in bold, and the bottom window shows that it is not the common U+0062 for ‘b’ down in the ASCII range, but U+1D5EF up in the Supplementary Multilingual Plane. Similarly, the i in italic above is not U+0069 but U+1D456.
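To make the substitution concrete, here is a sketch that maps ASCII letters to the sans-serif bold letters by adding a fixed offset, then reports the code point of the result. This is not necessarily how any particular tool does it; `sans_bold` is a hypothetical name, and the offsets rely on the sans-serif bold letters occupying a contiguous range starting at U+1D5D4 (capitals) and U+1D5EE (lower case).

```python
import unicodedata

def sans_bold(text):
    """Replace ASCII letters with Mathematical Sans-Serif Bold letters."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr(0x1D5EE + ord(ch) - ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D5D4 + ord(ch) - ord("A")))
        else:
            out.append(ch)  # leave digits, spaces, and punctuation alone
    return "".join(out)

b = sans_bold("bold")[0]
# Code point, official character name, and whether it lies above U+FFFF.
print(f"U+{ord(b):04X}", unicodedata.name(b), ord(b) > 0xFFFF)
```

Other styles, such as italic and script, have gaps in their ranges where a letter already existed elsewhere in Unicode, so a fixed offset would not be enough for them.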

Here’s how the text appears in Twitter:

It’s a dirty hack, and I’d recommend not overdoing it. But it could come in handy occasionally. On the other hand, some people may not see what you intend them to see. Here’s a portion of a screenshot from an Android device:

As a very rough rule of thumb, characters with smaller Unicode values are more likely to display correctly everywhere. Math symbols like ∞ (U+221E) work everywhere as far as I know. I wouldn’t depend on any Unicode character above 0xFFFF.

Update: Several people have said this formatting poses a problem for screen readers. The next post explains why it shouldn’t. (Maybe it does cause a problem, but it wouldn’t have to.)

How to produce Unicode formatting

I produced the Unicode text above using the programs unifont and unisupers from the Perl module Unicode::Tussle. See this post for how to install the module. Here’s a screenshot of using these utilities from the command line.

To use unifont, type the text you’d like to format after the command. It then shows the text formatted several ways using Unicode characters. I typed “bold” and copied the bold version of the word. The text could be anything; it’s a coincidence that I gave it text that was also a format name. For example, I created the double-struck R and C above with the command

    unifont R C

The unisupers command does not take an argument but instead takes its input from standard input. So I hit return after the command name and then typed ‘n’ to get the superscript n.


[1] Twitter supports Unicode characters, but there’s a question of whether readers will have fonts installed to display the characters. I wrote eight years ago about some symbols users were and were not likely to see, but my impression is that the situation has improved quite a bit since then.

[2] Here’s the actual text of the tweet:

How to include  or  text in a tweet.
Weierstrass function.
Im: ℂ -> ℝ
ℝⁿ -> ℝᵐ


(I pasted the text into my blogging software, but it looks like it is deleting the words “bold” and “italic.”)

My densest books

I recently got a copy of Methods of Theoretical Physics by Morse and Feshbach. It’s a dense book, literally and metaphorically. I wondered whether it might be the densest book I own, so I weighed some of my weightier books.

Morse and Feshbach has density 1.005 g/cm³, denser than water.

Gravitation by Misner, Thorne, and Wheeler is, appropriately, a massive book. It’s my weightiest paperback book, literally and perhaps metaphorically. But it’s not that dense, about 0.66 g/cm³. It would easily float.

The Mathematica Book by Wolfram (4th edition) is about the same weight as Gravitation, but denser, about 0.80 g/cm³. Still, it would float.

Physically Based Rendering by Pharr and Humphreys weighs in at 1.05 g/cm³. Like Morse and Feshbach, it would sink.

But the densest of my books is An Atlas of Functions by Oldham, Myland, and Spanier, coming in at 1.12 g/cm³.

The books that are denser than water were all printed on glossy paper. Apparently matte paper floats and glossy paper sinks.