Uncategorized

The Chicken McNugget Monoid

When McDonalds first introduced Chicken McNuggets, you could buy McNuggets in boxes of 6, 9, or 20. The Chicken McNugget problem is to determine which numbers of McNuggets you can and cannot buy. A number n is a McNugget number if it is possible to buy exactly that many McNuggets (using the original boxes).

Box of 6 Chicken McNuggets

There are only finitely many non-McNugget numbers, and these are listed in OEIS sequence A065003.

Note that if you order eight boxes of 6 nuggets you get 48 nuggets, and so if you order more than eight boxes, in any combination, you get more than 48 nuggets. So the following program will compute all McNugget numbers less than 48.

    ns = set() 
    for i in range(8):
        for j in range(8):
            for k in range(8):
                n = 6*i + 9*j + 20*k
                if n <= 48:
                    ns.add(n)
    
    comp = set(range(48)) - ns

The non-McNugget numbers less than 48 are stored in comp, the set complement of ns. These numbers are

1, 2, 3, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 22, 23, 25, 28, 31, 34, 37, 43.

It turns out 48 and 49 are also McNugget numbers. You can verify this by changing “48” to “50” in the code above and looking at ns. This means that 44, 45, 46, 47, 48, and 49, a run of 6 consecutive numbers are all McNugget numbers, and so you can add boxes of 6 to these numbers to get any greater number. This shows that 43 is the largest non-McNugget number.

The set of McNugget numbers is contains 0, my favorite number of McNuggets, and is closed under addition, and so it forms a monoid.

Update: The next post generalizes this one, giving a little theory and more general code.

Source: Factoring in the Chicken McNugget monoid. arXiv:1709.01606

Which one is subharmonic?

The Laplace operator Δ of a function of n variables is defined by

\Delta f = \sum_{i=1}^n \frac{\partial^2 f}{\partial x^2_i}

If Δ f = 0 in some region Ω, f is said to be harmonic on Ω.  In that case f takes on its maximum and minimum over Ω at locations on the boundary ∂Ω of Ω. Here is an example of a harmonic function over a square which clearly takes on its maximum on two sides of the boundary and its minimum on the other two sides.

Plot of x^2 - y^2 over [-1,1] cross [-1,1].

The theorem above can be split into two theorems and generalized:

If Δ f ≥ 0, then f takes on its maximum on ∂Ω.

If Δ f ≤ 0, then f takes on its minimum on ∂Ω.

These two theorems are called the maximum principle and minimum principle respectively.

Now just as functions with Δf equal to zero are called harmonic, functions with Δf non-negative or non-positive are called subharmonic and superharmonic. Or is it the other way around?

If Δ f ≥ 0 in Ω, then f is called subharmonic in Ω. And if Δ f ≤ 0 then f is called superharmonic. Equivalently, f is superharmonic if –f is subharmonic.

The names subharmonic and superharmonic may seem backward: the theorem with the greater than sign is for subharmonic functions, and the theorem with the less than sign is for superharmonic functions. Shouldn’t the sub-thing be less than something and the super-thing greater?

Indeed they are, but you have to look f and not Δf. That’s the key.

If a function f is subharmonic on Ω, then f is below the harmonic function interpolating f from ∂Ω into the interior of Ω. That is, if g satisfies Laplace’s equation

\begin{align*} \Delta g &= 0 \mbox{ on }\Omega \\ g &= f \mbox{ on } \partial\Omega \end{align*}

then fg on Ω.

For example, let f(x) = ||x|| and let Ω be the unit ball in ℝn. Then Δ f ≥ 0 and so f is subharmonic. (The norm function f has a singularity at the origin, but this example can be made rigorous.) Now f is constantly 1 on the boundary of the ball, and the constant function 1 is the unique solution of Laplace’s equation on the unit ball with such boundary condition, and clearly f is less than 1 on the interior of the ball.

Related posts

Ideograph numerals

This post is a follow on to my previous post on Unicode numbers. I always welcome feedback from readers, but I especially welcome it here because I’m walking into an area I know next to nothing about.

Consecutive code points

Unicode generally assigns code points to number-like things in consecutive order. For example, the Python code

    for n in range(1,10):
        print(chr(0x30+n), chr(0x24f4+n), chr(0x215f+n))

prints

    1 ⓵ Ⅰ
    2 ⓶ Ⅱ
    3 ⓷ Ⅲ
    4 ⓸ Ⅳ
    5 ⓹ Ⅴ
    6 ⓺ Ⅵ
    7 ⓻ Ⅶ
    8 ⓼ Ⅷ
    9 ⓽ Ⅸ

showing that ASCII digits, circled numerals, and Roman numerals are encoded consecutively.

Parenthesized and circled ideographs

So the same is probably true for ideographs representing digits, right?

一 二 三 四 五 六 七 八 九 ㈠ ㈡ ㈢ ㈣ ㈤ ㈥ ㈦ ㈧ ㈨ ㊀ ㊁ ㊂ ㊃ ㊄ ㊅ ㊆ ㊇ ㊈

No, but before we get into that, the following code shows that parenthesized ideographs and circled ideographs for digits are numbered consecutively. The code

    from unicodedata import numeric, name

    for i in range(1, 10):
        cp = 0x321f + i
        ch = chr(cp)
        print(ch, hex(cp), numeric(ch), name(ch))
    
    for i in range(1, 10):
        cp = 0x327f + i
        ch = chr(cp)
        print(ch, hex(cp), numeric(ch), name(ch))    

prints

    ㈠ 0x3220 1.0 PARENTHESIZED IDEOGRAPH ONE
    ㈡ 0x3221 2.0 PARENTHESIZED IDEOGRAPH TWO
    ㈢ 0x3222 3.0 PARENTHESIZED IDEOGRAPH THREE
    ㈣ 0x3223 4.0 PARENTHESIZED IDEOGRAPH FOUR
    ㈤ 0x3224 5.0 PARENTHESIZED IDEOGRAPH FIVE
    ㈥ 0x3225 6.0 PARENTHESIZED IDEOGRAPH SIX
    ㈦ 0x3226 7.0 PARENTHESIZED IDEOGRAPH SEVEN
    ㈧ 0x3227 8.0 PARENTHESIZED IDEOGRAPH EIGHT
    ㈨ 0x3228 9.0 PARENTHESIZED IDEOGRAPH NINE
    ㊀ 0x3280 1.0 CIRCLED IDEOGRAPH ONE
    ㊁ 0x3281 2.0 CIRCLED IDEOGRAPH TWO
    ㊂ 0x3282 3.0 CIRCLED IDEOGRAPH THREE
    ㊃ 0x3283 4.0 CIRCLED IDEOGRAPH FOUR
    ㊄ 0x3284 5.0 CIRCLED IDEOGRAPH FIVE
    ㊅ 0x3285 6.0 CIRCLED IDEOGRAPH SIX
    ㊆ 0x3286 7.0 CIRCLED IDEOGRAPH SEVEN
    ㊇ 0x3287 8.0 CIRCLED IDEOGRAPH EIGHT
    ㊈ 0x3288 9.0 CIRCLED IDEOGRAPH NINE

CJK Unified Ideographs

Now let’s take the parentheses and circles off.

The following code shows that the CJK unified ideographs for digits are not digits (!) according to Unicode, but they are numeric. It also shows that their code points are not assigned in any apparent order.

    numerals = "一二三四五六七八九十"
    for n in numerals:
        print(n, hex(ord(n)), n.isdigit(), numeric(n))

This outputs the following.

    一 0x4e00 False 1.0
    二 0x4e8c False 2.0
    三 0x4e09 False 3.0
    四 0x56db False 4.0
    五 0x4e94 False 5.0
    六 0x516d False 6.0
    七 0x4e03 False 7.0
    八 0x516b False 8.0
    九 0x4e5d False 9.0
    十 0x5341 False 10.0

I assume the ordering of ideographs in Unicode has its own internal logic (with exceptions and historical quirks) that I know nothing about. If anyone knows of any patterns of how code points are assigned to ideographs, please let me know.

The names of the characters above say nothing about what the characters mean. For example, the official Unicode name for 九 (U+4E5D) is CJK UNIFIED IDEOGRAPH-4E5D. The name says nothing about the ideograph representing the digit 9, though the numeric property of the digit is indeed 9. My guess is that when that character represents a digit, it represents 9, but maybe it can mean other things in other contexts.

Unicode numbers

There are 10 digits in ASCII, and I bet you can guess what they are. In ASCII, a digit is a decimal is a number.

Things are much wilder in Unicode. There are hundreds of decimals, digits, and numeric characters, and they’re different sets.

 ꩓ ٦ ³ ⓶ ₅ ⅕ Ⅷ ㊈

The following Python code loops through all possible Unicode characters, extracting the set of decimals, digits, and numbers.

    numbers  = set()
    decimals = set() 
    digits   = set()

    for i in range(1, 0x110000):
        ch = chr(i)
        if ch.isdigit():
            digits.add(ch)
        if ch.isdecimal():
            decimals.add(ch)
        if ch.isnumeric():
            numbers.add(ch)

These sets are larger than you may expect. The code

    print(len(decimals), len(digits), len(numbers))

tells us that the size of the three sets are 650, 778, and 1862 respectively.

The following code verifies that decimals are a proper subset of digits and that digits are a proper subset of numerical characters.

    assert(decimals < digits < numbers)

Now let’s look at the characters in the image above. The following code describes what each character is and how it is classified. The first three characters are digits, the next three are decimals but not digits, and the last three are numeric but not decimals.

    from unicodedata import name
    for c in "꩓٦":
        print(name(c))
        assert(c.isdecimal())
    for c in "³⓶₅":
        print(name(c))    
        assert(c.isdigit() and not c.isdecimal())
    for c in "⅕Ⅷ㊈":
        print(name(c))    
        assert(c.isnumeric() and not c.isdigit())

The names of the characters are

  1. MATHEMATICAL DOUBLE-STRUCK DIGIT EIGHT
  2. CHAM DIGIT THREE
  3. ARABIC-INDIC DIGIT SIX
  4. SUPERSCRIPT THREE
  5. DOUBLE CIRCLED DIGIT TWO
  6. SUBSCRIPT FIVE
  7. VULGAR FRACTION ONE FIFTH
  8. ROMAN NUMERAL EIGHT
  9. CIRCLED IDEOGRAPH NINE

Update: See the next post on ideographic numerals.

Update: There are 142 distinct numbers that correspond to the numerical value associated with a Unicode character. This page gives a list of the values and an example of each value.

Related posts

Visualizing Swedish vowels

A few days ago I wrote a post comparing English and Japanese vowel sounds in a 2D chart. In this post I’d like to do something similar for English and Swedish. As before the data come from [1].

A friend of mine who learned Swedish would joke about how terribly he had to contort his mouth to speak the language. Swedish vowels are objectively difficult for non-native speakers as can be seek in a vowel chart. The vertical axis runs from closed sounds on top to open sounds on the bottom. The horizontal axis runs from front vowels on the left to back vowels on the right.

Swedish vowel sounds

There are a lot of vowel sounds, and many of them are clustered close together. Japanese, by contrast, has only five vowel sounds, and they’re widely spread apart.

Japanese vowel sounds

The vowel charts for Spanish and Hebrew look fairly similar to the chart for Japanese above: five vowels spread out in roughly the same locations.

It wouldn’t matter so much that Swedish has a lot of tightly clustered vowel sounds if your native language has the same sounds, but the following chart shows that English and Swedish vowels are quite different. The red x’s mark English vowel locations and the blue dots mark Swedish vowels.

Swedish and English vowel sounds

[1] Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge University Press, 2021.

Making flags in Unicode

I recently found out [1] that the Unicode sequences for flag emoji are created by taking the two-letter country abbreviation (ISO 3166-1 alpha-2) and replacing both letters with their counterparts in the range U+1F1E6 through U+1F1FF.

For example, the abbreviation for Canada is CA, and the characters 🇨 (U+1F1e8) and 🇦 (U+1F!E6) together create 🇨🇦.

boxed C plus boxed A = Canadian flag

This is illustrated by the following Python code.

    import iso3166

    def flag_emoji(name):
        alpha = iso3166.countries.get(name).alpha2
        box = lambda ch: chr( ord(ch) + 0x1f1a5 )
        return box(alpha[0]) + box(alpha[1])
    print(flag_emoji("Canada"))

The name we give to flag_emoji need not be the full country name, like Canada. It can be anything that iso3166.countries.get supports, which also includes two-letter abbreviations like CA, three-letter abbreviations like CAN, or ISO 3166 numeric codes like 124.

We can use the following code to print a collage of flags:

    def print_all_flags():
        for i, c in enumerate( iso3166.countries ):
            print(flag_emoji(c.name), end="")
            if i%25 == 24: print()

10 by 25 array of flags

Related posts

[1] I learned this from watching Dylan Beattie’s talk Plain Text on YouTube.

Visualizing English and Japanese vowels

Vowel sounds can be visualized in a two-dimensional space according to tongue position. The vertical axis is runs from open down to closed, and the horizontal runs from front to back. See a linguistics textbook for far more detail.

English has five vowel letters, but a lot more than five vowel sounds. Scholars argue about how many vowel sounds English and other languages have because there’s room for disagreement on how much two sounds can differ and still be considered variations on the same sound. The IPA Handbook [1] lists 11 vowel sounds in American English, not counting diphthongs.

When I wrote about Japanese hiragana and katakana recently, I showed how the letters are arranged into a grid with one side labeled with English vowel letters. Is that justified? Does Japanese really have just five vowel sounds, and are they similar to five English vowels? Essentially yes. This post will show how English and Japanese vowel sounds compare according to [1].

Here are versions of the vowel charts for the two languages that I made using Python’s matplotlib.

First English:

Then Japanese:

And now the two combined on one plot:

Four out of the five Japanese vowels have a near equivalent in English. The exception is the Japanese vowel with IPA symbol ‘a’, which is midway between the English vowels with symbols æ (U+0230) and ɑ (U+0251), somewhere between the a in had and the a in father.

Update: See the comments for a acoustic phonetician’s input regarding frequency analysis.

Update: Here is a similar post for Swedish. Swedish is interesting because it has a lot of vowel sounds, and the sounds are in tight clusters.

Analogy with KL divergence

The differences between English and Japanese vowels are asymmetric: an English speaker will find it easier to learn Japanese vowels than a Japanese speaker would find it to learn English vowels. This reminiscent of the Kullback-Leibler divergence in probability and statistics.

KL-divergence is a divergence and not a distance, even though it is often called a distance, because it’s not symmetric. The KL-divergence between two random variables X and Y, written KL(X || Y), is the average surprise in seeing Y when you expected X. If you expect English vowel sounds and hear Japanese vowel sounds you’re not as surprised as if you expect Japanese vowel sounds and hear English. The English student of Japanese hears familiar sounds shifted a bit, but the Japanese student of English hears new sounds.

Related posts

[1] Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge University Press, 2021.

Katakana, Hiragana, and Unicode

I figured out something that I wasn’t able to find by searching, so I’m posting it here in case other people have the same question and the same difficulty finding an answer.

I’m sure other people have written about this, but I couldn’t find it. Maybe lots of people have written about this in Japanese but not many in English.

Japanese kana consists of two syllabaries, hiragana and katakana, that are like phonetic alphabets. Each has 46 basic characters, and each corresponds to a block of 96 Unicode characters. I had two simple questions:

  1. How do the 46 characters map into the 90 characters?
  2. Do they map the same way for both hiragana and katakana?

Hiragana / katakana correspondence

I’ll start with the second question because it’s easier. Hiragana and katakana are different ways of representing the same sounds, and they correspond one to one. For example, the full name of U+3047 () is

HIRAGANA LETTER SMALL E

and the full name of its katakana counterpart U+30A7 () is

KATAKANA LETTER SMALL E

The only difference as far as Unicode goes is that katakana has three code points whose hiragana counterpart is unused, but these are not part of the basic letters.

The following Python code shows that the names of all the characters are the same except for the name of the system.

    from unicodedata import name

    unused = [0, 151, 152] # not in hiragana

    for i in range(0,63):
        if i in unused:
            continue
        h = name(chr(0x3040 + i)) 
        k = name(chr(0x30a0 + i))
        assert(h == k.replace("KATAKANA", "HIRAGANA"))
    print("done")

Mapping 46 into 50 and 96

You’ll see kana written in grid with one side labeled with 5 vowels and the other labeled with 10 consonants called a gojūon (五十音). That’s 50 cells, and in fact gojūon literally means 50 sounds, so how do we get 46? Five cells are empty, and one letter doesn’t fit into the grid. The empty cells are unused or archaic, and the extra character doesn’t fit the grid structure.

In the image below, the table on the left is for hiragana and the table on the right is for katakana. HTML versions of the tables available here.

Left out of each table is in hiragana and in katakana.

So does each set of 46 characters map into its Unicode code block?

Unicode numbers the letters consecutively if you traverse the grid increasing vowels first, then consonants, and adding the straggler at the end. But the reason 46 letters expand into more code points is that each letter can have one, two, or three variations. And there are various miscellaneous other symbols in the Unicode block.

For example, there is a LETTER E as well as the SMALL LETTER E mentioned above. Other variations seem to correspond to voiced and unvoiced versions of a consonant with a phonetic marker added to the voiced version. For example, く is U+304F, HIRAGANA LETTER KU, and ぐ is U+3050, HIRAGANA LETTER GU.

Here is how hiragana maps into Unicode. Each cell should be U+3000 plus the characters show.

         a  i  u  e  o 
        42 44 46 48 4A 
     k  4B 4D 4F 51 53 
     s  55 57 59 5B 5D 
     t  5F 61 64 66 68 
     n  6A 6B 6C 6D 6E 
     h  6F 72 75 78 7B 
     m  7E 7F 80 81 82 
     y  84    86    88 
     r  89 8A 8B 8C 8D 
     w  8F          92 

The corresponding table for katakana is the previous table plus 0x60:

         a  i  u  e  o 
        A2 A4 A6 A8 AA 
     k  AB AD AF B1 B3 
     s  B5 B7 B9 BB BD 
     t  BF C1 C4 C6 C8 
     n  CA CB CC CD CE 
     h  CF D2 D5 D8 DB 
     m  DE DF E0 E1 E2 
     y  E4    E6    E8 
     r  E9 EA EB EC ED 
     w  EF          F2 

In each case, the letter missing from the table is the next consecutive value after the last in the table, i.e. is U+30F3.

Related posts

Dominoes in Unicode

I was spelunking around in Unicode and found that there are assigned characters for representing domino tiles and that the characters are enumerated in a convenient order. Here is the code chart.

There are codes for representing tiles horizontally or vertically. And even though, for example, the 5-3 is the same domino as the 3-5, there are separate characters for representing the orientation of the tile: one for 3 on the left and one for 5 on the left.

When you include orientation like this, a domino becomes essentially a base 7 number: the number of spots on one end is the number of 7s and the number of spots on the other end is the number of 1s. And the order of the characters corresponds to the order as base 7 numbers:

0-0, 0-1, 0-2, …, 1-0, 1-1, 1-2, … 6-6.

The horizontal dominoes start with the double blank at U+1F031 and the vertical dominoes start with U+1F063, a difference of 32 in hexadecimal or 50 in base 10. So you can rotate a domino tile by adding or subtracting 50 to its code point.

The following tiny Python function gives the codepoint for the domino with a spots on the left (or top) and b spots on the right (or bottom).

    def code(a, b, wide):
        cp = 0x1f031 if wide else 0x1f063
        return cp + 7*a + b

We can use this function to print a (3, 5) tile horizontally and a (6, 4) tile vertically.

    print( chr(code(3, 5, True )),
           chr(code(6, 4, False)) )

To my surprise, my computer had the fonts installed to display the results. This isn’t guaranteed for such high Unicode values.

horizontal 3-5 domino and vertical 6-4