
Chebyshev and Russian transliteration

It’s not simple to transliterate Russian names to English. Sometimes there is a unique mapping, or at least a standard mapping, of a particular name, but often there is not.

An example that comes up frequently in mathematics is Pafnuty Lvovich Chebyshev (1821–1894). This Russian mathematician’s name Пафну́тий Льво́вич Чебышёв has been transliterated as Tchebichef, Tchebychev, Tchebycheff, Tschebyschev, Tschebyschef, Tschebyscheff, Čebyčev, Čebyšev, Chebysheff, Chebychov, Chebyshov, etc.

The American Mathematical Society has settled on “Chebyshev” as its standard, and this is now common in English mathematical writing. But things named after Chebyshev, such as Chebyshev polynomials, are often denoted with a T because the French prefer “Tchebyshev.”

There is an ISO standard, ISO 9, for transliterating Cyrillic characters into Latin characters. Under this standard, Чебышёв becomes Čebyšëv. This maps Cyrillic into Latin characters with diacritical marks but not into ASCII. The AMS realized that the vast majority of Americans would not type Čebyšëv into a search bar, for example, and chose Chebyshev instead.
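Here’s a minimal sketch of what such a character-by-character mapping looks like. The table is deliberately partial: it contains only the seven letters needed for this one surname, matching the Čebyšëv result above, and is not a full ISO 9 implementation. The names ISO9_PARTIAL and transliterate are my own.

```python
# Partial table: only the letters needed for this one surname,
# chosen to match the transliteration quoted in the text.
ISO9_PARTIAL = {
    "Ч": "Č", "е": "e", "б": "b", "ы": "y",
    "ш": "š", "ё": "ë", "в": "v",
}

def transliterate(text: str) -> str:
    """Map each character through the table; unknown characters pass through."""
    return "".join(ISO9_PARTIAL.get(ch, ch) for ch in text)

latin = transliterate("Чебышёв")
print(latin)
print(latin.isascii())  # the result has diacritics, so it is not ASCII
```

The mapping is one character to one character, which is why the output has the same length as the input but still falls outside ASCII.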


Podcast feed

The previous post was an AI-generated podcast that a friend made by crawling my web site. I decided to create an actual podcast for posting occasional audio files. I expect to post very sporadically. I’ve posted two audio files, and I have one more in mind to post some day. Maybe that’ll be the end of it, or maybe I’ll post more.

The first file I posted was the one from the previous post. The second was an interview I did with the late Sir Michael Atiyah.

Here’s the RSS feed for the podcast. You can also find the podcast via my Substack newsletter.

Unicode Steganography

Steganography attempts to prevent messages from being read by unintended recipients by hiding the messages rather than (or in addition to) encrypting them. Steganography is used when you not only want to keep your communication private, you want to hide the fact that you’ve communicated at all.

Fun fact: The words steganography and stegosaurus are related [1].

Famous example

A famous example of steganography was a secret message sent by Jeremiah Denton during the Vietnam War. While a prisoner of war, Denton was forced to participate in a Vietnamese propaganda video. He sent the word torture by blinking the Morse code for the letters in the word. You can find the video here.

Clip from Jeremiah Denton propaganda video with Morse code blinking

Famous non-example

Some alleged examples of steganography have turned out to be apophenia, looking for patterns where they do not exist. The book The Woman Who Smashed Codes details Elizebeth Smith’s introduction to cryptography, being tasked to find messages hidden in minor variations in Shakespeare’s handwriting that were not there. The book goes on to describe her cryptographic work during WWII, deciphering messages that most certainly did exist.

Incidentally, Elizebeth Smith [2] married fellow cryptographer William F. Friedman. I wrote about Friedman’s index of coincidence a while back.

Enter Unicode

Randall Munroe said “I am endlessly delighted by the hopeless task that the Unicode Consortium has created for themselves.” One of the things that makes their task delightful and hopeless is trying to distinguish semantics from appearance.

For example, the capital letters at the beginning of the Roman and Greek alphabets have different Unicode values even though they look alike. A (U+0041) is a Roman letter and Α (U+0391) is a Greek letter, and so they’re not the same. Also, the Roman letter M (U+004D) is semantically different from the Roman numeral Ⅿ (U+216F) that represents 1,000.
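To see the distinction concretely, here’s a small sketch using Python’s standard unicodedata module. It prints the code point and official name of each of the four characters above, then applies NFKC compatibility normalization. The choice of NFKC is mine, not something the Unicode discussion above prescribes; it folds the Roman numeral into the letter M but leaves the Greek alpha alone.

```python
import unicodedata

# Latin A, Greek Alpha, Latin M, Roman numeral one thousand
chars = ["A", "\u0391", "M", "\u216f"]

for ch in chars:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

def fold(s: str) -> str:
    """Apply NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", s)

print(fold("\u216f") == "M")  # the numeral is a compatibility variant of M
print(fold("\u0391") == "A")  # Greek Alpha is a distinct letter, not a variant
```

So normalization helps with some look-alikes but not with characters from different scripts.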

But it quickly becomes impossible to consistently make such distinctions, and so Unicode is full of compromises. Should the letter i and the imaginary unit i have different code points? What about the symbol i for current and the unit basis vector i? You can’t have a different code point for every use of a symbol.

Because Unicode has numerous pairs of characters with identical appearance, it’s possible to hide binary data in Unicode text by using one member of a pair to represent a 0 and the other to represent a 1. So maybe d (U+0064 Latin Small Letter D) represents a 0 and ԁ (U+0501 Cyrillic Small Letter Komi De) represents a 1.
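Here’s a minimal sketch of that scheme, using only the d / ԁ pair mentioned above. The function names hide and reveal are hypothetical, and a practical version would use many look-alike pairs and mark where the message ends.

```python
LATIN_D = "d"          # U+0064, represents 0
KOMI_DE = "\u0501"     # U+0501 Cyrillic Small Letter Komi De, represents 1

def hide(cover: str, bits: str) -> str:
    """Encode bits into cover by choosing which look-alike replaces each d."""
    remaining = list(bits)
    out = []
    for ch in cover:
        if ch == LATIN_D and remaining:
            out.append(KOMI_DE if remaining.pop(0) == "1" else LATIN_D)
        else:
            out.append(ch)
    if remaining:
        raise ValueError("cover text has too few d's to carry the message")
    return "".join(out)

def reveal(stego: str) -> str:
    """Read one bit from every d-like character, in order."""
    return "".join("1" if ch == KOMI_DE else "0"
                   for ch in stego if ch in (LATIN_D, KOMI_DE))

cover = "a dog and a bird hid under the dusty shed"
stego = hide(cover, "1011")
print(stego)
print(reveal(stego))
```

Note that reveal returns a bit for every d in the text, so unused d’s after the message read as trailing zeros. The capacity is one bit per carrier character, which is why real schemes use many pairs.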

There is a potential problem with this scheme. Unicode does not dictate appearance, and it’s entirely possible a typographer might create a font that has distinct glyphs for characters that are not distinct in other fonts.

Security

Look-alike characters are often used to create malicious URLs. For instance, someone might take “Microsoft.com” and substitute the Roman numeral Ⅿ for the first letter, or substitute a Greek omicron for one of the o’s.

Text that is expected to be ASCII should be turned into ASCII to prevent mistakes or malice, or the user should be warned. “Do you really want to visit this URL that contains nine Roman letters and one Cyrillic letter?”
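A check like this is easy to sketch. The function below, whose name non_ascii_report is my own, lists every non-ASCII character in a string along with its position, code point, and Unicode name, which is enough information to generate a warning like the one above.

```python
import unicodedata

def non_ascii_report(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

# The first character is the Roman numeral for 1,000, not the letter M.
suspicious = "\u216ficrosoft.com"

for index, code_point, name in non_ascii_report(suspicious):
    print(index, code_point, name)

print(non_ascii_report("Microsoft.com"))  # empty list: pure ASCII
```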

When I’m reading, I want fonts with broad Unicode support. No missing symbols, no jarring change in font for foreign words. But when I’m debugging, it would be nice to have the opposite, a xenophobic font that displays non-ASCII characters in some ugly way that makes them jump out. I imagine someone has developed such a font, but it’s hard to find because most people are looking for better Unicode support, not worse.


[1] Both derive from the Greek word for ‘cover’. Steganographic writing is covered in the sense of being hidden. A stegosaurus has armored plates that look like roof tiles, i.e. like the covering of a house.

[2] That’s not a typo. She spelled her name with ‘e’ as the fifth letter rather than the more common ‘a’.

Cycle of New Year’s Days

Here’s a visualization of how the day of the week for New Year’s Day changes.

The green diamonds represent leap years and the blue squares represent ordinary years.

The day of the week for New Year’s Day advances one day after each ordinary year and two days after each leap year, hence the diagonal stripes in the graph above.

The whole cycle repeats every 28 years. During that 28-year cycle, New Year’s Day falls on each day of the week four times: three times in an ordinary year and once in a leap year. Or to put it another way, each horizontal row of the graph above contains three blue squares and one green diamond.

The comments above are true under the Julian calendar, without exception. And they’re true for long stretches of time under the Gregorian calendar. For example, the pattern above repeats from 1901 to 2099.

The Julian calendar had a leap day every four years, period. This made the calendar year longer than the solar year by about 3 days every 400 years, so the Gregorian calendar removed 3 leap days. A year divisible by 100 is not a leap year unless it is also divisible by 400. So the Gregorian calendar disrupts the pattern above every 100 years.
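The claims above are easy to check numerically. Here’s a sketch using Python’s datetime module, which works in the Gregorian calendar; the helper name new_years_weekday is my own. It confirms the 28-year repetition from 1901 to 2099, tallies one 28-year cycle, and shows the pattern breaking across 1900, which was not a leap year.

```python
import calendar
from collections import Counter
from datetime import date

def new_years_weekday(year: int) -> int:
    """Day of the week for January 1, with Monday = 0 and Sunday = 6."""
    return date(year, 1, 1).weekday()

# Within 1901-2099 the pattern repeats every 28 years.
repeats = all(new_years_weekday(y) == new_years_weekday(y + 28)
              for y in range(1901, 2072))

# Tally one 28-year cycle by (weekday, is leap year).
cycle = Counter((new_years_weekday(y), calendar.isleap(y))
                for y in range(2001, 2029))

# Crossing 1900 drops a leap day, so the 28-year repetition fails.
broken = new_years_weekday(1890) != new_years_weekday(1918)

print(repeats, broken)
print(sorted(cycle.items()))
```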


Most popular posts of 2024

I looked at Hacker News to see which posts on this site were most popular. I didn’t look at my server logs, but generally the posts that get the most traffic are posts that someone submits to Hacker News.

Older posts popular this year

Two posts written earlier got a lot of traffic this year, namely

Writes large correct programs

from 2008 and

Where has all the productivity gone?

from 2021.

Posts written this year

The most popular post this year, at least on Hacker News, was

Why does FM sound better than AM?

The runner up was

Evaluating a class of infinite sums in closed form

The following post looks at a way for a satellite to move from one orbit to another that under some circumstances is more efficient (in terms of fuel, not in terms of time) than the more common Hohmann transfer maneuver.

Efficiently transferring to a much higher orbit

This post considers interpolation as a form of compression. Instead of saving a table of function values at fine-grained intervals, you could store values at points further apart and store interpolation formulas for recovering the lost precision.

Compression and interpolation

One of the arguments between Frequentist and Bayesian statisticians is whether you should be allowed to look at data as it accrues during an experiment, such as in A/B testing. If you do look at the interim data, how should you analyze it and how should you interpret the results?

Can you look at experimental results along the way or not?

Finally, I wrote a post about solving a problem I ran into with the command line utility find. As is often the case, I got a lot of useful feedback.

Resolving a mysterious problem with find


Putting a face on a faceless account

I’ve been playing around with Grok today, logging into some of my X accounts and trying out the prompt “Draw an image of me based on my posts.” [1] In most cases Grok returned a graphic, but sometimes it would respond with a text description. In the latter case asking for a photorealistic image made it produce a graphic.

Here’s what I get for @AlgebraFact:

The icons for all my accounts are cerulean blue dots with a symbol in the middle. Usually Grok picks up on the color, as above. With @AnalysisFact, it dropped a big blue piece of a circle on the image.

For @UnixToolTip it kept the & from the &> in the icon. Generative AI typically does weird things with text in images, but it picked up “awk” correctly.

Here’s @ProbFact. Grok seems to think it’s a baseball statistics account.

Last but not least, here’s @DataSciFact.

I wrote a popular post about how to put Santa hats on top of symbols in LaTeX, and that post must have had an outsized influence on the image Grok created.

[1] Apparently if you’re logged into account A and ask it to draw B, the image will be heavily influenced by A’s posts, not B’s. You have to log into B and ask in the first person.

Perpetual Calendars

The previous post explained why the Gregorian calendar is the way it is, and that its 400-year cycle consists of a whole number of weeks. It follows that the Gregorian calendar repeats itself every 400 years. For example, the calendar for 2025 will be exactly the same as the calendar for 1625 and 2425.

There are only 14 possible printed calendars, if you don’t print the year on the calendar. There are seven possibilities for the day of the week for New Year’s Day, and there are two possibilities for whether the year is a leap year.

A perpetual calendar is a set of the 14 possible calendars, along with some index that tells which possible calendar is appropriate in a given year.

Is each of the 14 calendars equally frequent? Almost, aside from the fact that leap years are less frequent. In each 400-year cycle, each ordinary year calendar occurs 43 or 44 times, and each leap year calendar occurs 13, 14, or 15 times.
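Here’s a sketch that counts the calendars directly over one 400-year cycle. The starting year 2000 is an arbitrary choice; any 400 consecutive Gregorian years would give the same tallies. Each calendar is identified by the weekday of New Year’s Day and whether the year is a leap year.

```python
import calendar
from collections import Counter
from datetime import date

# Identify each printed calendar by (weekday of Jan 1, is leap year).
counts = Counter((date(y, 1, 1).weekday(), calendar.isleap(y))
                 for y in range(2000, 2400))

ordinary_counts = sorted({n for (_, leap), n in counts.items() if not leap})
leap_counts = sorted({n for (_, leap), n in counts.items() if leap})

# Number of days in 400 Gregorian years, and its remainder mod 7.
days_in_cycle = (date(2400, 1, 1) - date(2000, 1, 1)).days

print(len(counts))
print(ordinary_counts, leap_counts)
print(days_in_cycle, days_in_cycle % 7)
```

The zero remainder in the last line is what makes the calendar repeat every 400 years.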


Coiled logarithmic graph

A logarithmic scale is very useful when you need to plot data over an extremely wide range. However, sometimes even a logarithmic scale may not reduce the visual range enough.

I recently saw a timeline-like graph that was coiled into a spiral, packing more information into a limited visual window [1].

I got to thinking about when this could be useful. This raises two questions.

  1. When might you want to visualize something that grows faster than exponentially?
  2. How would this compare to the radial growth of a spiral as a function of arc length?

Let’s address the second question first. What exactly do we mean by spiral? Archimedean spirals are a class of spirals that include what many people think of as a spiral. These spirals have polar equation

r = b θ^(1/n)

where n is a constant. The choice n = 1 corresponds to spirals with evenly spaced arms, such as a roll of carpet.

When n = 1, the length of the spiral for θ running from 0 to T is on the order of T² when T is large, as shown in this post. For general n, the length is on the order of T^(1 + 1/n).
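Here’s a numerical sanity check of those growth rates, a sketch rather than a proof. It estimates the arc length with a simple midpoint rule and compares it to b T^(1 + 1/n) / (1 + 1/n), which is what you get by integrating r alone, the dominant part of the arc length integrand for large θ. The function names and the step count are my own choices.

```python
import math

def spiral_length(b: float, n: float, T: float, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the arc length of r = b*theta**(1/n) on [0, T]."""
    h = T / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h                    # midpoint avoids theta = 0
        r = b * theta ** (1 / n)
        dr = (b / n) * theta ** (1 / n - 1)      # dr/dtheta
        total += math.hypot(r, dr) * h           # ds = sqrt(r^2 + r'^2) dtheta
    return total

def leading_term(b: float, n: float, T: float) -> float:
    """Integral of r alone: b * T**(1 + 1/n) / (1 + 1/n)."""
    p = 1 + 1 / n
    return b * T ** p / p

for n in (1, 2, 3):
    ratio = spiral_length(1.0, n, 1000.0) / leading_term(1.0, n, 1000.0)
    print(n, ratio)
```

The ratios should be close to 1, consistent with the length growing like T^(1 + 1/n).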

If n = 1 and the distance of a spiral from the origin grows linearly as a function of θ, the arc length is growing quadratically. If the logarithm is growing quadratically, the function itself must be something like exp(kθ²).

So now back to the first question. What is something that grows super exponentially that we might want to plot? The first thing that came to my mind was factorials. Stirling’s approximation shows that the logarithm of factorial grows faster than linearly but slower than any power with exponent larger than 1.

log(x!) = x log x − x + O(log x)

and so if we plot x! on a coiled logarithmic scale, the distance from the image of a point to the origin will grow less than linearly, even if we allow the spiral parameter n to be larger than 1. But for a limited range of x, a coiled logarithmic plot works well. Here’s a polar plot of log(x!)/10.
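The approximation is easy to check numerically. This sketch computes log(x!) using math.lgamma, since log Γ(x + 1) = log(x!), and compares it with x log x − x. The remainder should stay small relative to log x. The helper names are my own.

```python
import math

def log_factorial(x: float) -> float:
    """log(x!) computed as log Gamma(x + 1)."""
    return math.lgamma(x + 1)

def stirling_main(x: float) -> float:
    """Main terms of Stirling's approximation: x log x - x."""
    return x * math.log(x) - x

for x in (10, 100, 1000, 10**6):
    remainder = log_factorial(x) - stirling_main(x)
    # If the remainder is O(log x), this ratio stays bounded.
    print(x, remainder, remainder / math.log(x))
```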

I don’t know of a natural example of something that grows like exp(kθ²). Of course you could always construct something to have this growth, but I can’t think of a common algorithm, for example, whose run time is the exponential of a quadratic. If you know a good example, please leave a comment.

[1] I can trace the provenance of the image to here, but that page doesn’t say where the image came from.

Dogecoin anthem

Rocket with Dogecoin mascot

Someone sent me an AI-generated Dogecoin anthem: To Da Moon.

Here’s the audio.


And here are the lyrics:

Yo, it started as a joke, now we in the game,
Dogecoin rocket, yeah, remember the name.
Crypto vibes, makin’ history soon,
Strapped to the rocket, we’re goin’ to the moon.

Elon on the tweets, got the memes in check,
Shiba Inu power, cashin’ every check.
From the hodlers to the dreamers, we makin’ it right,
Dogecoin anthem, light up the night.

(Chorus)
Doge to da moon, ya, Doge to da moon
To da moooooooooon
Diamond hands brotha
All aboard the rocket, Doge to da moon!

(Verse 2)
Started at a penny, now it’s hittin’ big heights,
Laughin’ to the bank while we flexin’ the lights.
They said it was a phase, but we changin’ the game,
Shoutout to the legends who believed in the name.

Blockchain movin’, decentralize the power,
Crypto revolution, it’s our finest hour.
From the traders to the hodlers, shout it real loud,
Doge Army strong, and we bringin’ the crowd.

(Chorus)
Doge to da moon, ya, Doge to da moon
To da moooooooooon
Diamond hands brotha
All aboard the rocket, Doge to da moon!
All aboard the rocket, Doge to the moon.

[instrumental]

[vocoder]

The Department of Government Efficiency
is coming for your bureaucracy,
No more waste don’t ya see?

(Bridge)
It’s not just a coin, it’s a vibe, it’s a culture,
Crypto rebel spirit, yeah, we ridin’ like vultures.
From the charts to the memes, we hittin’ the tune,
Dogecoin anthem, it’s our lunar commune.

(Chorus)
Doge to da moon, ya, Doge to da moon
To da moooooooooon
Diamond hands brotha
All aboard the rocket, Doge to da moon!

(Outro)
Moonshot dreams, yeah, the future is bright,
Crypto revolution, takin’ flight tonight.
Dogecoin forever, yeah, we’ll never be through,
42 42 42 42 42 42 forty-twooooooo

Blogging pace

When I started this blog, almost 17 years ago, I posted nearly every day. The first time I went a couple days without posting I got a message from someone asking if everything was OK.

I’ve slowed down since then, and even more lately. Last week I was busy with professional work, and this week I’ve been busy with personal work.

I’ve never had a schedule for this blog: I write when I feel like writing, which usually means several times a week. I’ve averaged 3 posts every 4 days since I started writing here. Presumably I’ll pick back up next week. We’ll see.

If you’d like to be notified when I write something here, you have several options. You could subscribe via RSS to hear of every post as soon as it is published. If you’d rather get a notification of a few posts at a time, along with a little introduction to each, you could subscribe to my newsletter.

You can also follow me on social media, primarily X but also Mastodon and Bluesky.