Cross platform muscle memory

I’ve mostly used Windows and Linux for the last several years. When I needed to get a new laptop recently I got a MacBook Pro. I’ve used a Mac before, but it’s been a while, and so I’m starting over.

I can move between Windows and Linux and almost forget what OS I’m using because keyboard shortcuts work the same way. But MacOS is substantially different. I’m trying to find a way to remap the modifier keys on my Mac so that my muscle memory still works on the new laptop.

There are two difficulties: where the keys are located, and what the keys do. I’ll explain why I distinguish these.

I use Emacs, which means I use the Control key (⌃) a lot. So the first thing I did was swap Caps Lock and Control. Now when I use the-key-formerly-known-as-caps-lock it behaves as I expect, on Linux, Windows, and Mac. Similarly, I swapped the Option (⌥) and Command (⌘) keys so that the-key-next-to-the-spacebar acts the same on all three operating systems.

But when I’m not using Emacs, there’s a problem. The Control key is where I expect it, but the Control key doesn’t do what I expect. I expect, for example, Control-C to copy and Control-V to paste. But on Mac, Command-C copies and Command-C pastes. In general, the Command key on a Mac does what the Control key does on Windows and Linux.

I’m trying to customize my OS configurations and habits in order to bring my use of three operating systems closer together. Here’s what I’ve come up with so far, and I welcome your suggestions.

Now the key in the lower left corner of the keyboard acts the same across operating systems. Holding down that key and pressing C copies etc. On my Mac, this key is labeled with a globe icon, and on my other machines it’s labeled Ctrl. But I type by feel, not by sight, and when I reach for this key it does what I expect. However, I do have to keep in mind that the functionality of the Command key on MacOS is not exactly the same as the functionality of the Control key on Windows and Linux.

Now on Linux I have two left Control keys, one in the bottom left corner as discussed above, and one labeled “Caps Lock.” I could use either one anywhere, but I’m getting in the habit of using the former Caps Lock as the control key when using Emacs and using the key in the lower left as the control key everywhere else. That way my keyboard mostly does what I expect across platforms.


There’s a simple pattern I didn’t realize until after I posted this artice:

The Control key on a Mac works like the Control key in Emacs, and the Command key on a Mac works like the Control key on Windows.

For example, Control-B moves backward, as in Emacs, and Command-B makes text bold, like Control-B on Windows.

The settings described above work well with this pattern. Mac has one Unix-like control key (⌃) and one Windows-like control key (⌘).

More on near-integer decibels

In base 10, four decibel values are approximately integers. Yesterday I explored whether base 10 was unique in this regard. I defined the value of n decibels in base b to be

g(n, b) = bn/b.

Sticking in 10 for b gives the usual definition of decibel levels.

There are two ways to quantify what proportion of (generalized) decibel values are near integers:

  1. Pick a tolerance value ε and count what proportion of values lie within ε of an integer.
  2. Measure the average distance of decibel values to integers.

The first approach is more natural, but the second approach is more convenient. Yesterday’s post focused on the latter. Today I’ll return to the former.

The problem with the first approach is that it depends on your choice of ε. Here’s a graph that shows what we get when we count the proportion of near integer decibel values for bases 2 through 1000 using three different values of ε.

Naturally the smaller your value of ε, the stricter your notion of “near,” the smaller proportion of near integers you’ll find.

Notice that for large values of b the proportions jitter around the value 2ε. For large bases, the fractional parts of decibel levels behave like uniform random numbers between 0 and 1, and so the distance to the nearest integer behaves like a uniform random variable between 0 and 0.5. The probability such a value is less than ε equals  ε/0.5 = 2ε.

For small values of b things are more interesting. The left side of the graph above shows a lot of overlap between the three plots. We see mostly green just because I plotted the green line last. Let’s zoom in on the bases up to 20.

I first made this plot using discrete markers rather than connected lines. Generally that’s a good thing to do for functions only defined on integers. But the plot was hard to read. Connecting the lines makes it easier to see which values correspond to the same value of ε.

For each of the values of ε above, base 10 has the highest proportion of near integer decibel values. Which base has the second highest proportion depends on the value of ε.

If you pick ε small enough, i.e. smaller than the distance of any base 10 decibel value to an integer, then base 4 has the highest proportion of near integer decibels because one of its values is exactly and integer: in base 4, 2 dB equals 2. In the notation above, g(2, 4) = 2.

Dump a pickle file to a readable text file

I got a data file from a client recently in “pickle” format. I happen to know that pickle is a binary format for serializing Python objects, but trying to open a pickle file could be a puzzle if you didn’t know this.

Be careful

There are a couple problems with using pickle files for data transfer. First of all, it’s a security risk because an attacker could create a malformed pickle file that would cause your system to run arbitrary code. In the Python Cookbook, the authors David Beazley and Brian K. Jones warn

It’s essential that pickle only be used internally with interpreters that have some ability to authenticate one another.

The second problem is that the format could change. Again quoting the Cookbook,

Because of its Python-specific nature and attachment to source code, you probably shouldn’t use pickle as a format for long-term storage. For example, if the source code changes, all of your stored data might break and become unreadable.

Suppose someone gives you a pickle file and you’re willing to take your chances and open it. It’s from a trusted source, and it was created recently enough that the format probably hasn’t changed. How do you open it?


The following code will open the file data.pickle and read it into an object obj.

    import pickle
    obj = pickle.load(open("data.pickle", "rb"))

If the object in the pickle file is very  small, you could simply print obj. But if the object is at all large, you probably want to save it to a file rather than dumping it at the command line, and you also want to “pretty” print it than simply printing it.

Pretty printing

The following code will dump a nicely-formatted version of our pickled object to a text file out.txt.

    import pickle
    import pprint

    obj = pickle.load(open("sample_data.pickle", "rb"))

    with open("out.txt", "a") as f:
         pprint.pprint(obj, stream=f)

In my case, the client’s file contained a dictionary of lists of dictionaries. It printed as one incomprehensible line, but it pretty printed as 40,000 readable lines.

Prettier printing

Simon Brunning left a comment suggesting that the json module output is even easier to read.

    import json

    with open("out.txt", "a") as f:
        json.dump(obj, f, indent=2)

And he’s right, at least in my case. The indentation json.dump produces is more what I’d expect, more like what you’d see if you were writing the structure in well-formatted source code.

Related posts


It’s a curious and convenient fact that many decibel values are close to integers [1]:

  • 3 dB ≈ 2
  • 6 dB ≈ 4
  • 7 dB ≈ 5
  • 9 dB ≈ 8

Is base 10 unique in this regard? If we were to look at the analogs of decibels in other bases, would we see a higher or lower proportion of near integers? For example, we could look at the base 12 (duodecimal) analog of decibels, or “duodecibels” for short. Or we could look at the base 16 analog “hexadecibels.”

A decibel is a multiplicative factor of 101/10 and so n decibels is a factor of 10n/10. We will denote the analog of n decibels in base b by

g(n, b) = bn/b.

Let’s compare how many values of g(n, b) are near integers for b = 10 and 12. Here’s base 10:

And here’s base 12:

In the plot for base 10, four of the dots straddle a horizontal line, but in the plot for base 12 only two dots do. So by one standard, we could say decibels have four near integer values but duodecibels have only 2.

This comparison is informative, but it’s arbitrary because it depends on our plotting parameters. It would be better to count how many values come within ε of an integer, though that would still be somewhat arbitrary because it would depend on a choice of ε. A more objective measure would be to compute the average distance of each value of g(n, b) to the nearest integer [2].

Here’s a plot of the mean distance from generalized decibel values to the nearest integer as a function of the base b.

As the base gets larger, the dots seem to vary around 0.25. This is what we’d expect if the fractional parts were randomly distributed.

Let’s zoom in on bases 2 through 16:

The smallest value corresponds to base 4. The second smallest value corresponds to base 10, though this is not easy to see in the plot because the values for bases 3 and 7 are only slightly higher.

So is base 10 special? It is in this sense: it has the smallest mean distance from generalized decibel values to integers for any integer base b with b > 4.

The statement above assumes that the average distance continues to hover around 0.25 for large bases, which appears to be the case. For bases 100 through 10,000 the average distances range between 0.2275 and 0.2646, well above the value of 0.1726 for base 10.

Incidentally, if we go back to counting near integers and choose ε between 0.02 and 0.17, then base 10 has the largest proportion of near integer decibel values.


I only looked at mean distances to integers for bases up to 10,000 because I was working in Python and going further than that would be slow. I rewrote by program in C [3] and looked at bases up to 1,000,000. The two curves below give lower and upper bounds on the variation in mean distances. At each point b, the upper curve is the maximum from b to the 1,000,000 and the lower curve is the corresponding minimum. (This might remind you of lim sup and lim inf.) Note that the scale on the horizontal axis is logarithmic.

Related posts

[1] It’s also convenient that the values that are not close to integers are close fractions involving small powers of 2 and 5. More on that here.

[2] This is borrowing an idea from compressive sensing and other areas, using the L1 norm as a proxy for sparsity. It’s awkward to count how many entries in a vector are “near zero” but easier to look at the L1 norm.

[3] There are ways to make Python code run faster, but my personal rule is to not spend much time optimizing Python. I’ll try a few things, but then if that doesn’t work I rewrite things in C. Especially if it’s a small amount of code, rewriting in C is easier and more reliable, at least for me. YMMV.

Keeping data and code together with org-mode

With org-mode you can keep data, code, and documentation in one file.

Suppose you have an org-mode file containing the following table.

    #+NAME: mydata
    | Drug | Patients |
    |    X |      232 |
    |    Y |      351 |
    |    Z |      117 |

Note that there cannot be a blank line between the NAME header and the beginning of the table.

You can bring this table into Python simply by declaring it to be a variable in the header of a Python code block.

    #+begin_src python :var tbl=mydata :results output

When you evaluate this block, you see that the table is imported as a list of lists.

    [['X', 232], ['Y', 351], ['Z', 117]]

Note that the column headings were not imported into Python. Now suppose you would like to retain the headers, and use them as column names in a pandas data frame.

    #+begin_src python :var tbl=mydata :colnames no :results output
    import pandas as pd
    df = pd.DataFrame(tbl[1:], columns=tbl[0])
    print(df, "\n")

When evaluated, this block produces the following.

      Drug  Patients 
    0    X       232
    1    Y       351
    2    Z       117


Note that in order to import the column names, we told org-mode that there are no column names! We did this with the header option

    :colnames no

This seems backward, but it makes sense. It says do bring in the first row of the table, even though it appears to be a column header that isn’t imported by default. But then we tell pandas that we want to make a data frame out of all but the first row (i.e. tbl[1:]) and we want to use the first row (i.e. tbl[0]) as the column names.

A possible disadvantage to keeping data and code together is that the data could be large. But since org files are naturally in outline mode, you could collapse the part of the outline containing the data so that you don’t have to look at it unless you need to.

Related posts

YYZ and Morse code

The song YYZ by Rush opens with a theme based on the rhythm of “YYZ” in Morse code:

    -.--  -.--  --..

YYZ is the designation for the Toronto Pearson International Airport, the main airport serving Toronto. The idea for the song came from hearing the airport identifier in Morse code.

However, the song puts no spaces between rhythm corresponding to each letter. Here’s what the opening riff would look like in sheet music:

Each dash is a middle C and each dot is an F# a tritone below middle C.

When I listen to the song, I don’t hear YYZ. My mind splits up the rhythm with each sequence of long notes starting a group:

    -.  ---.  ----..

So I hear the 20/8 time signature as (3 + 7 + 10)/8.

In terms of Morse code, -. is N. Interpreting the other groupings depends on what you mean by Morse code. The American amateur radio community defines Morse code as 40 characters: the 26 letters of the Latin alphabet, 10 digits, and 4 more symbols: / = , . Using that definition of Morse code, there are no symbols corresponding to ---. or ----... There is no symbol corresponding to ---- either. More on unused sequences here.

However, sometimes ---. is used for Ö and ---- for Š. So the way I hear “YYX” would be more like “NÖŠI”.

There are many other ways to parse -.---.----.. into Morse code symbols. For example, NO1I

    -.  ---  .----  ..


How many ways could you split -.---.----.. into valid Morse code?

Here’s an outline of a recursive algorithm to enumerate the possibilities.

Start at the beginning and list the possible symbols formed by consecutive dots and dashes. In our case the possible symbols are T, N, K, and Y. So the possibilities are

  • T (-) added to the front of all sequences that start with .---.----..
  • N (-.) added to the front of all sequences that start with ---.----..
  • K (-.-) added to the front of all sequences that start with --.----..
  • Y (-.--) added to the front of all sequences that start with -.----..

So for the first bullet point, for example, how would we find all sequences that start with .---.----..? Use the same idea.

  • E (.) added to the front of all sequences that start with ---.----..
  • A (.-) added to the front of all sequences that start with --.----..
  • W (.--) added to the front of all sequences that start with -.----..
  • J (.---) added to the front of all sequences that start with .----..

So pull off all the symbols you can from the beginning of the list of dots and dashes and in each case recurse on the rest of the list.

Related posts

Twitter follower distribution

A conversation this morning prompted the question of how many Twitter accounts have between 10,000 and 20,000 followers. I hadn’t thought about the distribution numbers of followers in a while and was curious to revisit the topic.

Apparently this question was more popular five years ago. When I did a few searches on the topic, all the results I got back were five or more years old. Also, data about the Twitter network was more available then that it is now.

This post will come up with an estimate based on almost no data, approaching this like a Fermi problem.


I’m going to assume that the number of followers for the nth most followed account is

f = c n

for some constants c and α. A lot of networks approximately follow some distribution like this, and often α is somewhere between 1 and 3.

Two data points

I’ve got two unknowns, so I need two data points. Wikipedia has a list of the 50 most followed Twitter accounts here, and the 50th account has 37.7 million followers. (I chose the 50th account on the list rather than a higher one because power laws typically fit better after the first few data points.)

I believe that a few years ago the median number of Twitter followers was 0: more than half of Twitter accounts had no followers. Let’s assume the median number of followers is 1. But median out of what? I think I read there are about 350 million total Twitter accounts, and about 200 million active accounts. So should we base our median out of 350 accounts or 200 accounts? We’ll split the difference and assume the median account is the 137,500,000th most popular account.

Solve for parameters

So now we have two equations:

c 50 = 37,700,000

c 137500000 = 1

and taking logs gives us two linear equations in two unknowns. We solve this to find α = 1.1 and c = 2.9 × 109. The estimate of α is about the size we expected, so that’s a good sign.

Take this with a grain of salt. We’ve guessed a very simple model and fit it with just two data points, one of which we based on an educated guess.

Final estimate

We assumed f = c n and we can solve this for n. If an account has n followers, we’d estimate its rank n as

n = (c / f)1/α.

So we’d estimate the number of accounts with between 10,000 and 20,000 followers to be

(c / 10000)1/α – (c /20000)1/α.

which is about 40,000. I expect this final estimate is not bad, say within an order of magnitude, despite all the crude assumptions made along the way.

Mahler’s inequality

I ran across a reference to Mahler the other day, not the composer Gustav Mahler but the mathematician Kurt Mahler, and looked into his work a little. A number of things have been named after Kurt Mahler, including Mahler’s inequality.

Mahler’s inequality says the geometric mean of a sum bounds the sum of the geometric means. In detail, the geometric mean of a list of n non-negative real numbers is the nth root of their product. If x and y are two vectors of length n containing non-negative components, Mahler’s inequality says

G(x + y) ≥ G(x) + G(y)

where G is the geometric mean. The left side is strictly larger than the right unless x and y are proportional, or x and y both have a zero component in the same position.

I’m curious why this inequality is named after Mahler. The classic book Inequalities by Hardy, Littlewood, and Polya list the inequality but call it Hölder’s inequality. In a footnote they note that the inequality above appears in a paper by Minkowski in 1896 (seven years before Kurt Mahler was born). Presumably the authors file the inequality under Hölder’s name because it follows easily from Hölder’s inequality.

I imagine Mahler made good use of his eponymous inequality, i.e. that the inequality became associated with him because he applied it well rather than because he discovered it.

More geometric mean posts

Code katas taken more literally

Karate class

Code katas are programming exercises intended to develop programming skills, analogous to the way katas develop martial art skills.

But literal katas are choreographed. They are rituals rather than problem-solving exercises. There may be an element of problem solving, such as figuring how to better execute the prescribed movements, but katas are rehearsal rather than improvisation. brings up the analogy to musical practice in the opening paragraph of the home page. But musical practice is also more ritual than problem-solving, at least for classical music. A musician might go through major and minor scales in all 12 keys, then maybe a chromatic scale over the range of the instrument, then two different whole-tone scales, etc.

A code kata would be more like a jazz musician improvising a different melody to the same chord changes every day. (Richie Cole would show off by improvising over the chord changes to Cherokee in all twelve keys. I don’t know whether this was a ritual for him or something he would pull out for performances.)

This brings up a couple questions. What would a more literal analog of katas look like for programming? Would these be useful?

I could imagine someone going through a prescribed sequence of keystrokes that exercise a set of software features that they wanted to keep top of mind, sorta like practicing penmanship by writing out a pangram.

This is admittedly a kind of an odd idea. It makes sense that the kinds of exercises programmers are interested in require problem solving rather than recall. But maybe it would appeal to some people.


Image “karate training” by Genista is licensed under CC BY-SA 2.0 .

Seconds to hours

Suppose you have a number of seconds n and you want to convert it to hours and seconds. If you divide n by 3600, the quotient is the number of hours and the remainder is the number of seconds.

For example, suppose n = 8072022. Here’s the obvious way to do the calculation in Python:

    def div3600(n): return (n // 3600, n % 3600)

This returns (242, 822) if we pass in 872022, and so 872022 seconds equals 242 hours and 822 seconds.

However, as is often the case, the most obvious approach is not the fastest. In general, division is much slower than multiplication and addition, though division by powers of 2 is very fast. So we could speed up our calculation if we had a way to change the problem to one that requires dividing by powers of 2 rather than dividing by 3600.

Here’s another Python program that computes the same result a very different way.

    def div3600(n):
        assert(0 <= n < 2255761) 
        t = 1193047*n 
        q = t >> 32
        r = 3600*(t & 4294967295) >> 32
        assert(q == n // 3600)
        assert(r == n % 3600)
        return (q, r)

This algorithm does not work for all n, hence the assert statement at the top of the function verifying that n is in range. Notice that there are no explicit division statements. There is a bit shift, which amounts to a division by 232, and there is a bitwise and, which amounts to taking the remainder after dividing by 232.

The Python code is a prototype. If we cared so much about performance that we would consider using this algorithm, we shouldn’t be using Python. Here is a C implementation that would be more realistic.

    void div3600(long n, long* q, long* r) {
        long t = 1193047*n;
        *q = t >> 32;
        *r = 3600*(t & 4294967295) >> 32;

The algorithm presented above is a example from [1]. In that paper the authors study integer functions of the form

r + β) // δ


r + β) % δ

and their application to speeding up software [2], especially calendar algorithms. The authors include benchmarks in several programming languages that verify that their algorithms are indeed significantly faster than the direct approach.

Of course most applications would do well to use direct calculations which are obviously correct. But a large scale application that does these kinds of calculations over and over might benefit from an obfuscated but more efficient algorithm.


[1] Cassio Neri and Lorenz Schneider. Euclidean Affine Functions and Applications to Calendar Algorithms. Arxiv 2102.06959v1.

[2] Here I’ve used Python’s integer division notation //. The author’s use C notation, where / means quotient because the context is integer arguments. I prefer Python’s notation because it is more explicit.