# Cross platform muscle memory

I’ve mostly used Windows and Linux for the last several years. When I needed to get a new laptop recently I got a MacBook Pro. I’ve used a Mac before, but it’s been a while, and so I’m starting over.

I can move between Windows and Linux and almost forget what OS I’m using because keyboard shortcuts work the same way. But MacOS is substantially different. I’m trying to find a way to remap the modifier keys on my Mac so that my muscle memory still works on the new laptop.

There are two difficulties: where the keys are located, and what the keys do. I’ll explain why I distinguish these.

I use Emacs, which means I use the Control key (⌃) a lot. So the first thing I did was swap Caps Lock and Control. Now when I use the-key-formerly-known-as-caps-lock it behaves as I expect, on Linux, Windows, and Mac. Similarly, I swapped the Option (⌥) and Command (⌘) keys so that the-key-next-to-the-spacebar acts the same on all three operating systems.

But when I’m not using Emacs, there’s a problem. The Control key is where I expect it, but the Control key doesn’t do what I expect. I expect, for example, Control-C to copy and Control-V to paste. But on Mac, Command-C copies and Command-C pastes. In general, the Command key on a Mac does what the Control key does on Windows and Linux.

I’m trying to customize my OS configurations and habits in order to bring my use of three operating systems closer together. Here’s what I’ve come up with so far, and I welcome your suggestions.

Now the key in the lower left corner of the keyboard acts the same across operating systems. Holding down that key and pressing C copies etc. On my Mac, this key is labeled with a globe icon, and on my other machines it’s labeled Ctrl. But I type by feel, not by sight, and when I reach for this key it does what I expect. However, I do have to keep in mind that the functionality of the Command key on MacOS is not exactly the same as the functionality of the Control key on Windows and Linux.

Now on Linux I have two left Control keys, one in the bottom left corner as discussed above, and one labeled “Caps Lock.” I could use either one anywhere, but I’m getting in the habit of using the former Caps Lock as the control key when using Emacs and using the key in the lower left as the control key everywhere else. That way my keyboard mostly does what I expect across platforms.

## Update

There’s a simple pattern I didn’t realize until after I posted this article:

The Control key on a Mac works like the Control key in Emacs, and the Command key on a Mac works like the Control key on Windows.

For example, Control-B moves backward, as in Emacs, and Command-B makes text bold, like Control-B on Windows.

The settings described above work well with this pattern. Mac has one Unix-like control key (⌃) and one Windows-like control key (⌘).

# Keeping data and code together with org-mode

With org-mode you can keep data, code, and documentation in one file.

Suppose you have an org-mode file containing the following table.

    #+NAME: mydata
| Drug | Patients |
|------+----------|
|    X |      232 |
|    Y |      351 |
|    Z |      117 |


Note that there cannot be a blank line between the NAME header and the beginning of the table.

You can bring this table into Python simply by declaring it to be a variable in the header of a Python code block.

    #+begin_src python :var tbl=mydata :results output
print(tbl)
#+end_src


When you evaluate this block, you see that the table is imported as a list of lists.

    [['X', 232], ['Y', 351], ['Z', 117]]

Note that the column headings were not imported into Python. Now suppose you would like to retain the headers, and use them as column names in a pandas data frame.

    #+begin_src python :var tbl=mydata :colnames no :results output
import pandas as pd
df = pd.DataFrame(tbl[1:], columns=tbl[0])
print(df, "\n")
print(df["Patients"].mean())
#+end_src


When evaluated, this block produces the following.

      Drug  Patients
0    X       232
1    Y       351
2    Z       117

233.33333333333334


Note that in order to import the column names, we told org-mode that there are no column names! We did this with the header option

    :colnames no

This seems backward, but it makes sense. It says do bring in the first row of the table, even though it appears to be a column header that isn’t imported by default. But then we tell pandas that we want to make a data frame out of all but the first row (i.e. tbl[1:]) and we want to use the first row (i.e. tbl[0]) as the column names.

A possible disadvantage to keeping data and code together is that the data could be large. But since org files are naturally in outline mode, you could collapse the part of the outline containing the data so that you don’t have to look at it unless you need to.

# Memorizing Planck’s constant with DALL-E

Planck’s constant used to be a measured quantity and now it is exact by definition.

h = 6.62607015×10−34 J / Hz

Rather than the kilogram being implicit in the units used to measure Planck’s constant, the mass of a kilogram is now defined to be whatever it has to be to make Planck’s constant have the value above.

Now that it’s exact by definition, maybe you’d like to memorize it. Using the Major system described here we could encode the digits as “Judge enjoys quesadilla.” [1]

As with the previous post, I’m using a memorization exercise as an excuse to play around with DALL-E. I typed “A judge enjoying eating a quesadilla” into DALL-E 2 and got back four images, as always. The best of these was the following.

The food in the image looks like a quesadilla, but it’s not clear that the man eating it is a judge, or that he’s enjoying himself.

Next I changed “judge” to “a supreme court justice,” hoping DALL-E would create an image that more obviously features a judge.

Here’s one of the outputs:

This fellow looks more like a judge, and he’s obviously enjoying himself. Maybe he’s eating a calzone, but we’ll call it a quesadilla.

Not all the images created by DALL-E are as accurate as the ones above. I suspect there’s a lot of selection bias in the examples of images posted online. I’m contributing to that selection bias by showing images that were good enough to include in a blog post. I tried other images for blogging on other topics, and the results were not worth sharing.

So in an attempt at mitigating selection bias, here’s another image generated from the prompt “A supreme court justice enjoying eating a quesadilla.”

This young lady is wearing black, as supreme court justices are wont to do. And she appears to be enjoying herself, but she’s definitely not eating a quesadilla.

Incidentally, another possible encoding of 662607015 is “Judge enjoys Costello” as in Abbot and Costello. When I typed “A supreme court justice enjoying watching Abbot and Costello on television” I got the following creepy image.

## Related posts

[1] This mnemonic is a little bit of a cheat, depending on how you pronounce quesadilla. The sound of ll is sorta like that of a y in English. Here I’m using it to represent 5 just as the l sound does. Here in southeast Texas, I believe most people use the Spanish pronunciation, at least approximately. If you completely anglicize the pronunciation so that ll is pronounced as in pillow, then you can use the mnemonic with no qualms.

# Naming probability functions

Given a random variable X, you often want to compute the probability that X will take on a value less than x or greater than x. Define the functions

FX(x) = Prob(Xx)

and

GX(x) = Prob(X > x)

What do you call F and G? I tend to call them the CDF (cumulative distribution function) and CCDF (complementary cumulative distribution function) but conventions vary.

The names of software functions to compute these two functions can be confusing. For example, Python (SciPy) uses the names cdf and sf (the latter for “survival function”) while the R functions to compute the CDF take an optional argument to return the CCDF instead [1].

In the Emacs calculator, the function ltpn computes the CDF. At first glace I thought this was horribly cryptic. It’s actually a very nice naming convention; it just wasn’t what I was expecting.

The “ltp” stands for lower tail probability and “n” stands for normal. The complementary probability function is utpn where “utp” stands for upper tail probability. Unlike other software libraries, Emacs gives symmetric names to these two symmetrically related functions.

“Lower tail” probability is clearer than “cumulative” probability because it leaves no doubt whether you’re accumulating from the left or the right.

You can replace the “n” at the end of ltpn and utpn with the first letters of binomial, chi-square, t, F, and Poisson to get the corresponding functions for these distributions. For example, utpt gives the upper tail probability for the Student t distribution [2].

The Emacs calculator can be quirky, but props to the developers for choosing good names for the probability functions.

## Related posts

[1] Incidentally, the CCDF cannot always be computed by simply computing the CDF first and subtracting the value from 1. In theory this is possible, but not in floating point practice. See the discussion of erf and erfc in this post for an explanation.

[2] These names are very short and only a handful of distribution families are supported. But that’s fine in context. The reason to use the Emacs calculator is to do a quick calculation without having to leave Emacs, not to develop production quality statistical software.

# Floating point inverses and stability

Let f be a monotone, strictly convex function on a real interval I and let g be its inverse. For example, we could have f(x) = ex and g(x) = log x.

Now suppose we round our results to N digits. That is, instead of working with f and g we actually work with fN and gN where

fN(x) = round(f(x), N)

and

gN(x) = round(g(x), N)

and round(y, N) is the number y rounded to N significant figures [1].

This is what happens when we implement our functions f and g in floating point arithmetic. We don’t actually get the values of f and g but the values of fN and gN.

We assumed that f and g are inverses, but in general fN and gN will not be exact inverses. And yet in some sense the functions fN and gN are like inverses. Harold Diamond [2] proved that if go back and forth applying fN and gN two times, after two round trips the values quit changing.

To make this more precise, define

hN(x) = gN( fN(x)).

In general, hN(x) does not equal x, but we do have the following:

hN( hN( hN(x) ) )  = hNhN(x) ).

The value we get out of hN(x) might not equal x, but after we’ve applied hN twice, the value stays the same if we apply hN more times.

## Connection to connections

Diamond’s stability theorem looks a lot like a theorem about Galois connections. My first reaction was that Diamond’s theorem was simply a special case of a more general theorem about Galois connections, but it cannot.

A pair of monotone functions F and G form a Galois connection if for all a in the domain of F and for all b in the domain of G,

F(a) ≤ baG(b).

Let F and G form a Galois connection and define

H(x) = G( F(x) ).

Then

H( H(x) ) = H(x).

This result is analogous to Diamond’s result, and stronger. It says we get stability after just one round trip rather than two.

The hitch is that although the functions f and g form a Galois connection, the functions fN and gN do not. Nevertheless, Diamond proved that fN and gN form some sort of weaker numerical analog of a Galois connection.

## Example

The following example comes from [2]. Note that the example rounds to two significant figures, not two decimal places.

    from decimal import getcontext, Decimal

# round to two significant figures
getcontext().prec = 2
def round(x): return float( Decimal(x) + 0 )

def f(x):  return 115 - 35/(x - 97)
def f2(x): return round(f(x))
def g(x):  return 97 + 35/(115 - x)
def g2(x): return round(g(x))
def h2(x): return g2(f2(x))

N = 110
print(h2(N), h2(h2(N)), h2(h2(h2(N))))


This prints

   100.0 99.0 99.0

showing that It shows that the function h2 satisfies Diamond’s theorem, but it does not satisfy the identify above for Galois compositions. That is, we stabilize after two round trips but not after just one round trip.

## Related posts

[1] Our “digits” here need not be base 10 digits. The stability theorem applies in any radix b provided bN ≥ 3.

[2] Harold G. Diamond. Stability of Rounded Off Inverses Under Iteration. Mathematics of Computation, Volume 32, Number 141. January 1978, pp. 227–232.

# Inline computed content in org-mode

The previous post discussed how to use org-mode as a notebook. You can have blocks of code and blocks of results, analogous to cells in a Jupyter notebook. The code and the results export as obvious blocks when you export the org file to another format, such as LaTeX or HTML. And that’s fine for a notebook.

Now suppose you want to do something more subtle. You want to splice in the result of a computed value without being obvious about it. Maybe you want to compute a value rather than directly enter it so that the document will remain consistent. Maybe you have a template and you want to set the parameters of the template at the top of the file.

Web development languages like PHP do this well. You can write a PHP file that is essentially an HTML file with pieces of code spliced in. You do this my inserting

    <?php … ?>

into the HTML code, and when the page is rendered the code between the <?php and ?> tags is replaced with the result of executing the code. We’d like to do something analogous in org-mode with org-babel. (org-babel is the subsystem of org-mode that interacts with code.)

Here’s an org-mode example that sets length and width as variables at the top of a file and multiplies them later in the body of the file to get area.

We define our variables as follows. The block is marked :exports none because we do not want to display the code or the values. We just want the code to run when we export the file.

    #+begin_src python :session :exports none
length, width = 7, 13
#+end_src


The following almost does what we want [1].

    Area equals src_python[:session]{length*width}.

This renders as

Area equals 91.

if we export our org file to HTML The number 91 is typeset differently than the words before it. This would be more obvious if the computed value were a string rather than a number.

Org-mode is wrapping <code> tags around the computed result. If we were to export the org file to LaTeX it would wrap the result with \texttt{}. This is because, by default, the output of a computation is displayed as computer output, which is conventionally set in a monospace font like Courier. That’s fine in a technical document when we want to make it obvious that a calculation is a calculation, but typically not in a business context. You wouldn’t want, for example, to generate a letter that starts

Dear Michael,

with Michael’s name set in Courier, announcing that this is a form letter.

The fix is to add :results raw to the header session, the part in square brackets between src_python and the Python code.

    Area equals src_python[:session :results raw]{length*width}.

Now the calculation result is reported “raw”, i.e. without any special markup surrounding it.

***

[1] In this example I’m using Python, and so I used the function src_python. org-babel supports dozens of languages, and each has its src_<language> counterpart.

# Org-mode as a lightweight notebook

You can think of org-mode as simply a kind of markdown, a plain text file that can be exported to fancier formats such as HTML or PDF. It’s a lot more than that, but that’s a reasonable place to start.

Org-mode also integrates with source code. You can embed code in your file and have the code and/or the result of running the code appear when you export the file to another format.

## Org-mode as notebook

You can use org-mode as a notebook, something like a Jupyter notebook, but much simpler. An org file is a plain text file, and you can execute embedded code right there in your editor. You don’t need a browser, and there’s no hidden state.

Here’s an example of mixing markup and code:

    The volume of an n-sphere of radius r is

$$\frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2} + 1\right)}r^n.$$

#+begin_src python :session
from scipy import pi
from scipy.special import gamma

def vol(r, n):
return pi**(n/2)*r**n/gamma(n/2 + 1)

vol(1, 5)
#+end_src


If you were to export the file to PDF, the equation for the volume of a sphere would be compiled into a image using LaTeX.

To run the code [1], put your cursor somewhere in the code block and type C-c C-c. When you do, the following lines will appear below your code.

    #+RESULTS:
: 5.263789013914324


If you think of your org-mode file as primary, and you’re just inserting some code as a kind of scratch area, an advantage of org-mode is that you never leave your editor.

## Jupyter notebooks

Now let’s compare that to a Jupyter notebook. Jupyter organizes everything by cells, and a cell can contain markup or code. So you could create a markup cell and enter the exact same introductory text [2].

    The volume of an n-sphere of radius r is

$$\frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2} + 1\right)}r^n$$.


When you “run” the cell, the LaTeX is processed and you see the typeset expression rather than its LaTeX source. You can click on the cell to see the LaTeX code again.

Then you would enter the Python code in another cell. When you run the cell you see the result, much as in org-mode. And you could export your notebook to PDF as with org-mode.

## File diff

Now suppose we make a couple small changes. We want the n and r in the comment section set in math italic, and we’d like to find the volume of a 5-sphere of radius 2 rather than radius 1. We do this, in Jupyter and in org-mode [3], by putting dollar signs around the “n” and the “r”, and we change vol(1, 5) to vol(2, 5).

Let’s run diff on the two versions of the org-mode file and on the two versions of the Jupyter notebook.

The differences in the org files are easy to spot:

    1c1
< The volume of an n-sphere of radius r is
---
> The volume of an $$n$$-sphere of radius $$r$$ is
12c12
< vol(1, 5)
---
> vol(2, 5)
16c16,17
< : 5.263789013914324
---
> : 168.44124844525837


However, the differences in the Jupyter files are more complicated:

    5c5
<    "id": "2a1b0bc4",
---
>    "id": "a0a89fcf",
8c8
<     "The volume of an n-sphere of radius r is \n",
---
>     "The volume of an $n$-sphere of radius $r$ is \n",
15,16c15,16
<    "execution_count": 1,
<    "id": "888660a2",
---
>    "execution_count": 2,
22c22
<        "5.263789013914324"
---
>        "168.44124844525837"
25c25
<      "execution_count": 1,
---
>      "execution_count": 2,
37c37
<     "vol(1, 5)"
---
>     "vol(2, 5)"
43c43
<    "id": "f8d4d1b0",


There’s a lot of extra stuff in a Jupyter notebook. This is a trivial notebook, and more complex notebooks have more extra stuff. An apparently small change to the notebook can cause a large change in the underlying notebook file. This makes it difficult to track changes in a Jupyter notebook in a version control system.

## Related posts

[1] Before this will work, you have to tell Emacs that Python is one of the languages you want to run inside org-mode. I have the following line in my init file to tell Emacs that I want to be able to run Python, DITAA, R, and Perl.

    (org-babel-do-load-languages 'org-babel-load-languages '((python . t) (ditaa . t) (R . t) (perl . t)))

[2] Org-mode will let you use $ and $ to bracket LaTeX code for a displayed equation, and it will also let you use . Jupyter only supports the latter.

[3] In org-mode, putting dollar signs around variables sometimes works and sometimes doesn’t. And in this example, it works for the “r” but not for the “n”. This is very annoying, but it can be fixed by using $$ and $$ to enter and leave math mode rather than use a dollar sign for both.

# Complex AGM

The arithmetic-geometric mean (AGM) of two non-negative real numbers a and b is defined as the limit of the iteration starting with a0 = a and b0 = b and

an+1 = ½ (an + bn)
bn+1 = √(an bn)

for n > 0. This sequence converges very quickly and is useful in numerical algorithms. The limit can be expressed in terms of an elliptic function, and that elliptic function can then be related to other functions. See, for example, this post for how the AGM can be used to compute logarithms to arbitrary precision.

Since the AGM is useful in computing special functions, and we’re often interested in evaluating special functions at complex values, it’s natural to want to evaluate the AGM for complex numbers.

But we immediately run into a difficulty: which square root do we pick to find the new b?

For a non-negative real number x, √x is defined to be the non-negative real number y such that y² = x. But for more general values of x we have to choose a branch of the square root function. Which branch should we use in the AGM?

Often when we need to take the square root of complex numbers we can use the “principal branch,” the branch that gives positive values for positive real inputs and extends to the rest of the complex plane with the negative axis removed. If you compute the square root of a complex number in software, as we’ll do below, this is probably the value you’ll get by default.

But it turns out we cannot simply pick the principal branch. Gauss discovered two centuries ago that the right branch to take could vary at each step [0]. What does that even mean? How do we know we’ve made the “right” choice?

For one thing, we want our iteration to converge. And for another, we’d like it to converge to something non-zero if we start with non-zero inputs [1]. The right choice will guarantee this [2].

So what is this right choice? We provisionally update a and b as above, using either square root for b, and keep the value of b if

|ab| ≤ |a + b|

or

|ab| = |a + b|

and the imaginary part of b/a is positive. In other words, chose the possible value of b that’s closer to a, and use the imaginary part of b/a as a tie breaker.

Here is the AGM implemented in Python, using the right square root at each step.

    def right(a, b):
d = abs(a + b) - abs(a - b)
return d > 0 or d == 0 and (b/a).imag > 0

def agm(a, b):
while abs(a-b) > 1e-14:
a1 = 0.5*(a+b)
b1 = np.sqrt(a*b)
if not right(a1, b1):
b1 = -b1
a, b = a1, b1
return a1


The test d == 0 should make you concerned. Testing floating point numbers for exact equality with zero is seldom the right thing to do, but we’ll gloss over numerical issues for this post.

Here’s an example, giving the a values of the iteration starting with 7 + 30i and 20 + 22i. The iteration converges in four steps.

    13.500000000000000 + 26.000000000000000j
13.784944719026262 + 26.397404494892115j
13.783557503026870 + 26.395953326186888j
13.783557473769877 + 26.395953309190112j


In this example, the right root is the principal root every time. To find an example where the other root is chosen, I generated random starting points until I found one that took an alternate root.

Here are the values of b starting with -1.654 – 1.178i and 2.244 – 1.956i. An asterisk indicates that the principal root was not the right root.

     0.2328790598285062 - 0.728412421988127j
-0.6829999999999998 - 0.589000000000000j
-0.2254063569280081 - 0.799311791579126j *
-0.2250604700857468 - 0.658706210994063j
-0.2261796153095098 - 0.725905503054624j *
-0.2252334135068774 - 0.729009001286595j
-0.2257078598391289 - 0.727456168426402j *
-0.2257065144081936 - 0.727457252170610j
-0.2257071871240875 - 0.727456710298747j *
-0.2257071871236612 - 0.727456710298506j
-0.2257071871238743 - 0.727456710298626j *
-0.2257071871238744 - 0.727456710298626j


## More AGM posts

[0] David Cox. The Arithmetic-Geometric Mean of Gauss. L’Énseignement Mathématique, 30 (1984), p. 275–330.

[1] If a = –b we get zero immediately. But if a and b are both not zero, and a does not equal –b, then taking the right square root at each iteration gives us what we want.

[2] Interestingly, you could make a finite number of wrong choices and still end up with something that might converge, albeit to a different value. This gives you a different branch of the AGM.

# Gaussian elimination

When you solve systems of linear equations, you probably use Gaussian elimination, even if you don’t call it that. You may learn Gaussian elimination before you see it formalized in terms of matrices.

So if you’ve had a course in linear algebra, and you sign up for a course in numerical linear algebra, it’s natural to expect that Gaussian elimination would be one of the first things you talk about. That was my experience, and it was painful. I had this odd mixture of feeling like I already knew what the professor was talking about, while at the same time feeling lost.

Trefethen and Bau do not start their numerical linear algebra textbook with Gaussian elimination, and they explain why in the preface:

We have departed from the customary practice by not starting with Gaussian elimination. That algorithm is atypical of numerical linear algebra, exceptionally difficult to analyze, yet at the same time tediously familiar to every student in a course like this. Instead, we begin with the QR factorization, which is more important, less complicated, and fresher idea to most students.

It’s reassuring to hear experts in numerical linear algebra admit that Gaussian elimination is “exceptionally difficult to analyze.” The algorithm is not exceptionally difficult to perform; children learn to manually execute the algorithm. But it is surprisingly difficult, and tedious, to analyze.

Numerical analysts like Trefethen and Bau have much more sophisticated questions in mind than how you would naively carry out the algorithm by hand. Numerical analysts are concerned, for example, with stability.

If your coefficients are known exactly and you carry out your arithmetic exactly, then you get an exact result. But can a small change to your inputs, say due to measurement uncertainty, or the inevitable loss of precision from floating point arithmetic, make a large change in your results? Absolutely, unless you use pivoting. But with pivoting, Gaussian elimination is stable [1].

After taking an introductory linear algebra course, you know how to solve any linear system of equations in principle. But when you actually want to compute the solution accurately, efficiently, at scale, in a real computer, with real data, things can get much more interesting. Many people have devoted their careers to doing in practice what high school students know how to do in principle.

## Related posts

[1] There are matrices for which Gaussian elimination is unstable, but they are rare. It’s been shown that they are rare in the sense of having low probability, but more importantly they never arise naturally in applied problems.

# Sampling with replacement until you’ve seen everything

Suppose you have a standard deck of 52 cards. You pull out a card, put it back in the deck, shuffle, and pull out another card. How long would you expect to do this until you’ve seen every card?

Here’s a variation on the same problem. Suppose you’re a park ranger keeping data on tagged wolves in a national park. Each day you catch a wolf at random, collect some data, and release it. If there are 100 wolves in the park, how long until you’ve seen every wolf at least once?

We’ll solve this problem via simulation, then exactly.

## Simulation

The Python code to solve the problem via simulation is straight forward. Here we’ll simulate our ranger catching and releasing one of 100 wolves. We’ll repeat our simulation five times to get an idea how much the results vary.

    import numpy as np
from numpy.random import default_rng

rng = default_rng()
num_runs = 5
N = 100 # number of unique items

for run in range(num_runs):
tally = np.zeros(N,dtype=int)
trials = 0
while min(tally) == 0:
tally[rng.integers(N)] += 1
trials += 1
print(trials)


When I ran this, I got 846, 842, 676, 398, and 420.

## Exact solution

Suppose we have a desk of N cards, and we sample the deck with replacement M times. We can select the first card N ways. Ditto for the second, third, and all the rest of the cards. so there are NM possible sequences of card draws.

How many of these sequences include each card at least once? To answer the question, we look at Richard Stanley’s twelvefold way. We’re looking for the number of ways to allocate M balls into N urns such that each urn contains at least one ball. The answer is

N! S(M, N)

where S(M, N) is the Stirling number of the second kind with parameters M and N [1].

So the probability of having seen each card at least once after selecting M cards randomly with replacement is

N! S(M, N) / NM.

## Computing Stirling numbers

We’ve reduced our problem to the problem of computing Stirling numbers (of the second kind), so how do we do that?

We can compute Stirling numbers in terms of binomial coefficients as follows.

Now we have a practical problem if we want to use this formula for larger values of n and k. If we work with floating point numbers, we’re likely to run into overflow. This is the perennial with probability calculations. You may need to compute a moderate-sized number as the ratio of two enormous numbers. Even though the final result is between 0 and 1, the intermediate results may be too large to store in a floating point number. And even if we don’t overflow, we may lose precision because we’re working with an alternating sum.

The usual way to deal with overflow is to work on a log scale. But that won’t work for Stirling numbers because we’re adding terms together, not multiplying them.

One way to solve our problem—not the most efficient way but a way that will work—is to work with integers. This is easy in Python because integers by default can be arbitrarily large.

There are two ways to compute binomial coefficients in Python: scipy.special.binom and math.comb. The former uses floating point operations and the latter uses integers, so we need to use the latter.

We can compute the numerator of our probability with

    from math import comb
def k_factorial_stirling(n, k):
return sum((-1)**i * comb(k, i)*(k-i)**n for i in range(k+1))


Then our probability is

    k_factorial_stirling(M, N) / N**M

If we compute the probability that our ranger has seen all 100 wolves after catching and releasing 500 times, we get 0.5116. So the median number of catch and release cycles is somewhere around 500, which is consistent with our simulation above.

Note that the numerator and denominator in our calculation are both on the order of 101000 while the largest floating point number is on the order of 10308, so we would have overflowed if we’d used binom instead of comb.

## Related posts

[1] It’s unfortunate that these were called the “second kind” because they’re the kind that come up most often in application.