Pecunia non olet

I’ve been rereading That Hideous Strength. I’m going through it slowly this time, paying attention to details I glossed over before.

For example, early in the book we’re told that the head of a college has the nickname N.O.

N.O., which stood for Non-Olet, was the nickname of Charles Place, the warden of Bracton.

The first time I read the novel I zoomed past this tidbit. “Must be some Latin thing.” This time I looked it up.

It is indeed a Latin thing. It’s a reference to “Pecunia non olet” which translates as “Money doesn’t stink.” The idea is that money is money, and it doesn’t matter if it comes from a distasteful source.

The phrase goes back to the tax paid by those who bought the contents of public urinals as a source of ammonia. When Emperor Vespasian’s son Titus complained about the disgusting nature of the urine tax, the emperor held up a gold coin and said “Pecunia non olet.”

We’re told that the warden was “an elderly civil servant,” not an academic, and that his biggest accomplishment was that he had written “a monumental report on National Sanitation.”

So the nickname N.O. works on several levels. It implies that he’s willing to take money wherever he can get it, and it’s an allusion to the fact that he’s more qualified to be a sanitation engineer than a college president. I suppose it also implies that he’s inclined to say “no” to everything except money.


Simple clinical trial of four COVID-19 treatments

A story came out in Science yesterday saying the World Health Organization is launching a trial of what it believes are the four most promising treatments for COVID-19, the disease caused by the novel coronavirus SARS-CoV-2.

The four treatment arms will be

  • Remdesivir
  • Chloroquine and hydroxychloroquine
  • Ritonavir + lopinavir
  • Ritonavir + lopinavir + interferon beta

plus standard of care as a control arm.

I find the design of this trial interesting. Clinical trials are often complex and slow. Given a choice in a crisis between ponderously designing the perfect clinical trial and flying by the seat of their pants, health officials would rightly choose the latter. On the other hand, it would obviously be good to know which of the proposed treatments is most effective. So this trial has to be a compromise.

The WHO realizes that the last thing front-line healthcare workers want right now is the added workload of conducting a typical clinical trial. So this trial, named SOLIDARITY, will be very simple to run. According to the Science article,

When a person with a confirmed case of COVID-19 is deemed eligible, the physician can enter the patient’s data into a WHO website, including any underlying condition that could change the course of the disease, such as diabetes or HIV infection. The participant has to sign an informed consent form that is scanned and sent to WHO electronically. After the physician states which drugs are available at his or her hospital, the website will randomize the patient to one of the drugs available or to the local standard care for COVID-19.

… Physicians will record the day the patient left the hospital or died, the duration of the hospital stay, and whether the patient required oxygen or ventilation, she says. “That’s all.”

That may sound a little complicated, but by clinical trial standards the SOLIDARITY trial is shockingly simple. Normally you would have countless detailed case report forms, adverse event reporting, etc.

The statistics of the trial will be simple on the front end but complicated on the back end. There’s no sophisticated algorithm assigning treatments, just a randomization between available treatment options, including standard of care. I don’t see how you could do anything else, but this will create headaches for the analysis.

Patients are randomized to whatever treatments are available at their hospital (what else could you do? [1]), which means the treatment options vary by site and over time. The control arm, standard of care, also varies by site and could change over time as well. Also, this trial is not double-blind. This trial is optimized for the convenience of frontline workers, not for the convenience of statisticians.

The SOLIDARITY trial will be adaptive in the sense that a DSMB (data and safety monitoring board) will look at interim results and decide whether to drop treatment arms that appear to be under-performing. Ideally there would be objective algorithms for making these decisions, carefully designed and simulated in advance, but there’s no time for that. Better to start learning immediately than to spend six months painstakingly designing a trial. Even if we could somehow go back in time and start the design process six months ago, there could very well be contingencies that the designers couldn’t anticipate.

The SOLIDARITY trial is an expedient compromise, introducing a measure of scientific rigor when there isn’t time to be as rigorous as we’d like.


[1] You could limit the trial to sites that have all four treatment options available, cutting off most potential sources of data. The data would not be representative of the world at large and accrual would be slow. Or you could wait until all four treatments were distributed to clinics around the world, but there’s no telling how long that would take.

Product of copulas

A few days ago I wrote a post about copulas and operations on them that have a group structure. Here’s another example of group structure for copulas. As in the previous post I’m just looking at two-dimensional copulas to keep things simple.

Given two copulas C₁ and C₂, you can define a sort of product between them by

(C_1 * C_2)(u,v) = \int_0^1 D_2C_1(u,t)\,\, D_1C_2(t,v) \,\, dt

Here Dᵢ is the partial derivative with respect to the ith variable.

The product of two copulas is another copula. This product is associative but not commutative, so copulas with this product form a semigroup. There is also an identity element, which makes the semigroup a monoid.

The identity element is the copula

M(u,v) = \min\{u, v\}

that is,

M * C = C * M = C

for any copula C.

The copula M is important because it is the upper Fréchet-Hoeffding bound: for any copula C,

\max\{u+v-1, 0\}\leq C(u,v) \leq \min\{u, v\}

There is also a sort of null element for our semigroup, and that is the independence copula

\Pi(u,v) = uv

It’s called the independence copula because it’s the copula for two independent random variables: their joint CDF is the product of their individual CDFs. It acts like a null element because

\Pi * C = C * \Pi = \Pi

This tells us we have a semigroup and not a group: the independence copula cannot have an inverse.

Reference: Roger B. Nelsen. An Introduction to Copulas.

How to Set Num Lock on permanently

When I use my Windows laptop, I’m always accidentally brushing against the Num Lock key. I suppose it’s because the keys are so flat; I never have this problem on a desktop.

I thought there must be some way to set it so that it’s always on, so I searched for it. First I found articles on how to turn Num Lock on at startup, but that’s not my problem. The key already is on when I start up, but it doesn’t stay on when I brush against it.

Next I found articles saying to set a certain registry key. That didn’t work for me, and apparently a lot of other people have had the same experience.

Some articles say to edit your BIOS. My understanding is that you can edit the BIOS to permanently disable the key, but I wanted to permanently enable the key.

Here’s what did work: give AutoHotKey the command

    SetNumLockState, AlwaysOn

I haven’t used AutoHotKey before. I’ve heard good things about it, but it seems like it can be quite a deep rabbit hole. I intend to look into it a little, but for now I just want my Num Lock to stay on.

After you install AutoHotKey and run it, you get its help browser, not the app per se, and it’s not immediately obvious how to run the code above. You need to save the line of code to a file whose name ends in .ahk, such as numlock.ahk. If you double-click on that file, it will run the AutoHotKey script. To make it run automatically when your computer starts up, put the script in your Startup folder. This is probably

    C:\Users\...\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup

You can bring up the Startup folder by pressing Windows key + R and typing shell:startup.


New Asymptotic function in Mathematica 12.1

One of the new features in Mathematica 12.1 is the function Asymptotic. Here’s a quick example of using it.

Here’s an asymptotic series for the log of the gamma function I wrote about here.

\log \Gamma(z) \sim (z - \frac{1}{2}) \log z - z + \frac{1}{2} \log(2\pi) + \frac{1}{12z} - \frac{1}{360z^3} + \cdots

If we ask Mathematica

    Asymptotic[LogGamma[z], z -> Infinity]

we get simply the first term:

z Log[z]

But we can set the argument SeriesTermGoal to tell it we’d like more terms. For example

    Asymptotic[LogGamma[z], z -> Infinity, SeriesTermGoal -> 4]

-1/(360*z^3) + 1/(12*z) - z + Log[2*Pi]/2 - Log[z]/2 + z*Log[z]

This doesn’t contain a 1/z⁴ term, but it doesn’t need to: there is no such term in the asymptotic expansion. Mathematica is giving us the terms up to order 4; the coefficient of the 1/z⁴ term just happens to be zero.

If we ask for terms up to order 5

    Asymptotic[LogGamma[z], z -> Infinity, SeriesTermGoal -> 5]

we do get a 1/z⁵ term, but notice that there is still no 4th order term.

1/(1260*z^5) - 1/(360*z^3) + 1/(12*z) - z + Log[2*Pi]/2 - Log[z]/2 + z*Log[z]

A note on output forms

The Mathematica output displayed above was created by using

    Export[filename, expression]

to save images as SVG files. The alt text for the images was created using



Extended floating point precision in R and C

The GNU MPFR library is a C library for extended precision floating point calculations. The name stands for Multiple Precision Floating-point Reliable. The library has an R wrapper Rmpfr that is more convenient for interactive use. There are also wrappers for other languages.

It takes a long time to install MPFR and its prerequisite GMP, and so I expected it to take a long time to install Rmpfr. But the R library installs quickly, even on a system that doesn’t have MPFR or GMP installed. (I installed GMP and MPFR from source on Linux, but installed Rmpfr on Windows. Presumably the Windows R package included pre-compiled binaries.)

I’ll start by describing the high-level R interface, then go into the C API.


You can call the functions in Rmpfr with ordinary numbers. For example, you could calculate ζ(3), the Riemann zeta function evaluated at 3.

    > zeta(3)
    1 'mpfr' number of precision  128   bits
    [1] 1.202056903159594285399738161511449990768

The default precision is 128 bits, and a numeric argument is interpreted as a 128-bit MPFR object. R doesn’t have a built-in zeta function, so the only available zeta is the one from Rmpfr. If you ask for the cosine of 3, you’ll get ordinary precision.

    > cos(3)
    [1] -0.9899925

But if you explicitly pass cosine a 128-bit MPFR representation of the number 3 you will get cos(3) to 128-bit precision.

    > cos(mpfr(3, 128))                            
    1 'mpfr' number of precision  128   bits       
    [1] -0.9899924966004454572715727947312613023926

Of course you don’t have to use 128 bits. For example, you could find π to 100 decimal places by multiplying the arctangent of 1 by 4.

    > 100*log(10)/log(2) # number of bits needed for 100 decimals
    [1] 332.1928
    > 4*atan(mpfr(1,333))
    1 'mpfr' number of precision  333   bits
    [1] 3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706807

MPFR C library

The following C code shows how to compute cos(3) to 128-bit precision and 4 atan(1) to 333 bit precision as above.

    #include <stdio.h>   /* must precede mpfr.h so mpfr_out_str is declared */
    #include <gmp.h>
    #include <mpfr.h>

    int main(void)
    {
        // All functions require a rounding mode.
        // This mode specifies round-to-nearest
        mpfr_rnd_t rnd = MPFR_RNDN;
        mpfr_t x, y;
        // Initialize x and y as 128-bit numbers (their values start as NaN)
        mpfr_init2(x, 128);
        mpfr_init2(y, 128);
        // Set x to the C double number 3
        mpfr_set_d(x, 3, rnd);
        // Set y to the cosine of x
        mpfr_cos(y, x, rnd);
        // Print y to standard out in base 10
        printf("y = ");
        mpfr_out_str(stdout, 10, 0, y, rnd);
        putchar('\n');
        // Compute pi as 4*atan(1)
        // Change the precision of x and y to 333 bits.
        // mpfr_set_prec reuses the existing variables; calling
        // mpfr_init2 again without mpfr_clear would leak memory.
        mpfr_set_prec(x, 333);
        mpfr_set_prec(y, 333);
        mpfr_set_d(x, 1.0, rnd);
        mpfr_atan(y, x, rnd);
        // Multiply y by 4 and store the result back in y
        mpfr_mul_d(y, y, 4, rnd);
        printf("y = ");
        mpfr_out_str(stdout, 10, 0, y, rnd);
        putchar('\n');
        // Release memory
        mpfr_clears(x, y, (mpfr_ptr) 0);
        return 0;
    }

If this code is saved in the file hello_mpfr.c then you can compile it with

    gcc hello_mpfr.c -lmpfr -lgmp

One line above deserves a little more explanation. The second and third arguments to mpfr_out_str are the base b and number of figures n to print.

We chose b=10 but you could specify any base value 2 ≤ b ≤ 62.

If n were set to 100 then the output would contain 100 significant figures. When n=0, MPFR will determine the number of digits to output, enough digits that the string representation could be read back in exactly. To understand how many digits that is, see Matula’s theorem in the previous post.

When is round-trip floating point radix conversion exact?

Suppose you store a floating point number in memory, print it out in human-readable base 10, and read it back in. When can the original number be recovered exactly?

D. W. Matula answered this question more generally in 1968 [1].

Suppose we start with base β with p places of precision and convert to base γ with q places of precision, rounding to nearest, then convert back to the original base β. Matula’s theorem says that if there are no positive integers i and j such that

\beta^i = \gamma^j

then a necessary and sufficient condition for the round-trip to be exact (assuming no overflow or underflow) is that

\gamma^{q-1} > \beta^p.

In the case of floating point numbers (type double in C) we have β = 2 and p = 53. (See Anatomy of a floating point number.) We’re printing to base γ = 10. No positive power of 10 is also a power of 2, so Matula’s condition on the two bases holds.

If we print out q = 17 decimal places, then

10^{16} > 2^{53}

and so round-trip conversion will be exact if both conversions round to nearest. If q is any smaller, some round-trip conversions will not be exact.

You can also verify that for a single precision floating point number (p = 24 bits precision) you need q = 9 decimal digits, and for a quad precision number (p = 113 bits precision) you need q = 36 decimal digits [2].

Looking back at Matula’s theorem, clearly we need

\gamma^q \geq \beta^p.

Why? Because the right side is the number of base β fractions and the left side is the number of base γ fractions. You can’t have a one-to-one map from a larger space into a smaller space. So the inequality above is necessary, but not sufficient. However, it’s almost sufficient. We just need one more base γ figure, i.e. Matula tells us

\gamma^{q-1} > \beta^p

is sufficient. In terms of base 2 and base 10, we need at least 16 decimals to represent 53 bits. The surprising thing is that one more decimal is enough to guarantee that round-trip conversions are exact. It’s not obvious a priori that any finite number of extra decimals is always enough, but in fact just one more is enough; there’s no “table maker’s dilemma” here.

Here’s an example to show the extra decimal is necessary. Suppose p = 5. There are more 2-digit numbers than 5-bit numbers, but if we only use two digits then round-trip radix conversion will not always be exact. For example, the number 17/16 written in binary is 1.0001₂ and has five significant bits. The decimal equivalent is 1.0625₁₀, which rounded to two significant digits is 1.1₁₀. But the nearest binary number to 1.1₁₀ with 5 significant bits is 1.0010₂ = 1.125₁₀. In short, rounding to nearest gives

1.0001_2 \to 1.1_{10} \to 1.0010_2

and so we don’t end up back where we started.


[1] D. W. Matula. In-and-out conversions. Communications of the ACM, 11(1):47–50. January 1968. Cited in Handbook of Floating-point Arithmetic by Jean-Michel Muller et al.

[2] The number of bits allocated for the fractional part of a floating point number is 1 less than the precision. For normal numbers the leading figure is always 1, so IEEE formats save one bit by not storing the leading bit, leaving it implicit. So, for example, a C double has 53 bits of precision, but only 52 of the 64 bits in a double are allocated to storing the fraction.

Group symmetry of copula operations

You don’t often see references to group theory in a statistics book. Not that there aren’t symmetries in statistics that could be described in terms of groups, but this isn’t often pointed out.

Here’s an example from An Introduction to Copulas by Roger Nelsen.

Show that under composition the set of operations of forming the survival copula, the dual of a copula, and the co-copula of a given copula, along with the identity (i.e., ^, ~, *, and i) yields the dihedral group.

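Nelsen gives the following multiplication table for copula operations. Every operation is its own inverse, so the group here is the dihedral group of order 4, also known as the Klein four-group.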

    o | i ^ ~ *
    i | i ^ ~ *
    ^ | ^ i * ~
    ~ | ~ * i ^
    * | * ~ ^ i

The rest of this post explains what a copula is and what the operations above are.

What is a copula?

At a high level, a copula is a mathematical device for modeling the dependence between random variables. Sklar’s theorem says you can express the joint distribution of a set of random variables in terms of their marginal distributions and a copula. If the marginal distribution functions are continuous, the copula is unique.

The precise definition of a copula is technical. We’ll limit ourselves to copulas in two dimensions to make things a little simpler.

Let I be the unit interval [0, 1]. Then a (two-dimensional) copula is a function from I × I to I  that satisfies

\begin{align*} C(0, v) &= 0\\ C(u, 0) &= 0\\ C(u, 1) &= u\\ C(1, v) &= v \end{align*}

and is 2-increasing.

The idea of a 2-increasing function is that “gradients point northeast.” Specifically, for all points (x₁, y₁) and (x₂, y₂) with x₁ ≤ x₂ and y₁ ≤ y₂, we have

C(x_2, y_2) - C(x_2, y_1) - C(x_1, y_2) + C(x_1, y_1) \,\geq\, 0

The definition of copula makes no mention of probability, but the 2-increasing condition says that C acts like the joint CDF of two random variables.

Survival copula, dual copula, co-copula

For a given copula C, the corresponding survival copula, dual copula, and co-copula are defined by

\begin{align*} \hat{C}(u, v) &= u + v - 1 + C(1-u, 1-v) \\ \tilde{C}(u, v) &= u + v - C(u,v) \\ C^*(u,v) &= 1 - C(1-u, 1-v) \end{align*}


The reason for the name “survival” has to do with a survival function, i.e. complementary CDF of a random variable. The survival copula is another copula, but the dual copula and co-copulas aren’t actually copulas.

This post hasn’t said much about motivation or application (that would take a lot more than a short blog post), but it has included enough that you could verify that the operations compose as advertised.

Update: See this post for more algebraic structure for copulas, a sort of convolution product.

Product of Chebyshev polynomials

Chebyshev polynomials satisfy a lot of identities, much like trig functions do. This post will look briefly at just one such identity.

Chebyshev polynomials Tₙ are defined for n = 0 and 1 by

\begin{align*} T_0(x) &= 1 \\ T_1(x) &= x \end{align*}

and for larger n using the recurrence relation

T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)

This implies

\begin{align*} T_2(x) &= 2x\,T_1(x) - T_0(x) = 2x^2 - 1 \\ T_3(x) &= 2x\,T_2(x) - T_1(x) = 4x^3 - 3x \\ T_4(x) &= 2x\,T_3(x) - T_2(x) = 8x^4 - 8x^2 + 1 \end{align*}

and so forth.

Now for the identity for this post. If m ≥ n, then

2\,T_m T_n = T_{m+n} + T_{m-n}.

In other words, the product of the mth and nth Chebyshev polynomials is the average of the (m + n)th and (m − n)th Chebyshev polynomials. For example,

2\,T_3(x)\,T_1(x) = 2(4x^3 - 3x)\,x = T_4(x) + T_2(x)

The identity above is not at all apparent from the recursive definition of Chebyshev polynomials, but it follows quickly from the fact that

T_n(\cos\theta) = \cos n\theta.

Proof: Let θ = arccos x. Then

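\begin{align*} 2\,T_m(x)\,T_n(x) &= 2\,T_m(\cos\theta)\,T_n(\cos\theta) \\ &= 2\cos m\theta \cos n\theta \\ &= \cos (m+n)\theta + \cos (m-n)\theta \\ &= T_{m+n}(\cos\theta) + T_{m-n}(\cos\theta) \\ &= T_{m+n}(x) + T_{m-n}(x) \end{align*}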

You might object that this only shows that the first and last line are equal for values of x that are cosines of some angle, i.e. values of x in [-1, 1]. But if two polynomials agree on an interval, they agree everywhere. In fact, you don’t need an entire interval. For polynomials of degree m+n, as above, it is enough that they agree on m + n + 1 points. (Along those lines, see Binomial coefficient trick.)

The close association between Chebyshev polynomials and cosines means you can often prove Chebyshev identities via trig identities as we did above.

Along those lines, we could have taken

T_n(\cos\theta) = \cos n\theta

as the definition of Chebyshev polynomials and then proved the recurrence relation above as a theorem, using trig identities in the proof.

Forman Acton suggested in his book Numerical Methods that Work that you should think of Chebyshev polynomials as “cosine curves with a somewhat disturbed horizontal scale.”

The Brothers Markov

The Markov brother you’re more likely to have heard of was Andrey Markov. He was the Markov of Markov chains, the Gauss-Markov theorem, and Markov’s inequality.

Andrey had a lesser known younger brother Vladimir who was also a mathematician. Between them, the two brothers proved what is known as the Markov Brothers’ inequality, so named to distinguish it from (Andrey) Markov’s inequality.

For any polynomial p(x) of degree n, and for any non-negative integer k, the maximum absolute value of the kth derivative of p over the interval [-1, 1] is bounded by a constant times the maximum absolute value of p itself. The constant is a function of k and n but is otherwise independent of the particular polynomial.

In detail, the Markov Brothers’ inequality says

\max_{-1\leq x \leq 1} |p^{(k)}(x)|\,\, \leq \prod_{0 \leq j < k} \frac{n^2 - j^2}{2j+1} \,\max_{-1\leq x \leq 1}|p (x)|

Andrey proved the theorem for k = 1 and his brother Vladimir generalized it for all positive k.

The constant in the Markov Brothers’ inequality is the smallest possible because the bound is exact for Chebyshev polynomials [1].

Let’s look at an example. We’ll take the second derivative of the fifth Chebyshev polynomial.

T_5(x) = 16x^5 - 20x^3 + 5x.

The second derivative is

T_5''(x) = 320x^3 - 120x.

Here are their plots:

[Figure: T₅ and its second derivative]

On [-1, 1], the maximum absolute value of T₅(x) is 1 and the maximum absolute value of its second derivative is 200.

The product in the Markov Brothers’ inequality with n = 5 and k = 2 works out to

(25/1)(24/3) = 200

and so the bound is exact for p(x) = T5(x).


It took a while for westerners to standardize how to transliterate Russian names, so you might see Andrey written as Andrei or Markov written as Markoff.

There were even more ways to transliterate Chebyshev, including Tchebycheff, Tchebyshev, and Tschebyschow. These versions are the reason Chebyshev polynomials [1] are denoted with a capital T.


[1] There are two families of Chebyshev polynomials. When used without qualification, as in this post, “Chebyshev polynomial” typically means Chebyshev polynomial of the first kind. These are denoted Tₙ. Chebyshev polynomials of the second kind are denoted Uₙ.