Sigmas and Student

I saw something yesterday saying that the Japanese bond market had experienced a six standard deviation move. This brought to mind a post I’d written eight years ago.

All probability statements depend on a model. And if you’re probability model says an event had a probability six standard deviations from the mean, it’s more likely that your model is wrong than that you’ve actually seen something that rare. I expand on this idea here.

How likely is it that a sample from a random variable will be six standard deviations from its mean? If you have in mind a normal (Gaussian) distribution, as most people do, then the probability is on the order of 1 chance in 10,000,000. Six sigma events are not common for any distribution, but they’re not unheard of for distributions with heavy tails.

Let X be a random variable with a Student t distribution and ν degrees of freedom. When ν is small, i.e. no more than 2, the tails of X are so fat that the standard deviation doesn’t exist. As ν → ∞ the Student t distribution approaches the normal distribution. So in some sense this distribution interpolates between fat tails and thin tails.

What is the probability that X takes on a value more than six standard deviations from its mean at 0, i.e. what does the function

f(ν) = Prob(X > 6σ)

look like as a function of ν where σ² = ν/(ν − 2) is the variance of X?

As you’d expect, the limit of f(ν) as ν → ∞ is the probability of a six-sigma event for a normal distribution, around 10−7 as mentioned above. Here’s a plot of f(ν) for ν > 3. Notice that the vertical axis is on a log scale, i.e. the probability decreases exponentially.

What you might not expect is that f(ν) isn’t monotone. It rises to a maximum value before it decays exponentially. In hindsight this makes sense. As ν → 2+ the variance becomes infinite, and the probability of being infinitely far from the mean is 0. Here’s a plot of f(ν) between 2 and 3.

So six sigma probabilities for a Student t distribution rise from 0 up to a maximum of around 10−3 then decrease exponentially, then asymptotically approach a value around 10−7.

Related posts

Stylometry

I was reading an article this morning that mentioned a stylometric analysis of a controversial paragraph written by Roman historian Flavius Josephus. I’ve written several posts that could be called stylometry or adjacent, but I haven’t used that word. Here are some posts that touch on the statistical analysis of a text or of an author.

Two cheers for ugly code

Ugly code may be very valuable, depending on why it’s ugly. I’m not saying that it’s good for code to be ugly, but that code that is already ugly may be valuable.

Some of the ugliest code was started by someone who knew the problem domain well but did not know how to write maintainable code. It may implicitly contain information that is not explicitly codified anywhere else. It may contain information the original programmer isn’t even consciously aware of. It’s often easier to clean up the code than to surface the information it contains using any other source.

Another way code gets ugly is undisciplined modification by multiple programmers over a long period of time. In that case the code has proved to be useful. It’s the opposite of Field of Dreams code that you hope will be used if you build it.

Working effectively with legacy code is hard. It may be easier, and certainly more pleasant, to start from scratch. Even so, there may be more to learn from the old code than is immediately obvious.

***

This post was motivated by looking to extend some code I use in my business. It wouldn’t win a beauty pageant, but it’s very useful.

Writing this post reminded me of a post Productive Productivity that I wrote a while back. From that post:

The scripts that have been most useful are of zero interest to anyone else because they are very specific to my work. I imagine that’s true of most scripts ever written.

Prime gaps and Gapcoin

The previous post looked at tightly clustered primes. This post looks at the opposite, large gaps between primes.

Riecoin is a cryptocurrency that uses finding prime clusters as its proof of work task. Gapcoin uses finding prime gaps as its proof of work task.

There’s some nuance to defining prime gaps. It’s trivial to produce a gap of any size. For example, [n! + 2, n! + n] is an interval of length n − 1 that contains no primes. It is more interesting to find gaps that are large relative to the size of the endpoints. The merit of a gap is the ratio of the gap length to the log of the initial number in the interval.

To be specific, suppose p and q are consecutive primes. The length of the gap between them is defined to be qp and the merit of that gap is (qp) / log p. For large p, the average gap between primes near p is log p and so the merit function measures how large the gap is relative to what you would expect for the size of p.

The following code will compute the merit function.

>>> from sympy import nextprime, log, N
>>> merit = lambda p: (nextprime(p) - p)/log(p)

Gapcoin adjusts its mining difficulty by adjusting the minimum merit value the miner must search for. Gapcoin miners must find a prime p of the form

ph × 2a + b

where h is the SHA256 hash of the previous block in the blockchain and b < 2a.

The prime gap with the largest known merit is [pp + 8350] where

p = 293703234068022590158723766104419463425709075574811762098588798217895728858676728143227

The code

>>> N(merit(p))

shows that the merit is 41.94.

This record was found by the Gapcoin network. I don’t know the backstory, but I presume the mining task wasn’t to find a world record gap. Instead, the miner got lucky and found a much larger gap than necessary.

Related posts

Prime clusters and Riecoin

Prime clusters are sets of primes that appear as close together as is generally possible.

There is one pair of consecutive prime numbers, 2 and 3, but there cannot be any more: in any larger pair of consecutive numbers, one of the pair will be even. But there are a lot of twin primes, perhaps infinitely many, and so a prime cluster of size two is a pair of primes whose difference is 2.

How close together can a set of three primes be? The set {2, 3, 5} has diameter 3, i.e. the difference between the largest and smallest element is 3. And the set {3, 5, 7} has diameter 4. But in general the diameter of a set of three primes must be at least 6. If the smallest element is bigger than 3, then all the elements are odd, but they cannot be consecutive odd numbers or else one of them would be divisible by 3. But there are many prime clusters of diameter 6. For example, {13, 17, 19} and {37, 41, 43}.

Formal definition and motivation

There is some fuzziness in the discussion above regarding what is generally possible. This section will make our definitions more rigorous.

In general a pair of primes cannot be consecutive numbers because one of the pair must be even. Stated more abstractly, every pair of integers larger than {2, 3} will contain a complete residue class mod 2, i.e. one of the numbers will be congruent to 0 mod 2 and one of the numbers will be congruent to 1 mod 2.

Now let’s look at sets of three primes. The example {2, 3, 5} is exceptional because it contains a complete residue class mod 2, i.e. it contains even and odd numbers. The example {3, 5, 7} is exceptional because it contains a complete residue class mod 3.

We say that a set of k primes is a cluster if it does not contain a full residue class modulo any prime. We could say it does not contain a full residue class modulo any prime qk because trivially no set of k elements can have set of more than k elements. For example, the cluster {13, 17, 19} does not contain a full set of residues mod 2 or 3, but it also does not have a full set of residues mod 5, 7, 11, ….

We say a cluster is maximally dense if it has the minimum diameter for a cluster of its number of primes.

Informally, we will call a maximally dense cluster simply a cluster. A maximally dense prime cluster is also sometimes called a prime constellation.

Riecoin cryptocurrency

I’ve written several posts about the Primecoin cryptocurrency whose proof of work task is finding prime chains. The Riecoin cryptocurrency requires miners to find prime clusters.

Primecoin is far from a major cryptocurrency, with a market cap of around $2.3M. Riecoin is about four times smaller than Primecoin. There is another number-theoretic cryptocurrency, Gapcoin, whose market cap is about 10x smaller than that of Riecoin. Safe to say these three projects are of more interest to mathematicians than investment firms. All three of these prime-based cryptocurrencies were launched between 2013 and 2014.

A proof of work task should satisfy three criteria:

  1. Solutions should be difficult to find but easy to confirm.
  2. The time to find a solution should be roughly predictable.
  3. The difficulty should be adjustable.

Finding prime clusters requires a brute force search, but it’s easy to test whether a cluster has been found, and so the first property is satisfied.

The Hardy-Littlewood conjectures give an estimate of the difficulty in finding prime clusters of a given length. The difficulty can be adjusted by adjusting the required length. So the Hardy-Littlewood conjectures assist in satisfying the second and third properties. It does not matter if the conjectures are false somewhere out in asymptopia; they empirically give good estimates for the size of numbers used in mining Riecoin.

More about clusters

The definition of a maximally dense cluster depends on a function s(k) that says what the diameter of a cluster of k primes would need to be. We’ve said s(2) = 2 and s(3) = 6. More values of s are listed on the page for OEIS sequence A008407.

Related posts

Efficiently testing multiple primes at once

The previous post looked at a technique for inverting multiple integers mod m at the same time, using fewer compute cycles than inverting each integer individually. This post will do something analogous for prime chains, revisiting a post from a few days ago about testing prime chains.

A prime chain is a sequence of primes in which each is twice its predecessor, plus or minus 1. In a Cunningham chain of the first kind, it’s always plus, and in a Cunningham chain of the second kind, it’s always minus.

Primecoin is a cryptocurrency that uses finding prime chains as its proof-of-work (PoW) task. The miner has a choice of finding one of three kinds of prime chain: a Cunningham chain of the first or second kind, or a bi-twin chain. The length of the necessary chain varies over time to keep the difficulty relatively constant. Other PoW blockchains do something similar.

Some people say that Primecoin has miners search for primes for PoW. That’s not quite right. Miners have to find a chain of medium-sized primes rather than finding one big prime. This leads to more predictable compute times.

There is a way to test a candidate Cunningham chain of the second kind all at once. Henri Lifchitz gives his algorithm here. Given a sequence of numbers

n1, n2, n3, …, nk

where ni = 2ni−1 − 1 for each i and  n0 = 1 mod 4, all the numbers in the sequence are probably prime if

2^{n_{k-1} - 1} = 1 \bmod n_0 n_1 n_2 \cdots n_k

For example, consider the chain

31029721, 62059441, 124118881

Note that 31029721 mod 4 = 1 and 31029721 = 2*15514861 − 1. The following code demonstrates that the numbers in the chain are probable primes because it prints 1.

n0 = 15514861
n1 = 2*n0 - 1
n2 = 2*n1 - 1
n3 = 2*n2 - 1
prod = n0*n1*n2*n3
print( pow(2, n2 - 1, prod) )

Next I wanted to try the algorithm on much larger numbers where its efficiency would be more apparent, as in the previous post. But when I did, the test returned a result other than 1 on a known Cunningham chain of the second kind. For example, when I change the first two lines of code above to

n1 = 49325406476*primorial(9811, False) + 1
n0 = (n1 + 1) // 2

the code returns a large result. I verified that each of the numbers in the chain are prime using Sympy’s isprime function.

Usually a probable prime test can have false positives but never a false negative. I haven’t looked at Lifschitz method closely enough to tell whether it can have false negatives, but the code above suggests it can.

Tighter bounds in the prime number theorem

The most elementary form of the prime number theorem says that π(x), the number of prime numbers less than x, is asymptotically equal to x / log(x). That’s true, but a more accurate result says π(x) is asymptotically equal to li(x) where

\text{li}(x) = \int_0^x \frac{dt}{\log t}

Five years ago I wrote about a result that was new at the time, giving a bound on |π(x) − li(x)| for x > exp(2000). This morning I saw a result in a blog post by Terence Tao that says

\left| \pi(x) - \text{li}(x) \right| \leq 9.2211\, x\sqrt{\log(x)} \exp\left( -0.8476 \sqrt{\log(x)} \right)

for all x ≥ 2. The result comes from this paper.

The new bound has the same form as the bound from five years ago but with smaller constants.

Efficiently computing multiple modular inverses at once

Suppose you have a large prime number M and you need to find the inverse of several numbers mod M.  Montgomery’s trick is a way to combine the computation of the inverses to take less time than computing the inverses individually. Peter Montgomery (1947–2020) came up with this trick in 1985.

We will illustrate Montgomery’s trick by inverting three numbers—a, b, and c—though the trick extends to any number of numbers. It is commonly used in cryptography.

Modular inverses are much slower to calculate than modular products, so doing fewer of the former and more of the latter is a good tradeoff. Montgomery’s method only calculates one modular inverse, regardless of how many numbers need to be inverted.

The idea is to directly invert the product of all the numbers and use multiplication to find the inverses of the individual numbers. In our case, we compute

xab
y = cy = abc
x−1 = cy−1
b−1 = ax−1
a−1 = bx−1

To show that this actually saves time, we’ll run some Python code to invert three random numbers modulo a very large prime, much larger than occurs in practice. The reason is to make the computation time longer and easier to demonstrate. In practice, Montgomery’s trick saves a little time off of a lot of calculations. Here we’ll save a lot of time off a handful of calculations.

import sys
import time
from secrets import randbelow

# extend the default maximum integer size
sys.set_int_max_str_digits(100000)

# the 32nd Mersenne prime
M = 2**756839 - 1

def simple(a, b, c, M):
    return [pow(x, -1, M) for x in [a, b, c]]

def montgomery(a, b, c, M):
    x = a*b % M
    y = x*c % M
    yinv = pow(y, -1, M)
    cinv = x*yinv % M
    xinv = c*yinv % M
    binv = a*xinv % M
    ainv = b*xinv % M
    return [ainv, binv, cinv]
    
a = randbelow(M)
b = randbelow(M)
c = randbelow(M)

start = time.perf_counter()
result = simple(a, b, c, M)
elapsed = time.perf_counter() - start
print(elapsed)

start = time.perf_counter()
result = montgomery(a, b, c, M)
elapsed = time.perf_counter() - start
print(elapsed)

When we ran this, the direct approach took 121.8 seconds, and Montgomery’s trick took 47.6 seconds.

Related posts

The middle binomial coefficient

The previous post contained an interesting observation:

\sqrt{52!} \approx 26! \, 2^{26}

Is it true more generally that

\sqrt{(2n)!} \approx n! \, 2^n

for large n? Sorta, but the approximation gets better if we add a correction factor.

If we square both sides of the approximation and move the factorials to one side, the question becomes whether

\frac{(2n)!}{(n!)^2} = \binom{2n}{n} \approx 4^n

Now the task becomes to estimate the middle coefficient in when we apply the binomial theorem to (xy)2n.

A better approximation for the middle binomial coefficient is

\binom{2n}{n} \approx \frac{4^n}{\sqrt{\pi n}}

Now the right hand side is the first term of an asymptotic series for the left. The ratio of the two sides goes to 1 as n → ∞.

We could prove the asymptotic result using Stirling’s approximation, but it’s more fun to use a probability argument.

Let X be a binomial random variable with distribution B(2n, 1/2). As n grows, X converges in distribution to a normal random variable with the same mean and variance, i.e. with μ = n and σ² = n/2. This says for large n,

\text{Prob}(X = 1/2) = \binom{2n}{n} 4^{-n} \approx \frac{1}{\sqrt{2\pi\sigma^2}} = \frac{1}{\sqrt{\pi n}}

The argument above only gives the first term in the asymptotic series for the middle coefficient. If you want more terms in the series, you’ll need to use more terms in Stirling’s series. If we add a couple more terms we get

\binom{2n}{n} = \frac{4^n}{\sqrt{\pi n}} \left(1 - \frac{1}{8n} + \frac{1}{128n^2} + {\cal O}\left(\frac{1}{n^3}\right) \right)

Let’s see how much accuracy we get in estimating 52 choose 26.

from scipy.special import binom
from numpy import pi, sqrt

n = 26
exact = binom(2*n, n)
approx1 = 4**n/sqrt(pi*n)
approx2 = approx1*(1 - 1/(8*n))
approx3 = approx1*(1 - 1/(8*n) + 1/(128*n**2))

for a in [approx1, approx2, approx3]:
    print(exact/a)

This prints

0.9952041409266293
1.0000118903997048
1.0000002776131290

and so we see substantial improvement from each additional term. This isn’t always the case with asymptotic series. We’re guaranteed that for a fixed number of terms, the relative error goes to zero as n increases. For a fixed n, we do not necessarily get more accuracy by including more terms.

Related posts

Combining in-shuffles and out-shuffles

A few days ago I wrote two posts about perfect shuffles. Once you’ve cut a deck of cards in half, an in-shuffle lets a card from the top half fall first, and an out-shuffle lets a card from the bottom half fall first.

Suppose we have a deck of 52 cards. We said in the earlier posts that the order of an in-shuffle I is 52. That is, after 52 in-shuffles, a deck returns to its initial order. And the order of an out-shuffle O is 8.

We can think of I and O as generators of subgroups of order 52 and 8 respectively in the group S of all permutations of 52 cards. I was curious when I wrote the earlier posts how large the group generated by I and O together would be. Is it possible to reach all 52! permutations of the deck by some combination of applying I and O? If not, how many permutations can be generated?

I’ve since found the answer in [1] in a theorem by Diaconis, Graham, and Kantor. I don’t know who Kantor is, but it’s no surprise that a theorem on card shuffles would come from Persi Diaconis and Ron Graham. The theorem covers the case for decks of size N = 2n, which branches into different results depending on the size of n and the value of n mod 4.

For N = 52, the group generated by I and O has

26! × 226

elements.

On the one hand, that’s a big number, approximately 2.7 × 1034. On the other hand, it’s quite small compared to 52! = 8 × 1067. So while there are a lot of permutations reachable by a combination of in-shuffles and out-shuffles, your chances of selecting such a permutation from the set of all such permutations is vanishingly small.

To put it yet another way, the number of arrangements is on the order of the square root of 52!, a big number, but not big relative to 52!. (Does this pattern

√52! ≈ 26! × 226

generalize? See the next post.)

Not only does the theorem of Diaconis et al give the order of the group, it gives the group itself: the group of permutations generated by I and O is isomorphic to the group of symmetries of a 26-dimensional octahedron.

[1] S. Brent Morris. Magic Tricks, Card Shuffling and Dynamic Computer Memories. MAA 1998.