Prime chains

The title of this post has a double meaning. We will look at chains in the sense of number theory and in the sense of cryptocurrency: Cunningham chains and blockchains that involve prime numbers.

Cunningham chains

A chain of primes is a sequence of prime numbers in which each is almost double its predecessor. That is, the next number after p is 2p ± 1.

In a Cunningham chain of the first kind, the successor of p is 2p + 1. For example,

41, 83, 167.

In a Cunningham chain of the second kind, the successor of p is 2p − 1. For example,

19, 37, 73.

Two questions come up immediately. First, are there infinitely many Cunningham chains? Second, how long can a Cunningham chain be? What is known and what is conjectured are at opposite ends of the spectrum. It is unknown whether there are infinitely many Cunningham chains of length 2, but it is conjectured that there are infinitely many Cunningham chains of all lengths.

According to this page, the longest known Cunningham chain of the first kind has length 17, and the longest known Cunningham chain of the second kind has length 19. We can verify these results with the following Python code.

from sympy import isprime

def chain_length(start, kind):
    # kind = +1 for a chain of the first kind, -1 for the second kind
    p = start
    c = 0
    while isprime(p):
        c += 1
        p = 2*p + kind
    return c

print(chain_length(2759832934171386593519, 1))
print(chain_length(79910197721667870187016101, -1))

Bi-twin chains

A number n is the basis of a bi-twin chain of length k if n − 1 is the start of a Cunningham chain of the first kind of length k and n + 1 is the start of a Cunningham chain of the second kind of length k.
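As a quick check on this definition, here is a sketch of a function that computes the length of the bi-twin chain with basis n, reusing the chain-length idea above. The function names are mine.

```python
from sympy import isprime

def cunningham_length(p, kind):
    # kind = +1 for a chain of the first kind, -1 for the second kind
    c = 0
    while isprime(p):
        c += 1
        p = 2*p + kind
    return c

def bitwin_length(n):
    # n - 1 must start a Cunningham chain of the first kind and n + 1
    # a chain of the second kind; the bi-twin chain is as long as the
    # shorter of the two
    return min(cunningham_length(n - 1, 1), cunningham_length(n + 1, -1))

print(bitwin_length(6))  # 5, 7, 11, 13 are all prime, so the length is 2
```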

I say more about bi-twin prime chains in the next post.

Primecoin

Primecoin was one of the first cryptocurrencies, coming out four years after Bitcoin. Primecoin is still going, though its market cap is six orders of magnitude smaller than that of Bitcoin.

What’s interesting about Primecoin is that it uses finding prime chains as its proof of work task [1]. To mint a new Primecoin block, you must find a prime chain of the required length whose origin is a multiple of the hash of the block header [2].

Primecoin allows any of the three kinds of prime chains mentioned above: Cunningham chains of the first or second kind, or a bi-twin prime chain. Primecoin adjusts its mining difficulty over time by varying the length of Cunningham or bi-twin chain needed to mint a new block.

There are also cryptocurrencies based on finding prime clusters and prime gaps.

Related posts

[1] Strictly speaking, Primecoin requires finding probable prime chains, as explained here.

[2] The origin of a prime chain is n if the first item in a Cunningham chain of the second kind is n + 1, or if the first item in a Cunningham chain of the first kind or a bi-twin chain is n − 1.

Compressing a set of hash values

Suppose you have a set of k hash values, each n bits long. Can you compress the set into less than kn bits?

It’s not possible to compress a list of hashes into less than kn bits, but you can compress a set into fewer bits.

Suppose you have a set of 2³⁰, roughly a billion, 64-bit hashes. Sort the set and look at the size of gaps between elements. You might expect consecutive items on the list to be roughly 2³⁴ apart, and so you could reconstruct the list by reporting the first item and the gaps between the rest, which are 34-bit numbers, not 64-bit numbers, a savings of 30 bits per hash.

This doesn’t exactly work, but it’s the kernel of a good idea. We don’t know that the distance between hashes can be represented by a 34-bit number. The gap could be more or less than 2³⁴, but we don’t expect it to often be much more than 2³⁴. So we use a variable-length encoding such that when the distance between values is on the order of 2³⁴, or less, we save bits, but we allow for the distance to be any size.
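A quick simulation supports the expected gap size. The following sketch uses a smaller set, 2²⁰ random 64-bit values, so the expected average gap is 2⁶⁴/2²⁰ = 2⁴⁴.

```python
import random

random.seed(0)
n = 2**20  # a smaller set than in the text, for speed
hashes = sorted(random.getrandbits(64) for _ in range(n))
gaps = [b - a for a, b in zip(hashes, hashes[1:])]
avg = sum(gaps) / len(gaps)
print(avg / 2**44)  # the ratio to the expected gap size is close to 1
```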

Applications

What we’ve described is the essence of Golomb-Rice coding. Its implementation in the Bitcoin protocol is referred to as Golomb-Coded Sets (GCS), described in BIP 158. Golomb-Rice coding is also used in other applications where the values to be compressed are not hashes, such as in lossless audio compression.

Encoding

Let’s go into some detail as to how the distances between sorted values are represented. Suppose you expect the differences to be on the order of M, where M is a power of 2. For each difference d, let q and r be the quotient and remainder upon division by M, i.e.

d = qM + r.

Encode q as a unary number, i.e. a string of q 1s, and encode r as an ordinary binary number. The Golomb-Rice coding of d is then the concatenation of the representations of q and r with a 0 in the middle as a separator. Using || to denote string concatenation we have

unary(q)  ||  0  ||  binary(r).

In general, unary encoding is extremely inefficient, but we’re betting that q will typically be quite small.

The reason we require M to be a power of 2 is so the representation of r will have a fixed length [1].

Let’s work out an example. Suppose M = 2²⁰ and

d = 2²² + 123456 = 4 × M + 123456.

Then we write q as 1111 and r as 00011110001001000000, the 20-bit binary representation of 123456, and encode d as the string

1111000011110001001000000
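A few lines of Python reproduce this encoding. This is a sketch; the function name is mine, and it assumes M is a power of 2.

```python
def golomb_rice_encode(d, M):
    # unary q, then a 0 separator, then r as a fixed-width binary number
    q, r = divmod(d, M)
    width = M.bit_length() - 1  # M = 2**width
    return "1"*q + "0" + format(r, f"0{width}b")

print(golomb_rice_encode(2**22 + 123456, 2**20))
```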

Decoding

You can concatenate the encodings of consecutive d values and still be able to unambiguously decode the result. Because the r representations have a constant length, you know when an r ends and the next q begins.

For example, suppose we have the following string of bits.

1110010011001011001011111001000000110010001110011101111000101111011

We decode the string from left to right. We count how many 1s we see, skip over a 0, then regard the next 20 bits as a binary number.

111 0 01001100101100101111 1 0 01000000110010001110 0 11101111000101111011

We see three 1s before the first 0 and so we conclude q1 = 3. We skip over the 0 and read the value of r.

r1 = 01001100101100101111₂ = 314159.

Next we see a single 1 before the next 0, so q2 = 1. We read the next value of r as

r2 = 01000000110010001110₂ = 265358.

Next we see a 0, i.e. there are no 1s, and so the final q3 = 0. And we have

r3 = 11101111000101111011₂ = 979323.

So our reconstructed values of d are

d1 = q1M + r1 = 3 × 2²⁰ + 314159 = 3459887

d2 = q2M + r2 = 1 × 2²⁰ + 265358 = 1313934

d3 = q3M + r3 = 0 × 2²⁰ + 979323 = 979323.

Now these are difference values. We need to know the smallest value m in order to reconstruct the original set of values from the differences. Then the full values are m, m + d1, m + d1 + d2, and m + d1 + d2 + d3.
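Here is a sketch of a corresponding decoder, applied to the bit string above. Again the function name is mine.

```python
def golomb_rice_decode(bits, M):
    # split a concatenated bit string back into the encoded differences
    width = M.bit_length() - 1  # fixed width of each remainder r
    ds, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":   # unary part
            q += 1
            i += 1
        i += 1                  # skip the 0 separator
        r = int(bits[i:i+width], 2)
        i += width
        ds.append(q*M + r)
    return ds

s = "1110010011001011001011111001000000110010001110011101111000101111011"
print(golomb_rice_decode(s, 2**20))  # [3459887, 1313934, 979323]
```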

Related posts

[1] This is the Rice part. Robert Rice simplified Samuel Golomb’s encoding scheme in the special case that M is a power of 2.

Memorizing chemical element symbols

Here’s something I’ve wondered about before: are there good mnemonics for chemical element symbols?

Some element symbols are based on Latin or German names and seem arbitrary to English speakers, such as K (kalium) for potassium or Fe (ferrum) for iron. However, these elements are very common and so their names and symbols are familiar.

When you take out the elements whose symbols are mnemonic in another language, every element symbol begins with the first letter of the element name. The tricky part is the second letter. For example, does Ra stand for radon or radium?

The following rule of thumb usually holds whenever a chemical symbol corresponds to the first letters of two different element names:

 The lightest/longest-known element wins.

Scientists didn’t wait until the periodic table was complete before assigning symbols, and the easiest names were handed out first. Calcium (20) was assigned Ca, for example, before cadmium (48) and californium (98) were known.

The elements were discovered roughly in order of atomic weight. For example, beryllium (4) was discovered before berkelium (97) and neon (10) was discovered before neptunium (93). So sometimes you can substitute knowledge of chemistry for knowledge of history. [1]

There are instances where the heavier element got to claim the first-two-letter symbol. Usually the heavier element was discovered first. That’s why Ra stands for radium (88) and not radon (86). One glaring exception to this rule is that palladium (Pd) was discovered a century before protactinium (Pa).

Often the element that was discovered first is more familiar, and so you could almost say that when there’s a conflict, the more familiar element wins. For example, Li stands for lithium and not livermorium. This revises our rule of thumb above:

The lightest/longest-known/most familiar element wins.

To return to the question at the top of the post, I’m not aware of a satisfying set of mnemonics for chemical element symbols. But there are some heuristics. Generally the elements that are the lightest, most familiar, and have been known the longest get the simpler names. Maybe you can remember, for example, that berkelium must be Bk because B, Be, and Br were already taken by the time berkelium was discovered.

After using this heuristic, you could apply more brute-force mnemonic techniques for whenever the heuristic doesn’t work. (Whenever it doesn’t work for you: mnemonics are very personal.) For example, you might imagine a registered nurse (an RN) spraying the insecticide Raid on a fish, fish being a Major system encoding of the number 86, the atomic number of radon.

Related posts

[1] Chemical elements named after scientists, planets, and laboratories appear toward the end of the table and are recent discoveries.

Largest known compositorial prime

I ran across a blog post here that said a new record has been set for the largest compositorial prime. [1]

OK, so what is a compositorial prime? It is a prime number of the form

n! / n# + 1

where n# denotes n primorial, the product of the prime numbers no greater than n.

The newly discovered prime is

N = 751882!/751882# + 1

It was described in the article cited above as

751882!/751879# + 1,

but the two numbers are the same because there are no primes greater than 751879 and less than 751882, i.e.

751879# = 751882#.
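The same thing happens on a small scale: since there are no primes greater than 7 and no greater than 10, we have 7# = 10#. A sketch of a primorial function, written here with sympy rather than any built-in, shows this:

```python
from math import prod
from sympy import primerange

def primorial(n):
    # product of all primes no greater than n
    return prod(primerange(2, n + 1))

print(primorial(7), primorial(10))  # both equal 2*3*5*7 = 210
```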

About how large is N? We can calculate the log of the numerator easily enough:

>>> import scipy.special
>>> scipy.special.loggamma(751883)
9421340.780760147

However, the denominator is harder to compute. According to OEIS we have

n# = exp((1 + o(1)) n)

which would give us the estimate

log(751882#) ≈ 751882.

So

log10(N) = log(N) / log(10) ≈ (9421340 − 751882) / log(10) ≈ 3765097.
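This back-of-the-envelope calculation can be done entirely with the Python standard library; here is a sketch.

```python
from math import lgamma, log

# log N ≈ log 751882! − log 751882#, using the estimate log(751882#) ≈ 751882
log10_N = (lgamma(751883) - 751882) / log(10)
print(log10_N)  # roughly 3.765 million
```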

According to this page,

log10(N) = 3765620.3395779

and so our approximation above was good to four figures.

So N has between 3 and 4 million digits, making it much smaller than the largest known prime, which has roughly 41 million digits. Overall, N is the 110th largest known prime.

 

[1] I misread the post at first and thought it said there was a new prime record (skipping over the “compositorial” part) and was surprised because the number is not a Mersenne number. For a long time now the largest known prime has been a Mersenne prime because there is a special algorithm for testing whether Mersenne numbers are prime, one that is much more efficient than testing numbers in general.

 

log2(3) and log2(5)

AlmostSure on X pointed out that

log2 3 ≈ 19/12,

an approximation that’s pretty good relative to the size of the denominator. To get an approximation that’s as accurate or better requires a larger denominator for log2 5.

log2 5 ≈ 65/28

The above observations are correct, but are they indicative of a more general pattern? Is log2 3 easier to approximate than log2 5 using rational numbers? There are theoretical ways to quantify this—irrationality measures—but they’re hard to compute.

If you look at the series of approximations for both numbers, based on continued fraction convergents, the nth convergent for log2 5 is more accurate than the nth convergent for log2 3, at least for the first 16 terms. After that I ran out of floating point precision and wasn’t sufficiently interested to resort to extended precision.
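Here is a sketch of how one might generate the convergents using only the standard library. Float precision limits how many terms are reliable, as noted above.

```python
from fractions import Fraction
from math import log2

def convergents(x, n):
    # first n continued fraction convergents of x
    a = int(x)
    h0, h1 = 1, a   # numerator recurrence
    k0, k1 = 0, 1   # denominator recurrence
    cs = [Fraction(h1, k1)]
    x -= a
    for _ in range(n - 1):
        x = 1/x
        a = int(x)
        x -= a
        h0, h1 = h1, a*h1 + h0
        k0, k1 = k1, a*k1 + k0
        cs.append(Fraction(h1, k1))
    return cs

print(convergents(log2(3), 5))  # ends with 19/12
print(convergents(log2(5), 3))  # ends with 65/28
```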

Admittedly this is a non-standard way to evaluate approximation error. Typically you look at the approximation error relative to the size of the denominator, not relative to the index of the convergents.

Here’s a more conventional comparison, plotting the log of approximation error against the log of the denominators.

Continued fraction posts

In-shuffles and out-shuffles

The previous post talked about doing perfect shuffles: divide a deck in half, and alternately let one card from each half fall.

It matters which half lets a card fall first. If the top half’s bottom card falls first, this is called an in-shuffle. If the bottom half’s bottom card falls first, it’s called an out-shuffle.

With an out-shuffle, the top and bottom cards don’t move. Presumably it’s called an out-shuffle because the outside cards remain in place.

An out-shuffle amounts to an in-shuffle of the inner cards, i.e. the rest of the deck not including the top and bottom card.

The previous post had a Python function for doing an in-shuffle. Here we generalize the function to do either an in-shuffle or an out-shuffle. We also get rid of the list comprehension, making the code longer but easier to understand.

def shuffle2(deck, inside=True):
    n = len(deck)
    top = deck[: n//2]
    bottom = deck[n//2 :]
    if inside:
        first, second = bottom, top
    else:
        first, second = top, bottom
    newdeck = []
    for p in zip(first, second):
        newdeck.extend(p)
    return newdeck

Let’s use this code to demonstrate that an out-shuffle amounts to an in-shuffle of the inner cards.

deck = list(range(10))
d1 = shuffle2(deck, False) 
d2 = [deck[0]] + shuffle2(deck[1:9], True) + [deck[9]]
print(d1)
print(d2)

Both print statements produce [0, 5, 1, 6, 2, 7, 3, 8, 4, 9].

I said in the previous post that k perfect in-shuffles will restore the order of a deck of n cards if

2ᵏ ≡ 1 (mod n + 1).

It follows that k perfect out-shuffles will restore the order of a deck of n cards if

2ᵏ ≡ 1 (mod n − 1)

since an out-shuffle of n cards is essentially an in-shuffle of the n − 2 cards in the middle.

So, for example, it only takes 8 out-shuffles to return a deck of 52 cards to its original order. In the previous post we said it takes 52 in-shuffles, so it takes a lot fewer out-shuffles than in-shuffles.

It’s plausible to conjecture that it takes fewer out-shuffles than in-shuffles to return a deck to its initial order, since the former leaves the two outside cards in place. But that’s not always true. It’s true for a deck of 52 cards, but not for a deck of 14, for example. For a deck of 14 cards, it takes 4 in-shuffles or 12 out-shuffles to restore the deck.
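The minimum counts in both cases are multiplicative orders of 2, which sympy can compute directly. A sketch, with function names of my own:

```python
from sympy import n_order  # multiplicative order of a mod n

def in_shuffles(n):
    # smallest k with 2**k ≡ 1 (mod n + 1)
    return n_order(2, n + 1)

def out_shuffles(n):
    # smallest k with 2**k ≡ 1 (mod n - 1)
    return n_order(2, n - 1)

print(in_shuffles(52), out_shuffles(52))  # 52 8
print(in_shuffles(14), out_shuffles(14))  # 4 12
```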

Perfect and imperfect shuffles

Take a deck of cards and cut it in half, placing the top half of the deck in one hand and the bottom half in the other. Now bend the stack of cards in each hand and let cards alternately fall from each hand. This is called a riffle shuffle.

Random shuffles

Persi Diaconis proved that it takes seven shuffles to fully randomize a deck of 52 cards. He studied videos of people shuffling cards in order to construct a realistic model of the shuffling process.

Shuffling randomizes a deck of cards due to imperfections in the process. You may not cut the deck exactly in half, and you don’t exactly interleave the two halves of the deck. Maybe one card falls from your left hand, then two from your right, etc.

Diaconis modeled the process with a probability distribution on how many cards are likely to fall each time. And because his model was realistic, after seven shuffles a deck really is well randomized.

Perfect shuffles

Now suppose we take the imperfection out of shuffling. We do cut the deck of cards exactly in half each time, and we let exactly one card fall from each half each time. And to be specific, let’s say the first card will always fall from the top half of the deck. That is, we do an in-shuffle. (See the next post for a discussion of in-shuffles and out-shuffles.) A perfect shuffle does not randomize a deck because it’s a deterministic permutation.

To illustrate a perfect in-shuffle, suppose you start with a deck of these six cards.

A23456

Then you divide the deck into two halves.

A23 456

Then after the shuffle you have the following.

4A5263

Incidentally, I created the images above using a font that included glyphs for the Unicode characters for playing cards. More on that here. The font produced black-and-white images, so I edited the output in GIMP to turn things red that should be red.

Coming full circle

If you do enough perfect shuffles, the deck returns to its original order. This could be the basis for a magic trick, if the magician has the skill to repeatedly perform a perfect shuffle.

Performing k perfect in-shuffles will restore the order of a deck of n cards if

2ᵏ ≡ 1 (mod n + 1).

So, for example, after 52 in-shuffles, a deck of 52 cards returns to its original order. We can see this from a quick calculation at the Python REPL:

>>> 2**52 % 53
1

With slightly more work we can show that fewer than 52 shuffles won’t do.

>>> for k in range(1, 53):
...     if 2**k % 53 == 1: print(k)
...
52

The minimum number of shuffles is not always the same as the size of the deck. For example, it takes 4 shuffles to restore the order of a deck of 14 cards.

>>> 2**4 % 15
1

Shuffle code

Here’s a function to perform a perfect in-shuffle.

def shuffle(deck):
    n = len(deck)
    return [item for pair in zip(deck[n//2 :], deck[:n//2]) for item in pair]

With this you can confirm the results above. For example,

n = 14
k = 4
deck = list(range(n))
for _ in range(k):
    deck = shuffle(deck)
print(deck)

This prints 0, 1, 2, …, 13 as expected.

Related posts

Knight’s tour with fewest obtuse angles

Donald Knuth gives a public lecture each year around Christmas. This year was his 29th Christmas lecture, Adventures with Knight’s Tours.

I reproduced one of the images from his lecture.

Knight's tour with the minimum number of obtuse angles

This is the knight’s tour with the minimum number of obtuse angles, marked with red dots. The solution is unique, up to rotations and reflections.

Knuth said he thought this was one of the most beautiful knight’s tours. He discusses this tour about 44 minutes into the video here.

More knight’s tour posts

The center of the earth is not straight down

If the earth were a perfect sphere, “down” would be the direction to the center of the earth, wherever you stand. But because our planet is a bit flattened at the poles, a line perpendicular to the surface and a line to the center of the earth are not the same. They’re nearly the same because the earth is nearly a sphere, but not exactly, unless you’re at the equator or at one of the poles. Sometimes the difference matters and sometimes it does not.

From a given point on the earth’s surface, draw two lines: one straight down (i.e. perpendicular to the surface) and one straight to the center of the earth. The angle φ that the former makes with the equatorial plane is geographic latitude. The angle θ that the latter makes with the equatorial plane is geocentric latitude.

For illustration we will draw an ellipse that is far more eccentric than a polar cross-section of the earth.

At first it may not be clear why geographic latitude is defined the way it is; geocentric latitude is conceptually simpler. But geographic latitude is easier to measure: a plumb bob will show you which direction is straight down.

There may be some slight variation between the direction of a plumb bob and a perpendicular to the earth’s surface due to variations in surface gravity. However, the deviations due to gravity are a couple orders of magnitude smaller than the differences between geographic and geocentric latitude.

Conversion formulas

The conversion between the two latitudes is as follows.

\begin{align*} \theta &= \text{atan2}((1 - e^2)\sin\varphi, \cos\varphi) \\ \varphi &= \text{atan2}(\sin\theta, (1 - e^2)\cos\theta) \end{align*}

Here e is eccentricity. The equations above work for any ellipsoid, but for earth in particular e² = 0.00669438.

The function atan2(y, x) returns an angle in the same quadrant as the point (x, y) whose tangent is y/x. [1]

As a quick sanity check on the equations, note that when eccentricity e is zero, i.e. in the case of a circle, φ = θ. Also, if φ = 0 then θ = φ for all eccentricity values.
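Here is a sketch of the conversion formulas in Python, working in degrees. The function names are mine.

```python
from math import atan2, degrees, radians, sin, cos

E2 = 0.00669438  # eccentricity squared for the earth

def geocentric(phi):
    # geographic latitude phi -> geocentric latitude theta, in degrees
    p = radians(phi)
    return degrees(atan2((1 - E2)*sin(p), cos(p)))

def geographic(theta):
    # geocentric latitude theta -> geographic latitude phi, in degrees
    t = radians(theta)
    return degrees(atan2(sin(t), (1 - E2)*cos(t)))

print(geocentric(45))              # slightly less than 45
print(geographic(geocentric(30)))  # round trip gives back 30
```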

Next we give a proof of the equations above.

Proof

We can parameterize an ellipse with semi-major axis a and semi-minor axis b by

(x(t), y(t)) = (a \cos t, b \sin t)

The slope at a point (x(t), y(t)) is the ratio

\frac{y^\prime(t)}{x^\prime(t)} = \frac{b \cos t}{-a \sin t}

and so the slope of a line perpendicular to the tangent, i.e. tan φ, is

\tan \varphi = \frac{a \sin t}{b \cos t} = \frac{a}{b} \tan t

Now

\tan \theta = \frac{b \sin t}{a \cos t} = \frac{b}{a} \tan t

and so

\begin{align*} \tan \varphi &= \frac{a}{b} \tan t \\ &= \frac{a}{b} \left( \frac{a}{b} \tan \theta \right) \\ &= \frac{a^2}{b^2} \tan \theta \\ &= \frac{1}{1 - e^2} \tan \theta \end{align*}

where e² = 1 − b²/a² is the eccentricity of the ellipse. Therefore

(1 - e^2) \tan \varphi = \tan \theta

and the equations at the top of the post follow.

Difference

For the earth’s shape, e² = 0.006694 per WGS84. For small eccentricities, the difference between geographic and geocentric latitude is approximately symmetric around 45°.

But for larger values of eccentricity the asymmetry becomes more pronounced.

Related posts

[1] There are a couple complications with programming language implementations of atan2. Some call the function arctan2 and some reverse the order of the arguments. More on that here.

Interesting categories are big

One of the things I found off-putting about category theory when I was first exposed to it was its reliance on the notion of “collections” that are not sets. That seemed to place the entire theory on dubious foundations with paradoxes looming around every corner.

It turns out you can mostly ignore such issues in application. You can, for example, talk about the forgetful functor that maps a group to the set of its elements, ignoring the group structure, without having to think deeply about the collection of all sets, which Russell’s paradox tells us cannot itself be a set.

And yet issues of cardinality are not entirely avoidable. There is a theorem [1] that says in effect that category theory would be uninteresting without collections too large to be sets.

Every category C with a set of arrows is isomorphic to one in which the objects are sets and the arrows are functions.

If the collection of arrows (morphisms) between objects in C is so small as to be a set, then C is isomorphic to a subcategory of the category of sets. As Awodey explains, “the only special properties such categories can possess are ones that are categorically irrelevant, such as features of the objects that do not affect the arrows in any way.”

Most categories of interest have too many objects to be a set, and even more morphisms than objects.

Related posts

[1] Category Theory by Steve Awodey. Theorem 1.6.