Prime clusters and Riecoin

Prime clusters are sets of primes that appear as close together as is generally possible.

There is one pair of consecutive prime numbers, 2 and 3, but there cannot be any more: in any larger pair of consecutive numbers, one of the pair will be even. So the closest together two larger primes can be is 2 apart, and there are a lot of such twin primes, perhaps infinitely many. Accordingly, a prime cluster of size two is a pair of primes whose difference is 2.

How close together can a set of three primes be? The set {2, 3, 5} has diameter 3, i.e. the difference between the largest and smallest element is 3. And the set {3, 5, 7} has diameter 4. But in general the diameter of a set of three primes must be at least 6. If the smallest element is bigger than 3, then all the elements are odd, but they cannot be consecutive odd numbers or else one of them would be divisible by 3. But there are many prime clusters of diameter 6. For example, {13, 17, 19} and {37, 41, 43}.
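A quick brute-force search turns up more examples. Here is a minimal sketch, assuming SymPy is installed:

from sympy import isprime

# find triples of primes with diameter 6, i.e. {p, p+2, p+6} or {p, p+4, p+6}
for p in range(5, 100):
    if isprime(p) and isprime(p + 6) and (isprime(p + 2) or isprime(p + 4)):
        q = p + 2 if isprime(p + 2) else p + 4
        print(p, q, p + 6)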

Formal definition and motivation

There is some fuzziness in the discussion above regarding what is generally possible. This section will make our definitions more rigorous.

In general a pair of primes cannot be consecutive numbers because one of the pair must be even. Stated more abstractly, every pair of consecutive integers larger than {2, 3} will contain a complete residue class mod 2, i.e. one of the numbers will be congruent to 0 mod 2 and the other will be congruent to 1 mod 2.

Now let’s look at sets of three primes. The example {2, 3, 5} is exceptional because it contains a complete residue class mod 2, i.e. it contains even and odd numbers. The example {3, 5, 7} is exceptional because it contains a complete residue class mod 3.

We say that a set of k primes is a cluster if it does not contain a full residue class modulo any prime. It would suffice to say it does not contain a full residue class modulo any prime q ≤ k, because trivially a set of k elements cannot contain a full set of residues modulo a prime larger than k. For example, the cluster {13, 17, 19} does not contain a full set of residues mod 2 or 3, and it trivially does not contain a full set of residues mod 5, 7, 11, ….
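This condition is easy to check in code. Here is a minimal sketch (the function name is mine), assuming SymPy is available for its prime generator:

from sympy import primerange

def is_cluster(nums):
    # A set of k numbers qualifies if it misses at least one residue class
    # mod every prime q <= k. For q > k the condition holds automatically:
    # k numbers can hit at most k of the q residue classes.
    k = len(nums)
    for q in primerange(2, k + 1):
        if len({n % q for n in nums}) == q:
            return False
    return True

print(is_cluster([13, 17, 19]))  # True
print(is_cluster([3, 5, 7]))     # False: full residue class mod 3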

We say a cluster is maximally dense if it has the minimum diameter for a cluster of its number of primes.

Informally, we will call a maximally dense cluster simply a cluster. A maximally dense prime cluster is also sometimes called a prime constellation.

Riecoin cryptocurrency

I’ve written several posts about the Primecoin cryptocurrency whose proof of work task is finding prime chains. The Riecoin cryptocurrency requires miners to find prime clusters.

Primecoin is far from a major cryptocurrency, with a market cap of around $2.3M. Riecoin is about four times smaller than Primecoin. There is another number-theoretic cryptocurrency, Gapcoin, whose market cap is about 10x smaller than that of Riecoin. Safe to say these three projects are of more interest to mathematicians than investment firms. All three of these prime-based cryptocurrencies were launched between 2013 and 2014.

A proof of work task should satisfy three criteria:

  1. Solutions should be difficult to find but easy to confirm.
  2. The time to find a solution should be roughly predictable.
  3. The difficulty should be adjustable.

Finding prime clusters requires a brute force search, but it’s easy to test whether a cluster has been found, and so the first property is satisfied.

The Hardy-Littlewood conjectures give an estimate of the difficulty of finding prime clusters of a given size. The difficulty can be adjusted by adjusting the required cluster size. So the Hardy-Littlewood conjectures help satisfy the second and third properties. It does not matter if the conjectures are false somewhere out in asymptopia; empirically they give good estimates for the size of numbers used in mining Riecoin.

More about clusters

The definition of a maximally dense cluster depends on a function s(k) that gives the minimum possible diameter of a cluster of k primes. We’ve said s(2) = 2 and s(3) = 6; the next values are s(4) = 8 and s(5) = 12. More values of s are listed on the page for OEIS sequence A008407.


Efficiently testing multiple primes at once

The previous post looked at a technique for inverting multiple integers mod m at the same time, using fewer compute cycles than inverting each integer individually. This post will do something analogous for prime chains, revisiting a post from a few days ago about testing prime chains.

A prime chain is a sequence of primes in which each is twice its predecessor, plus or minus 1. In a Cunningham chain of the first kind, it’s always plus, and in a Cunningham chain of the second kind, it’s always minus.

Primecoin is a cryptocurrency that uses finding prime chains as its proof-of-work (PoW) task. The miner has a choice of finding one of three kinds of prime chain: a Cunningham chain of the first or second kind, or a bi-twin chain. The length of the necessary chain varies over time to keep the difficulty relatively constant. Other PoW blockchains do something similar.

Some people say that Primecoin has miners search for primes for PoW. That’s not quite right. Miners have to find a chain of medium-sized primes rather than finding one big prime. This leads to more predictable compute times.

There is a way to test a candidate Cunningham chain of the second kind all at once. Henri Lifchitz gives his algorithm here. Given a sequence of numbers

n_0, n_1, n_2, …, n_k

where n_i = 2n_{i−1} − 1 for each i ≥ 1 and n_0 ≡ 1 (mod 4), all the numbers in the sequence are probably prime if

2^{n_{k-1} - 1} \equiv 1 \pmod{n_0 n_1 n_2 \cdots n_k}

For example, consider the chain

31029721, 62059441, 124118881

Note that 31029721 = 2 × 15514861 − 1 and that 15514861 ≡ 1 (mod 4), so we can take n_0 = 15514861. The following code demonstrates that the numbers in the chain are probable primes because it prints 1.

n0 = 15514861          # n0 ≡ 1 (mod 4)
n1 = 2*n0 - 1          # 31029721
n2 = 2*n1 - 1          # 62059441
n3 = 2*n2 - 1          # 124118881
prod = n0*n1*n2*n3
print( pow(2, n2 - 1, prod) )  # 1 if the test passes

Next I wanted to try the algorithm on much larger numbers where its efficiency would be more apparent, as in the previous post. But when I did, the test returned a result other than 1 on a known Cunningham chain of the second kind. For example, when I change the first two lines of code above to

n1 = 49325406476*primorial(9811, False) + 1  # primorial from SymPy: product of primes up to 9811
n0 = (n1 + 1) // 2

the code returns a large result. I verified that each of the numbers in the chain is prime using SymPy’s isprime function.

Usually a probable prime test can have false positives but never false negatives. I haven’t looked at Lifchitz’s method closely enough to tell whether it can have false negatives, but the code above suggests it can. One caveat: the modulus in the test includes n_0, so the test implicitly requires n_0 = (n_1 + 1)/2 to be prime as well, and if it is not, the test can fail even when the rest of the chain is prime.

Efficiently computing multiple modular inverses at once

Suppose you have a large prime number M and you need to find the inverse of several numbers mod M.  Montgomery’s trick is a way to combine the computation of the inverses to take less time than computing the inverses individually. Peter Montgomery (1947–2020) came up with this trick in 1985.

We will illustrate Montgomery’s trick by inverting three numbers—a, b, and c—though the trick extends to any number of numbers. It is commonly used in cryptography.

Modular inverses are much slower to calculate than modular products, so doing fewer of the former and more of the latter is a good tradeoff. Montgomery’s method only calculates one modular inverse, regardless of how many numbers need to be inverted.

The idea is to directly invert the product of all the numbers and use multiplication to find the inverses of the individual numbers. In our case, we compute

\begin{align*} x &= ab \\ y &= xc = abc \\ c^{-1} &= xy^{-1} \\ x^{-1} &= cy^{-1} \\ b^{-1} &= ax^{-1} \\ a^{-1} &= bx^{-1} \end{align*}

To show that this actually saves time, we’ll run some Python code to invert three random numbers modulo a very large prime, much larger than occurs in practice. The reason is to make the computation time longer and easier to demonstrate. In practice, Montgomery’s trick saves a little time off of a lot of calculations. Here we’ll save a lot of time off a handful of calculations.

import sys
import time
from secrets import randbelow

# raise the limit on integer/string conversion (Python 3.11+)
sys.set_int_max_str_digits(100000)

# the 32nd Mersenne prime
M = 2**756839 - 1

def simple(a, b, c, M):
    return [pow(x, -1, M) for x in [a, b, c]]

def montgomery(a, b, c, M):
    x = a*b % M
    y = x*c % M            # y = abc
    yinv = pow(y, -1, M)   # the only modular inversion
    cinv = x*yinv % M      # (ab)(abc)^-1 = c^-1
    xinv = c*yinv % M      # c(abc)^-1 = (ab)^-1
    binv = a*xinv % M      # a(ab)^-1 = b^-1
    ainv = b*xinv % M      # b(ab)^-1 = a^-1
    return [ainv, binv, cinv]
    
a = randbelow(M)
b = randbelow(M)
c = randbelow(M)

start = time.perf_counter()
result = simple(a, b, c, M)
elapsed = time.perf_counter() - start
print(elapsed)

start = time.perf_counter()
result = montgomery(a, b, c, M)
elapsed = time.perf_counter() - start
print(elapsed)

When we ran this, the direct approach took 121.8 seconds, and Montgomery’s trick took 47.6 seconds.


Compressing a set of hash values

Suppose you have a set of k hash values, each n bits long. Can you compress the set into less than kn bits?

It’s not possible to compress a list of hashes into less than kn bits, but you can compress a set of hashes into fewer bits, because a set, unlike a list, carries no information in its ordering.

Suppose you have a set of 2^30, roughly a billion, 64-bit hashes. Sort the set and look at the size of gaps between elements. You might expect that consecutive items on the sorted list are roughly 2^34 apart, and so you could reconstruct the list by reporting the first item and the gaps between the rest, which are 34-bit numbers, not 64-bit numbers, a savings of 30 bits per hash.

This doesn’t exactly work, but it’s the kernel of a good idea. We don’t know that the distance between hashes can be represented by a 34-bit number. The gap could be more or less than 2^34, but we don’t expect it to often be much more than 2^34. So we use a variable-length encoding such that when the distance between values is on the order of 2^34 or less we save bits, but we allow for the distance to be any size.

Applications

What we’ve described is the essence of Golomb-Rice coding. Its implementation in the Bitcoin protocol is referred to as Golomb-Coded Sets (GCS), described in BIP 158. Golomb-Rice coding is also used in other applications where the values to be compressed are not hashes, such as lossless audio compression.

Encoding

Let’s go into some detail as to how the distances between sorted values are represented. Suppose you expect the differences to be on the order of M where M is a power of 2. For each difference d, let q and r be the quotient and remainder by M, i.e.

d = qM + r.

Encode q as a unary number, i.e. a string of q 1s, and encode r as an ordinary binary number. Then the Golomb-Rice encoding of d is the concatenation of the representations of q and r, with a 0 in the middle as a separator. Using || to denote string concatenation we have

unary(q)  ||  0  ||  binary(r).

In general, unary encoding is extremely inefficient, but we’re betting that q will typically be quite small.

The reason we require M to be a power of 2 is so the representation of r will have a fixed length [1].

Let’s work out an example. Suppose M = 2^20 and

d = 2^22 + 123456 = 4 × M + 123456.

Then we write q as 1111 and r as 00011110001001000000, a 20-bit number, and encode d as the string

1111000011110001001000000
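Here is a minimal sketch of the encoder in Python, assuming M = 2^b; it reproduces the example above.

def golomb_rice_encode(d, b):
    # q in unary (q 1s), a 0 separator, then r as a fixed-width b-bit number
    q, r = divmod(d, 1 << b)
    return "1"*q + "0" + format(r, f"0{b}b")

print(golomb_rice_encode(2**22 + 123456, 20))  # 1111000011110001001000000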

Decoding

You can concatenate the encodings of consecutive d values and still be able to unambiguously decode the result. Because the r representations have a constant length, you know when an r ends and the next q begins.

For example, suppose we have the following string of bits.

1110010011001011001011111001000000110010001110011101111000101111011

We decode the string from left to right: count how many 1s appear before the first 0, skip over the 0, then regard the next 20 bits as a binary number.

111 0 01001100101100101111 1 0 01000000110010001110 0 11101111000101111011

We see three 1s before the first 0 and so we conclude q1 = 3. We skip over the 0 and read the value of r.

r1 = 01001100101100101111₂ = 314159.

Next we see a single 1 before the next 0, so q2 = 1. We read the next value of r as

r2 = 01000000110010001110₂ = 265358.

Next we see a 0, i.e. there are no 1s, and so the final q3 = 0. And we have

r3 = 11101111000101111011₂ = 979323.

So our reconstructed values of d are

d1 = q1 M + r1 = 3 × 2^20 + 314159 = 3459887

d2 = q2 M + r2 = 1 × 2^20 + 265358 = 1313934

d3 = q3 M + r3 = 0 × 2^20 + 979323 = 979323.

Now these are difference values. We need to know the smallest value m in order to reconstruct the original set of values from the differences. Then the full values are m, m + d1, m + d1 + d2, and m + d1 + d2 + d3.
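Here is a sketch of a decoder in Python (the function name is mine, not from BIP 158). Run on the example bit string above, it recovers the three d values.

def golomb_rice_decode(bits, b):
    ds, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":       # count 1s: the unary part
            q += 1
            i += 1
        i += 1                      # skip the 0 separator
        r = int(bits[i:i+b], 2)     # fixed-width b-bit remainder
        i += b
        ds.append(q*(1 << b) + r)
    return ds

bits = ("1110010011001011001011111001000000"
        "110010001110011101111000101111011")
print(golomb_rice_decode(bits, 20))  # [3459887, 1313934, 979323]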


[1] This is the Rice part. Robert Rice simplified Samuel Golomb’s encoding scheme in the special case that M is a power of 2.

Automation and Validation

It’s been said that whatever you can validate, you can automate. An AI that produces correct work 90% of the time could be very valuable, provided you have a way to identify the 10% of cases where it is wrong. Often verifying a solution takes far less computation than finding the solution. Examples here.

Validating AI output can be tricky since the results are plausible by construction, though not always correct.

Consistency checks

One way to validate output is to apply consistency checks. Such checks are necessary but not sufficient, and they are often easy to implement. A simple consistency check might be that the inputs to a transaction equal the outputs, as in the sketch below. A more sophisticated consistency check might be conservation of energy or something analogous to it.
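For example, the transaction check might look something like this toy sketch, where the data layout is made up for illustration.

def balanced(tx):
    # inputs must equal outputs: a necessary but not sufficient check
    return sum(tx["inputs"]) == sum(tx["outputs"])

tx = {"inputs": [30, 70], "outputs": [60, 39, 1]}
print(balanced(tx))  # True: 30 + 70 == 60 + 39 + 1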

Certificates

Some problems have certificates: ways of verifying that a calculation is correct that take far less effort to check than it took to find the solution they verify. I’ve written about certificates in the context of optimization, solving equations, and finding prime numbers.

Formal methods

Correctness is more important in some contexts than others. If a recommendation engine makes a bad recommendation once in a while, the cost is a lower probability of conversion in a few instances. If an aircraft collision avoidance system makes an occasional error, the consequences could be catastrophic.

When the cost of errors is extremely high, formal verification may be worthwhile. Formal correctness proofs using something like Lean or Rocq are extremely tedious and expensive to create, and hence not economical. But if an AI can generate a result and a formal proof of correctness, hurrah!

Who watches the watchmen?

But if an AI result can be wrong, why couldn’t a formal proof generated to defend the result also be wrong? As the Roman poet Juvenal asked, Quis custodiet ipsos custodes? Who will watch the watchmen?

An AI could indeed generate an incorrect proof, but if it does, the proof assistant will reject it. So the answer to who will watch Claude, Gemini, and ChatGPT is Lean, Rocq, and Isabelle.

Who watches the watchers of the watchmen?

Isn’t it possible that a theorem prover like Rocq could have a bug? Of course it’s possible; there is no absolute certainty under the sun. But hundreds of PhD-years of work have gone into Rocq (formerly Coq) and so bugs in the kernel of that system are very unlikely. The rest of the system is bootstrapped, verified by the kernel.

Even so, an error in the theorem prover does not mean an error in the original result. For an incorrect result to slip through, the AI-generated proof would have to be wrong in a way that happens to exploit an unknown error in the theorem prover. It is far more likely that you’re trying to prove the wrong thing than that the theorem prover let you down.

I mentioned collision avoidance software above. I looked into such software when I did some work for Amazon’s drone program. The software that had been formally verified was also unrealistic in its assumptions: it was guaranteed to work correctly provided, for example, that two objects fly at precisely constant velocity at precisely the same altitude. If everything were operating according to such geometrically perfect assumptions, there would be no need for collision avoidance software.

Regular expressions that cross lines

One of the fiddly parts of regular expressions is how to handle line breaks. Should regular expression searches be applied one line at a time, or should an entire file be treated as a single line?

This morning I was trying to track down a LaTeX file that said “discussed in the Section” rather than simply “discussed in Section.” I wanted to search on “the Section” to see whether I had a similar error in other files.

Line breaks don’t matter to LaTeX [1], so “the” could be at the end of one line and “Section” at the beginning of another. I found what I was after by using

    grep -Pzo "the\s+Section" foo.tex

Here -P tells grep to use Perl regular expressions. That’s not necessary here, but I imprinted on Perl regular expressions long ago, and I use PCRE (Perl compatible regular expressions) whenever possible so I don’t have to remember the annoying little syntax differences between various regex implementations.

The -z option says to treat the input as NUL-separated rather than newline-separated. Since an ordinary text file contains no NUL characters, this effectively treats the entire file as one long line, which eliminates the line break issue.

The -o option says to output only what the regular expression matches. Otherwise grep will return the matching line. Ordinarily that wouldn’t be so bad, but because of the -z option, the matching line is the entire file.

The \s+ characters between the and Section represent one or more whitespace characters, such as a space or a newline.

The -P flag is a GNU feature, so it works on Linux. But macOS ships with BSD-derived versions of its utilities, and its version of grep does not support the -P option. On my MacBook I have ggrep mapped to the GNU version of grep.

Another option is to use ripgrep, whose executable is named rg, rather than grep. It uses Perl-like regular expressions, and so there is no need for anything like the -P flag. The analog of -z in ripgrep is -U, so the counterpart of the command above would be

    rg -Uo "the\s+Section" foo.tex

Usually regular expression searches are so fast that execution time doesn’t matter. But when it does matter, ripgrep can be an order of magnitude faster than grep.


[1] LaTeX decides how to break lines in the output independent of line breaks in the input. This allows you to arrange the source file logically rather than aesthetically.

Obscuring P2P nodes with Dandelion

The weakest link in the privacy of cryptocurrency transactions is often outside the blockchain. There are technologies, such as stealth addresses and subaddresses, designed to thwart attempts to link transactions to individuals. They do a good job of anonymizing transaction data, but, as is often the case, the weak link may be the metadata.

Cryptocurrency nodes circulate transaction data using a peer-to-peer network. An entity running multiple nodes can compare when data arrived at each of its nodes and triangulate to infer which node first sent a set of transactions. The Dandelion protocol, and its refinement Dandelion++, aim to mitigate this risk. Dandelion++ is currently used in Monero and a few other coins; other cryptocurrencies have considered or are considering using it.


The idea behind the Dandelion protocol is to have a “stalk” period and a “diffusion” period. Imagine data working up the stalk of a dandelion plant before diffusing like seeds in the wind. The usual P2P process is analogous to simply blowing on the seed head [1].

During the stalk period, information travels from one node to one other node at each step. Then after some number of hops the diffusion process begins: the final node in the stalk period diffuses the information to all its peers. An observer with substantial but not complete visibility of the network may be able to determine which node initiated the diffusion, but maybe not the node at the other end of the stalk.
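Here is a toy simulation of the two phases. It only illustrates the stalk-then-diffuse shape of the protocol; real Dandelion++ chooses stem paths, transition probabilities, and timing far more carefully.

import random

def propagate(origin, peers, hops):
    # stalk phase: pass the transaction from one node to one node
    node, path = origin, [origin]
    for _ in range(hops):
        node = random.choice(peers[node])
        path.append(node)
    # diffusion phase would start here: node broadcasts to all its peers
    return path, node

# a small random network in which each node knows four peers
nodes = range(20)
peers = {n: random.sample([m for m in nodes if m != n], 4) for n in nodes}

path, broadcaster = propagate(origin=0, peers=peers, hops=5)
print("stalk path:", path)
print("diffusion starts at node", broadcaster, "rather than at the origin")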

A natural question is how this differs from something like Tor. In a nutshell, Tor offers identity protection before you enter a P2P network, and Dandelion offers identity protection inside the P2P network.

For more details, see the original paper on Dandelion [2].


[1] The original paper on Dandelion uses a dandelion seed as the metaphor for the protocol: “The name ‘dandelion spreading’ reflects the spreading pattern’s resemblance to a dandelion seed head” and refers to the diagram below. However, other sources refer to the stalk and head of the dandelion plant, not just a single seed. Both mental images work, since the plant has a slightly fractal structure, with a single seed looking something like the whole plant.

Illustration from Dandelion protocol paper

[2] Shaileshh Bojja Venkatakrishnan, Giulia Fanti, Pramod Viswanath. Dandelion: Redesigning the Bitcoin Network for Anonymity. Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 1, Issue 1 Article No.: 22, Pages 1–34. Available here: https://doi.org/10.1145/3084459.

Monero subaddresses

Monero has a way of generating new addresses analogous to the way HD wallets generate new addresses for Bitcoin. In both cases, the recipient’s software can generate new addresses to receive payments that others cannot link back to the recipient.

Monero users have two public/private key pairs: one for viewing and one for spending. Let K^s and k^s be the public and private spend keys, and let K^v and k^v be the public and private view keys. Then the user’s ith subaddress is given by

\begin{align*} K^s_i &= K^s + H(k^v, i) G \\ K^v_i &= k^v K^s_i \end{align*}

Here G is a generator for the elliptic curve Ed25519 and H is a hash function. The hash function output and k^v are integers; the public keys, denoted by capital K with subscripts and superscripts, are points on Ed25519. The corresponding private keys are

\begin{align*} k^s_i &= k^s + H(k^v, i) \\ k^v_i &= k^v k^s_i \end{align*}
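Here is a toy numerical model of the derivation. Multiplication of integers modulo a prime stands in for the Ed25519 group operations, and Python’s built-in hash stands in for H, so this illustrates the algebra only and is in no way secure.

p = 2**255 - 19   # toy modulus, not the actual group order
G = 9             # stand-in for the group generator

def H(k, i):
    return hash((k, i)) % p

ks, kv = 271828, 314159           # toy private spend and view keys
Ks, Kv = ks*G % p, kv*G % p       # public keys

def subaddress_public(i):
    Ksi = (Ks + H(kv, i)*G) % p   # K^s_i = K^s + H(k^v, i) G
    Kvi = kv*Ksi % p              # K^v_i = k^v K^s_i
    return Ksi, Kvi

def subaddress_private(i):
    ksi = (ks + H(kv, i)) % p     # k^s_i = k^s + H(k^v, i)
    kvi = kv*ksi % p              # k^v_i = k^v k^s_i
    return ksi, kvi

# check: each public subaddress key is the corresponding private key times G
ksi, kvi = subaddress_private(1)
print(subaddress_public(1) == (ksi*G % p, kvi*G % p))  # True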

As with hierarchical wallets, the user scans the blockchain to see which of his addresses have received funds.

A user may choose to give a different subaddress for each transaction for added security, or to group transactions for accounting purposes.

Note that in addition to subaddresses, Monero uses stealth addresses. An important difference between subaddresses and stealth addresses is that recipients generate subaddresses, and senders generate stealth addresses. Someone paying the same subaddress twice could fail to create a new stealth address for the second payment. This cannot happen if you give the sender a different subaddress each time.

Janus attacks

The possibility of a Janus attack is a fly in the subaddress ointment. If an attacker suspects that two subaddresses belong to the same wallet, they can confirm this suspicion by sending a transaction to one subaddress and making it look like it came from the other. If the recipient confirms receipt of the funds, they inform the attacker that his suspicion was correct. A cautious user might not confirm receipt of funds, but a cryptocurrency exchange would. You can read more about the details of a Janus attack here.


How stealth addresses work in Monero

Suppose Alice runs a confidential restaurant. Alice doesn’t want there to be any record of who visited her restaurant but does want to get paid for her food. She accepts Monero, and instead of a cash register there are two QR codes on display, one corresponding to her public view key A and the other corresponding to her public spend key S.

How Bob buys his burger

A customer Bob walks into the restaurant and orders a burger and fries. When Bob pays Alice, here’s what’s going on under the hood.

Bob is using software that generates a random integer r and multiplies it by a point G on an elliptic curve, specifically ed25519, obtaining the point

R = rG

on the curve. The software also multiplies Alice’s view key A, a point on the same elliptic curve, by r, then runs a hash function H on the product rA, which returns an integer k.

k = H(rA).

Finally, Bob’s software computes the point

P = kG + S

and sends Alice’s cash register, i.e. her crypto wallet, the pair of points (P, R). The point P is a stealth address, an address that will only be used this one time and cannot be linked to Alice or Bob [1]. The point R is additional information that helps Alice receive her money.

How Alice gets paid

Alice and Bob share a secret: both know k. How’s that?

Alice’s public view key A is the product of her private view key a and the group generator G [2]. So when Bob computes rA, he’s computing r(aG). Alice’s software can multiply the point R by a to obtain a(rG).

rA = r(aG) = a(rG) = aR.

Both Alice and Bob can hash this point—which Alice thinks of as aR and Bob thinks of as rA—to obtain k. This is ECDH: elliptic curve Diffie-Hellman key exchange.

Next, Alice’s software scans the blockchain for payments to

P = kG + S.

Note that P is on the blockchain, but only Alice and Bob know how to factor P into kG + S because only Alice and Bob know k. And only Alice can spend the money because only she knows the private key s corresponding to the public spend key S where

S = sG.

She knows

P = kG + sG = (k + s)G

and so she has the private key k + s corresponding to P.
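The whole exchange fits in a few lines of toy code. Integers modulo a prime stand in for points on ed25519 and Python’s built-in hash stands in for H, so this shows the algebra only; it is not cryptographically secure.

from secrets import randbelow

p = 2**255 - 19   # toy modulus, not the actual group order
G = 9             # stand-in for the curve generator

a, s = randbelow(p), randbelow(p)   # Alice's private view and spend keys
A, S = a*G % p, s*G % p             # Alice's public view and spend keys

# Bob's side: fresh randomness r, then R = rG, k = H(rA), P = kG + S
r = randbelow(p)
R = r*G % p
k = hash(r*A % p) % p
P = (k*G + S) % p

# Alice's side: she recovers k because rA = r(aG) = a(rG) = aR
k2 = hash(a*R % p) % p
print(k2 == k)                      # True: the shared secret

# and she alone holds the private key for P, since P = kG + sG = (k + s)G
print((k2 + s)*G % p == P)          # True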


[1] Bob sends money to the address P, so there could be some connection between Bob and P on the Monero blockchain. However, due to another feature of Monero, namely ring signatures, someone analyzing the blockchain could only determine that Bob is one of 16 people who may have sent money to the address P, and there’s no way to know who received the money. That is, there is no way, using only information on the blockchain, to know who received the money. A private investigator who saw Bob walk into Alice’s restaurant would have additional information, outside the blockchain.

[2] The key assumption of elliptic curve cryptography is that it’s computationally infeasible to “divide” on an elliptic curve, i.e. to recover a from knowledge of G and aG. You could recover a by brute force if the group were small, but the elliptic curve ed25519 has on the order of 2^255 points, and a is some integer chosen randomly between 1 and the size of the curve.

Physical Keys and Encryption Keys

A physical key, such as a house key, is a piece of metal with cuts of differing depths. Typically there may be around 6 cuts, with five different possible depths for each cut. This allows 5^6 = 15,625 possible keys.

Encryption keys, such as AES keys, are a string of bits, often 128 bits, for a total of 2^128 possible keys.

How long would a physical key have to be to have the same level of security as an encryption key? We’d need to solve

5^n = 2^128

which means

n = 128 / log₂ 5 ≈ 55.13.

So we’d need a key with around 55 notches.
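A quick check of the arithmetic:

from math import log2
print(128 / log2(5))  # 55.1266...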


This only takes into account combinatorial possibilities, not the difficulty of attacking a physical key or a binary key. There are incomparably more possibilities for binary keys, but encryption attacks can be automated and carried out remotely (unless a computer is air-gapped), while a physical lock can only be attacked in person. It takes a lock picker orders of magnitude longer to try a key than it takes a password-cracking program to try a password. On the other hand, locks aren’t picked by trying thousands of keys.

Related post: Measuring cryptographic strength in liters of boiling water