Efficiently computing multiple modular inverses at once

Suppose you have a large prime number M and you need to find the inverse of several numbers mod M.  Montgomery’s trick is a way to combine the computation of the inverses to take less time than computing the inverses individually. Peter Montgomery (1947–2020) came up with this trick in 1985.

We will illustrate Montgomery’s trick by inverting three numbers—a, b, and c—though the trick extends to any number of numbers. It is commonly used in cryptography.

Modular inverses are much slower to calculate than modular products, so doing fewer of the former and more of the latter is a good tradeoff. Montgomery’s method only calculates one modular inverse, regardless of how many numbers need to be inverted.

The idea is to directly invert the product of all the numbers and use multiplication to find the inverses of the individual numbers. In our case, we compute

xab
y = cy = abc
x−1 = cy−1
b−1 = ax−1
a−1 = bx−1

To show that this actually saves time, we’ll run some Python code to invert three random numbers modulo a very large prime, much larger than occurs in practice. The reason is to make the computation time longer and easier to demonstrate. In practice, Montgomery’s trick saves a little time off of a lot of calculations. Here we’ll save a lot of time off a handful of calculations.

import sys
import time
from secrets import randbelow

# extend the default maximum integer size
sys.set_int_max_str_digits(100000)

# the 32nd Mersenne prime
M = 2**756839 - 1

def simple(a, b, c, M):
    return [pow(x, -1, M) for x in [a, b, c]]

def montgomery(a, b, c, M):
    x = a*b % M
    y = x*c % M
    yinv = pow(y, -1, M)
    cinv = x*yinv % M
    xinv = c*yinv % M
    binv = a*xinv % M
    ainv = b*xinv % M
    return [ainv, binv, cinv]
    
a = randbelow(M)
b = randbelow(M)
c = randbelow(M)

start = time.perf_counter()
result = simple(a, b, c, M)
elapsed = time.perf_counter() - start
print(elapsed)

start = time.perf_counter()
result = montgomery(a, b, c, M)
elapsed = time.perf_counter() - start
print(elapsed)

When we ran this, the direct approach took 121.8 seconds, and Montgomery’s trick took 47.6 seconds.

Related posts

RSA as a pairing

The last couple posts have been about group pairings, specifically Tate pairings as they’re used in cryptography. This post will show that RSA encryption can be seen as a special case of pairing-based cryptography.

The idea comes from Ben Lynn’s 2007 dissertation. Lynn is the “L” in BLS signatures—one of the topics in his dissertations—and in BLS elliptic curves.

A pairing is a bilinear mapping from two groups to a third group

eG1 × G2 → GT.

Here bilinear means that if P is an element of G1 and Q is an element of G2, and a and b are nonnegative integers, then

e(aPbQ) = e(PQ)ab.

There are more criteria for a pairing to be useful in cryptography, but we won’t need those for this post.

Ben Lynn’s dissertation mentions that exponentiation is a special case of pairing if you let G1 and GT be the multiplicative group of the integers mod r and let G2 be the additive group of integers mod (r − 1). Then you can define a pairing by

e(ga) = ga.

Typically you can’t just write down a simple expression for a pairing, but in this case you can.

RSA encryption corresponds to rpq where p and q are large primes. The product pq is made public but the factorization into p and q is held secret. A message [1] is encrypted by exponentiation mod n where the exponent is the public key. In Lynn’s notation, the message is g and the public key is a.

The security of RSA encryption depends on the fact that you can’t recover g from ga mod n unless you know a trapdoor, the factorization of n [2]. This is true of pairings more generally: it is not practical to recover the inputs to a pairing from the output unless you know a trapdoor.

Related posts

[1] In practice, RSA isn’t used to encrypt entire messages. Instead, it is used to encrypt a key for a symmetric encryption algorithm such as AES, and that key is used to encrypt the message. This is done for efficiency.

[2] Or, more specifically, a private key that can easily be computed if you know the factorization of n. It’s conceivable that breaking RSA encryption is easier than factoring, but so far that does not appear to be the case.

Three-party Diffie-Hellman in one shot

Elliptic curve Diffie-Hellman

Given a point P on an elliptic curve E, and a random number a, aP means to add P to itself a times, using the addition on E. The point aP can be computed efficiently, even if a is a very large number [1]. However, if E has a large number of points, and if a is chosen at random from a large range, then it is not practical to compute a given P and aP.

This is the elliptic curve version of the discrete logarithm problem, and its presumed difficulty is the basis of the security of Diffie-Hellman key exchange.

Two-party Diffie-Hellman

With two-party Diffie-Hellman key exchange, two parties, Alice and Bob, generate random private keys a and b respectively. They agree on a point P on an elliptic curve E. Alice computes aP and sends it to Bob. Simultaneously Bob computes bP and sends it to Alice. Then Alice can compute

a(bP) = (ab)P

and Bob can compute

b(aP) = (ba)P = (ab)P.

Then both Alice and Bob know a shared secret, the point (ab)P on E, but neither party has revealed a private key.

Three-party Diffie-Hellman

You could extend the approach above to three parties, say adding Carol, but this would require extra communication: Alice could send (ab)P to Carol, which she could multiply by her private key c to obtain abcP. Similarly, everyone else could arrive at abcP. Each person has to do a computation, send and receive a message, do another computation, and send an receive another message.

Joux [2] came up with a way to do Diffie-Hellman key exchange with three people and only one round of sending and receiving messages. The set up uses a pairing e( , ) of two elliptic curve subgroups, G1 and G2, as in the previous post. Fix generators PG1 and QG2. Each party multiplies P and Q by their private key and sends the results to the other two parties.

Alice receives bP from Bob and cQ from Carol. This is enough for her to compute

e(bPcQ)a = e(P, Q)abc.

Similarly, Bob receives aP from Alice and cQ from Carol, enabling him to compute

e(aPcQ)b = e(P, Q)abc.

And finally, Carol receives aP from Alice and bQ from Bob, enabling her to compute

e(aPbQ)c = e(P, Q)abc.

So all three parties can compute the shared secret e(P, Q)abc. but no party knows the other parties’ private keys.

Footnotes

[1] If you want to multiply a point by 2100, for example, you don’t carry out 2100 additions; you carry out 100 doublings. Of course not every positive integer is a power of 2, but every positive integer is the sum of powers of 2, i.e. it can be written in binary. So as you’re doing your doublings, sum the terms that correspond to 1s in the binary representation of the number you’re multiplying by.

[2] Antoine Joux. A One Round Protocol for Tripartite Diffie–Hellman. Journal of Cryptology (2004) 17: 263–276.

Elliptic curve pairings in cryptography

Pairings can mean a variety of related things in group theory, but for our purposes a pairing is a bilinear mapping from two groups to a third group.

e: G1 × G2GT

Typically the group operation on G1 and G2 is written additively and the group operation on GT is written multiplicatively. In fact, GT will always be the multiplicative group of a finite field, i.e. GT consists of the non-zero elements of a finite field under multiplication. (The “T” stands for “target.”)

Here bilinear [1] means that if P is an element of G1 and Q is an element of G2, and a and b are nonnegative integers,

e(aPbQ) = e(P, Q)ab.

There are a few provisos …

Robin Williams imitating William F. Buckley

First, the pairing must be non-degenerate, i.e. e(PQ) ≠ 1 for some P and Q.

Second, the pairing must be efficiently computable.

Third, the embedding degree must not be “too high.” This means that if GT is the multiplicative group of a field with pk elements, k is not too big. We will look at two examples in which k = 12.

The second and third provisos are important even though they’re not stated rigorously.

Cryptography often speaks of pairing elliptic curves, but in fact it uses pairings of prime-order subgroups of the additive groups of elliptic curves. Because the subgroups have prime order, they are cyclic, and so the pairing is determined by its value on a generator from each subgroup.

Example: BN254

The previous post briefly mentioned a pairing between two elliptic curves, BN254 and alt_bn128, that is used in Ethereum and was used in Zcash in the original Sprout shielded protocol.

The elliptic curve BN254 is defined over the field Fp, the integers mod p, where

p = 21888242871839275222246405745257275088696311157297823662689037894645226208583.

and the elliptic curve alt_bn128 is defined over the field Fp[i], i.e. the field Fp, with an imaginary element i adjoined.

Both elliptic curves have a subgroup of order

r = 21888242871839275222246405745257275088548364400416034343698204186575808495617,

which is prime. So in the pairing the groups G1 and G2 are isomorphic to the integers mod r. The target group GT has order p12 − 1 and so the embedding degree k equals 12, and so the embedding degree is “not too high.”

Example: BLS12-381

Another example also comes from Ethereum and Zcash. Ethereum uses BN254 in smart contracts, but it uses BLS12-381 in its consensus layer. Zcash switched from BN254 to BLS12-381 in the Sapling release.

BLS12-381 is defined over a prime order field with on the order of 2381 elements and has embedding order 12, hence 12-381. The BLS stands for Paulo Barreto, Ben Lynn, and Michael Scott. Elliptic curve names often look mysterious, but they’re actually pretty descriptive. I discuss BLS12-381 in more detail here. As in the example above, BLS12-381 is defined over a field Fp and is paired with a curve over Fp[i], i.e. the same field with an imaginary element adjoined. The equation for BLS12-381 is

y² = x³ + 4

and the equation for the curve it is paired with is

y² = x³ + 4(1 + i)

As before the target group is the multiplicative group of a finite field of order p12.

Related posts

[1] You’ll also see bilinearity defined by

e(PQR) = e(PRe(QR)

and

e(PRS) = e(PRe(PS).

These definitions are equivalent. To see that the definition here implies the definition at the top, write out aP as PP + … + P etc.

Since we’re working in subgroups of prime order, there is a generator for each subgroup. Write out each element as a multiple of a generator, then the definition at the top implies the definition here.

Physical Keys and Encryption Keys

A physical key, such as a house key, is a piece of metal with cuts of differing depths. Typically there may be around 6 cuts, with five different possible depths for each cut. This allows 56 = 15,625 possible keys.

Encryption keys, such as AES keys, are a string of bits, often 128 bits, for a total of 2128 possible keys.

How long would a physical key have to be to have the same level of security as an encryption key? We’d need to solve

5n = 2128

which means

n = 128 / log25 = 55.12.

So we’d need a key with around 55 notches.

metal key with 55 notches

This only takes into account combinatorial possibilities, not the difficulty of attacking a physical key or a binary key. There are incomparably more possibilities for binary keys, but encryption attacks can be automated and carried out remotely (unless a computer is air gapped). A physical lock can only be attacked in person. It takes a lock picker orders of magnitude more time to try a key than a password cracking program. On the other hand, locks aren’t picked by trying thousands of keys.

Related post: Measuring cryptographic strength in liters of boiling water

RSA with multiple primes

Typically RSA public keys are the product of two large primes, npq. But there’s no reason they couldn’t be the product of say three primes, npqr, or more primes, as long as φ(n), or λ(n), is calculated correctly.

Encryption is done the same way. Decryption could be done the same way, except there is the opportunity for it to be more efficient. The trick is to use the CRT (Chinese Remainder Theorem) in a way similar to Garner’s algorithm. This is why RSA with multiple primes is sometimes used for digital signatures.

The difficulty of factoring n using the GNFS algorithm doesn’t change depending on the number of factors n has, as long as all the factors are sufficiently large, far too large to find using trial division.

Daniel Bernstein’s post-quantum RSA paper was based on keys that are the product of a large number of 4096-bit primes. This way all the arithmetic is carried out modulo 4096-bit primes, not modulo terabyte primes.

A quiet change to RSA

An RSA public key is a pair of numbers (en) where e is an exponent and npq where p and q are large prime numbers. The original RSA paper said choose a private key d and compute e. In practice now we choose e and compute d. Furthermore, e is now almost always 65537 for reasons given here. So the public key is essentially just n.

Euler’s totient function

The relationship between the exponent and the private decryption key in the original RSA paper was

ed = 1 mod φ(n).

It is easy to compute e given d, or d given e, when you know Euler’s totient function of n,

φ(n) = (p − 1)(q − 1).

The security of RSA encryption rests on the assumption that it is impractical to compute φ(n) unless you know p and q.

Carmichael’s totient function

Gradually over the course of several years, the private key d changed from being the solution to

ed = 1 mod φ(n)

to being the solution to

ed = 1 mod λ(n)

where Euler‘s totient function φ(n) was replaced with Carmichael‘s totient function λ(n).

The heart of the original RSA paper was Euler’s generalization of Fermat’s little theorem which says if a is relatively prime to m, then

aφ(n) = 1 (mod n)

Carmichael’s λ(n) is defined to be the smallest number that can replace φ(n) in the equation above. It follows that λ(n) divides φ(n).

Why the change?

Using Carmichael’s totient rather than Euler’s totient results in smaller private keys and thus faster decryption. When n = pq for odd primes p and q,

λ(n) = lcm( (p − 1), (q − 1) ) = (p − 1)(q − 1) / gcd( (p − 1), (q − 1) )

so λ(n) is smaller than φ(n) by a factor of gcd( (p − 1), (q − 1) ). At a minimum, this factor is at least 2 since p − 1 and q − 1 are even numbers.

However, an experiment suggests this was a trivial savings. When I generated ten RSA public keys the gcd was never more than 8.

from sympy import randprime, gcd

for _ in range(10):
    p = randprime(2**1023, 2**1024)
    q = randprime(2**1023, 2**1024)
    print(gcd(p-1, q-1))

I repeated the experiment with 100 samples. The median of the gcd’s was 2, the mean was 35.44, and the maximum was 2370. So the while gcd might be moderately large, but it is usually just 2 or 4.

Better efficiency

The efficiency gained from using Carmichael’s totient is minimal. More efficiency can be gained by using Garner’s algorithm.

Time needed to factor large integers

The optimal choice of factoring algorithm depends on the size of the number you want to factor. For numbers larger than a googol (10100) the GNFS (general number field sieve) algorithm is the fastest known factoring algorithm, making GNFS the algorithm of choice for trying to factor public keys for RSA encryption

The expected time required to factor a number n using the GNFS is proportional to

exp( f(n) g(n) )

where f(n) is relatively constant and g(n) varies more strongly with n. More specifically

f(n) = (64/9)1/3 + o(1)

and

g(n) = (log n)1/3 (log log n)2/3.

The notation o(1) means a term that goes to zero as n increases. Very often in practice you can ignore o(1) terms when your value of n is large. In cryptographic applications n is certainly large, at least 21024, and yet the o(1) term is still significant for values of n seen in practice. I’ve seen one source say that for keys used in practice the o(1) term is around 0.27.

Security levels

It is widely stated that factoring a 1024-bit private key is comparable in difficulty to breaking an 80-bit symmetric encryption key, i.e. that 1024-bit keys provide 80-bit security.

To find the security level of a 1024-bit RSA key, the size of the o(1) term matters. But given this, if we want to find the security level of more RSA key sizes, it doesn’t matter how big the o(1) term is. It only that the term is roughly constant over the range of key sizes we are interested in.

If f(n) is relatively constant, then the log of the time to factor n is proportional to g(n), and the log of the time to break an encryption with security level k is proportional to k. So the security level of a key n should be proportional to g(n). Let’s see if that’s the case, using the reported security levels of various key sizes.

import numpy as np

# RSA key sizes and security levels
data = {
    1024 : 80,
    2048 : 112,
    3072 : 128,
    4096 : 140,
    7680 : 192,
}
for d in data:
    x = d*np.log(2) # natural log of 2^d
    y = x**(1/3)*(np.log(x)**(2/3)) 
    print(data[d]/y)

This prints the following:

2.5580
2.6584
2.5596
2.4819
2.6237

So the ratio is fairly constant, as expected. All the reported security levels are multiples of 4, which suggests there was some rounding done before reporting them. This would account for some of the variation above.

The output above suggests that the security level of an RSA key with b bits is roughly

2.55 x1/3 log(x)2/3

where x = log(2) b.

Aside on RSA security

RSA encryption is not broken by factoring keys but by exploiting implementation flaws.

Factoring a 2048-bit RSA key would require more energy than the world produces in a year, as explained here.

Post-quantum RSA with gargantuan keys

If and when practical scalable quantum computers become available, RSA encryption would be broken, at least for key sizes currently in use. A quantum computer could use Shor’s algorithm factor n-bit numbers in time on the order of n². The phrase “quantum leap” is misused and overused, but this would legitimately be a quantum leap.

That said, Shor’s method isn’t instantaneous, even on a hypothetical machine that does not yet exist and may never exist. Daniel Bernstein estimates that RSA encryption with terabyte public keys would be secure even in a post-quantum world.

Bernstein said on a recent podcast that he isn’t seriously suggesting using RSA with terabyte keys. Computing the necessary key size is an indirect way of illustrating how impractical post-quantum RSA would be.

Related posts

Silent Payments

Bitcoin transactions appear to be private because names are not attached to accounts. But that is not sufficient to ensure privacy; if it were, much of my work in data privacy would be unnecessary. It’s quite possible to identify people in data that does not contain any direct identifiers.

I hesitate to use the term pseudonymization because people define it multiple ways, but I’d say Bitcoin addresses provide pseudonymization but not necessarily deidentification.

Privacy and public ledgers are in tension. The Bitcoin ledger is superficially private because it does not contain user information, only addresses [1]. However, transaction details are publicly recorded on the blockchain.

To add some privacy to Bitcoin, addresses are typically used only once. Wallet software generates new addresses for each transaction, and so it’s not trivial to track all the money flowing between two people. But it’s not impossible either. Forensic tools make it possible to identify parties behind blockchain transactions via metadata and correlating information on the blockchain with information available off-chain.

Silent Payments

Suppose you want to take donations via Bitcoin. If you put a Bitcoin address on your site, that address has to be permanent, and it’s publicly associated with you. It would be trivial to identify (the addresses of) everyone who sends you a donation.

Silent payments provide a way to work around this problem. Alice could send donations to Bob repeatedly without it being obvious who either party is, and Bob would not have to change his site. To be clear, there would be no way to tell from the addresses that funds are moving between Alice and Bob. The metadata vulnerabilities remain.

Silent payments are not implemented in Bitcoin Core, but there are partial implementations out there. See BIP 352.

The silent payment proposal depends on ECDH (elliptic curve Diffie-Hellman) key exchange, so I’ll do a digression on ECDH before returning to silent payments per se.

ECDH

The first thing to know about elliptic curves, as far as cryptography is concerned, is that there is a way to add two points on an elliptic curve and obtain a third point. This turns the curve into an Abelian group.

You can bootstrap this addition to create a multiplication operation: given a point g on an elliptic curve and an integer nng means add g to itself n times. Multiplication can be implemented efficiently even when n is an enormous number. However, and this is key, multiplication cannot be undone efficiently. Given g and n, you can compute ng quickly, but it’s practically impossible to start with knowledge of ng and g and recover n. To put it another way, multiplication is efficient but division is practically impossible [2].

Suppose Alice and Bob both agree on an elliptic curve and a point g on the curve. This information can be public. ECDH lets Alice and Bob share a secret as follows. Alice generates a large random integer a, her private key, and computes a public key A = ag. Similarly, Bob generates a large random integer b and computes a public key Bbg. They’re shared secret is

aBbA.

Alice can compute aB because she (alone) knows a and B is public information. Similarly Bob can compute bA. The two points on the curve aB and bA are the same because

aBabgbagbA.

Back to silent payments

ECDH lets Alice and Bob share a secret, a point on a (very large) elliptic curve. This is the heart of silent payments.

In its simplest form, Alice can send Bob funds using the address

PB + hash(aBg.

A slightly more complicated form lets Alice send Bob funds multiple times. The kth time she sends money to Bob she uses the address

PB + hash(aB || kg

where || denotes concatenation.

But how do you know k? You have to search the blockchain to determine k, much like the way hierarchical wallets search the blockchain. Is the address corresponding to k = 0 on the blockchain? Then try again with k = 1. Keep doing this until you find the first unused k. Making this process more efficient is one of the things that is being worked on to fully implement silent payments.

Note that Bob needs to make B public, but B is not a Bitcoin address per se; it’s information needed to generate addresses via the process described above. Actual addresses are never reused.

Related posts

[1] Though if you obtain your Bitcoin through an exchange, KYC laws require them to save a lot of private information.

[2] Recovering n from ng is known as the discrete logarithm problem. It would be more logical to call it the discrete division problem, but if you write the group operation on an elliptic curve as multiplication rather than addition, then it’s a discrete logarithm, i.e. trying to solve for an unknown exponent. If and when a large-scale quantum computer exists, the discrete logarithm problem will be practical to solve, but presumably not until then.