An attack on RSA with exponent 3

As I noted in this post, RSA encryption is often carried out reusing exponents. Sometimes the exponent is exponent 3, which is subject to an attack we’ll describe below [1]. (The most common exponent is 65537.)

Suppose the same message m is sent to three recipients and all three use exponent e = 3. Each recipient has a different modulus Ni, and each will receive a different encrypted message

ci = m³ mod Ni.

Someone with access to c1, c2, and c3 can recover the message m as follows. We can assume each modulus Ni is relatively prime to the others, otherwise we can recover the private keys using the method described here. Since the moduli are relatively prime, we can solve the three equations for m³ using the Chinese Remainder Theorem. There is a unique x < N1 N2 N3 such that

x = c1 mod N1
x = c2 mod N2
x = c3 mod N3

and m is simply the cube root of x. What makes this possible is knowing m is a positive integer less than each of the Ns, and that x < N1 N2 N3. It follows that we can simply take the cube root in the integers and not the cube root in modular arithmetic.

This is an attack on “textbook” RSA because the weakness in this post could be avoided by real-world precautions such as adding random padding to each message so that no two recipients are sent the exact same message.

By the way, a similar trick works even if you only have access to one encrypted message. Suppose you’re using a 2048-bit modulus N and exchanging a 256-bit key. If your message m is simply the key without padding, then m³ is less than N, and so you can simply take the cube root of the encrypted message in the integers.

Python example

Here we’ll work out a specific example using realistic RSA moduli.

    from secrets import randbits, randbelow
    from sympy import nextprime
    from sympy.ntheory.modular import crt
    
    def modulus():
        p = nextprime(randbits(2048))
        q = nextprime(randbits(2048))
        return p*q
    
    N = [modulus() for _ in range(3)]
    m = randbelow(min(N))
    c = [pow(m, 3, N[i]) for i in range(3)]
    x = crt(N, c)[0]
    
    assert(cbrt(x) == m) # integer cube root

Note that crt is the Chinese Remainder Theorem. It returns a pair of numbers, the first being the solution we’re after, hence the [0] after the call.

The script takes a few seconds to run. Nearly all the time goes to finding the 2048-bit (617-digit) primes that go into the moduli. Encrypting and decrypting m takes less than a second.

Related posts

[1] I don’t know who first discovered this line of attack, but you can find it written up here. At least in the first edition; the link is to the 2nd edition which I don’t have.

Public key encryption based on squares and non squares

The RSA encryption algorithm depends indirectly on the assumption that factoring the product of large primes is hard. The algorithm presented here, invented by Shafi Goldwasser and Silvio Micali, depends on the same assumption but in a different way. The Goldwasser-Micali algorithm is more direct than RSA, thought it is also less efficient.

One thing that makes GM interesting is that allows a form of computing on encrypted data that we’ll describe below.

GM in a nutshell

To create a public key, find two large primes p and q and publish N = pq. (There’s one more piece we’ll get to shortly.) You keep p and q private, but publish N, much like with RSA.

Someone can send you a message, one bit at a time, by sending you numbers that either do or do not have a square root mod N.

Sending a 0

If someone wants to send you a 0, they send you a number that has a square root mod N. This is easy to do: they select a number between 1 and N at random, square it mod N, and send you the result.

Determining whether a random number is a square mod N is easy if and only if you know how to factor N. [1]

When you receive the number, you can quickly tell that it is a square because you know how to factor N. The sender knows that it’s a square because he got it by squaring something. You can produce a square without knowing how to factor N, but it’s computationally infeasible to start with a given number and tell whether it’s a square mod N, unless you know the factorization of N.

Sending a 1

Sending a 1 bit is a little more involved. How can someone who cannot factor N produce a number that’s not a square? That’s actually not feasible without some extra information. The public key is not just N. It’s also a number z that is not a square mod N. So the full public key is two numbers, N and z.

To generate a non-square, you first generate a square then multiply it by z.

Example

Suppose you choose p = 314159 and q = 2718281. (Yes, p is a prime. See the post on pi primes. And q comes from the first few digits of e.) In practice you’d choose p and q to be very large, hundreds of digits, and you wouldn’t pick them to have a cute pattern like we did here. You publish N = pq = 853972440679 and imagine it’s too large for anyone to factor (which may be true for someone armed with only pencil and paper).

Next you need to find a number z that is not a square mod N. You do that by trying numbers at random until you find one that is not a square mod p and not a square mod q. You can do that by using Legendre symbols, It turns out z = 400005 will work.

So you tell the world your public key is (853972440679, 400005).

Someone wanting to send you a 0 bit chooses a number between 1 and N = 853972440679, say 731976377724. Then they square it and take the remainder by N to get 592552305778, and so they send you 592552305778. You can tell, using Legendre symbols, that this is a square mod p and mod q, so it’s a square mod N.

If they had wanted to send you a 1, they could have sent us 592552305778 * 400005 mod N = 41827250972, which you could tell isn’t a square mod N.

Homomorphic encryption

Homomorphic encryption lets you compute things on encrypted data without having to first decrypt it. The GM encryption algorithm is homomorphic in the sense that you can compute an encrypted form of the XOR of two bits from an encrypted form of each bit. Specifically, if c1 and c2 are encrypted forms of bits b1 and b2, then c1 c2 is an encrypted form of b1b2. Let’s see why this is, and where there’s a small wrinkle.

Suppose our two bits are both 0s. Then c1 and c2 are squares mod N, and c1 c2 is a square mod N.

Now suppose one bit is a 0 and the other is a 1. Then either c1 is a square mod N and c2 isn’t or vice versa, but in either case their product is not a square mod N.

Finally suppose both our bits are 1s. Since 1⊕1 = 0, we’d like to say that c1 c2 is a square mod N. Is it?

The product of two non-squares is not necessarily a non-square. For example, 2 and 3 are not squares mod 35, and neither is their product 6 [2]. But if we followed the recipe above, and calculated c1 and c2 both by multiplying a square by the z in the public key, then we’re OK. That is, if c1 = x²z and c2 = y²z, then c1c2 = x²y²z², which is a square. So if you return non-squares that you find as expected, you get the homomorphic property. If you somehow find your own non-squares, they might not work.

Related posts

[1] As far as we know. There may be an efficient way to tell whether x is a square mod N without factoring N, but no such method has been published. The problem of actually finding modular square roots is equivalent to factoring, but simply telling whether modular square roots exist, without having to produce the roots, may be easier.

If quantum computing becomes practical, then factoring will be efficient and so telling whether numbers are squares modulo a composite number will be efficient.

[2] You could find all the squares mod 35 by hand, or you could let Python do it for you:

>>> set([x*x % 35 for x in range(35)])
{0, 1, 4, 9, 11, 14, 15, 16, 21, 25, 29, 30}

An infinite product challenge

Gil Kalai wrote a blog post yesterday entitled “Test Your Intuition (or knowledge, or programming skills) 36.” The challenge is to evaluate the infinite product

\prod_{p\,\, \mathrm{prime}} \frac{p^2+1}{p^2 - 1}

I imagine there’s an elegant analytical solution, but since the title suggested that programming might suffice, I decided to try a little Python. I used primerange from SymPy to generate the list of primes up to 200, and cumprod from NumPy to generate the list of partial products.

    cumprod(
        [(p*p+1)/(p*p-1) for p in primerange(1,200)]
    )

Apparently the product converges to 5/2, and a plot suggests that it converges very quickly.

Plot of partial products

Here’s another plot to look more closely at the rate of convergence. Here we look at the difference between 5/2 and the partial products, on a log scale, for primes less than 2000.

Plot of 2.5 minus partial products, log scale

Base85 encoding

I wrote a while back about Base32 and Base64 encoding, and yesterday I wrote about Bitcoin’s Base58 encoding. For completeness I wanted to mention Base85 encoding, also known as Ascii85. Adobe uses it in PostScript and PDF files, and git uses it for encoding patches.

Like Base64, the goal of Base85 encoding is to encode binary data printable ASCII characters. But it uses a larger set of characters, and so it can be a little more efficient. Specifically, it can encode 4 bytes (32 bits) in 5 characters.

Why 85?

There are 95 printable ASCII characters, and

log95(232) = 4.87

and so it would take 5 characters encode 4 bytes if you use all possible printable ASCII characters. Given that you have to use 5 characters, what’s the smallest base that will still work? It’s 85 because

log85(232) = 4.993

and

log84(232) = 5.006.

(If you’re not comfortable with logarithms, see an alternate explanation in the footnote [1].)

Now Base85 is different from the other bases I’ve written about because it only works on 4 bytes at a time. That is, if you have a number larger than 4 bytes, you break it into words of 4 bytes and convert each word to Base 85.

Character set

The 95 printable ASCII characters are 32 through 126. Base 85 uses characters 33 (“!”) through 117 (‘u’). ASCII character 32 is a space, so it makes sense you’d want to avoid that one. Since Base85 uses a consecutive range of characters, you can first convert a number to a pure mathematical radix 85 form, then add 33 to each number to find its Base85 character.

Example

Suppose we start with the word 0x89255d9, equal to 143807961 in decimal.

143807961 = 2×854 + 64×853 + 14×852 + 18×85 + 31

and so the radix 85 representation is (2, 64, 14, 18, 31). Adding 33 to each we find that the ASCII values of the characters in the Base85 representation are (35, 97, 47, 51, 64), or (‘#’, ‘a’, ‘/’, ‘3’, ‘@’) and so #a/3@ is the Base85 encoding of 0x89255d9.

Z85

The Z85 encoding method is also based on a radix 85 representation, but it chose to use a different subset of the 95 printable characters. Compared to Base85, Z85 adds seven characters

    v w x y z { }

and removes seven characters

    ` \ " ' _ , ;

to make the encoding work more easily with programming languages. For example, you can quote Z85 strings with single or double quotes because neither kind of quote is a valid Z85 character. And you don’t have to worry about escape sequences since the backslash character is not part of a Z85 representation.

Gotchas

There are a couple things that could trip someone up with Base85. First of all, Base 85 only works on 32-bit words, as noted above. For larger numbers it’s not a base conversion in the usual mathematical sense.

Second, the letter z can be used to denote a word consisting of all zeros. Since such words come up disproportionately often, this is a handy shortcut, though it means you can’t just divide characters into groups of 5 when converting back to binary.

Related posts

[1] 954 = 81450625 < 232 = 4294967296, so four characters from an alphabet of 95 elements is not enough to represent 232 possibilities. So we need at least five characters.

855 = 4437053125 > 232, so five characters is enough, and in fact it’s enough for them to come from an alphabet of size 85. But 845 = 4182119424 < 232, so an alphabet of 84 characters isn’t enough to represent 32 bits with five characters.

Base 58 encoding and Bitcoin addresses

A few weeks ago I wrote about base32 and base64 encoding. I’ll review these quickly then discuss base58 and its use in Bitcoin.

Base32 and Base64

All three methods have the goal of compactly representing large numbers while maintaining readability. Douglas Crockford’s base32 encoding is the most conservative: it’s case-insensitive and it does not use the letters I, L, O, or U. The first three letters are omitted because of visual similarity to digits, and the last to avoid “accidental obscenities.”

Base 64 is not concerned with avoiding visual similarities, and uses the full upper and lower case alphabet, plus two more symbols, + and /.

Base58

Base58 is nearly as efficient as base64, but more concerned about confusing letters and numbers. The number 1, the lower case letter l, and the upper case letter I all look similar, so base58 retains the digit 1 and does not use the lower case letter l or the capital letter I.

The number 0 looks like the lower case letter o and the upper case letter O. Here base58 makes an unusual choice: it keeps the lower case letter o, but does not use the digit 0 or the capital letter O. This is odd because every other encoding that I can think of keep the 10 digits and differs over what letters to use.

Bases like 32 and 64 have the advantage of being trivial to convert back and forth with binary. To convert a binary number to base 2n, you start at the least significant end and convert groups of n bits. Since 58 is not a power of 2, converting to base 58 is more involved.

Bitcoin addresses

Bitcoin addresses are written in base58, and in fact base58 was developed for Bitcoin.

A Bitcoin address is a 25 byte (200 bit) number. Now

log582200 = 34.14

and so it may take up to 35 characters to represent a Bitcoin address in base58. Using base64 would have taken up to 34 characters, so base58 pays a very small price for preventing a class of errors relative to base64. Base32 would require 40 characters.

As noted above, converting between binary and base58 is more complicated than converting between binary and either base32 or base64. However, converting to base58 is trivial compared to everything else that goes into forming a Bitcoin address. The steps, documented here, involve taking an ECDSA public key, applying a secure hash function three times, and appending a checksum.

Related posts

Implementing the ChaCha RNG in Python

My previous post talked about the ChaCha random number generator (CSPRNG) and how Google is using it in a stream cipher for encryption on low-end devices. This post talks about how to implement ChaCha in pure Python.

First of all, the only reason to implement ChaCha in pure Python is to play with it. It would be more natural and more efficient to implement ChaCha in C.

RFC 8439 gives detailed, language-neutral directions for how to implement ChaCha, including test cases for intermediate results. At its core is the function that does a “quarter round” operation on four unsigned integers. This function depends on three operations:

  • addition mod 232, denoted +
  • bitwise XOR, denoted ^, and
  • bit rotation, denoted <<<=n.

In C, the += operator on unsigned integers would do what the RFC denotes by +=, but in Python working with (signed) integers we need to explicitly take remainders mod 232. The Python bitwise-or operator ^ can be used directly. We’ll write a function roll that corresponds to <<<=.

So the following line of pseudocode from the RFC

    a += b; d ^= a; d <<<= 16;

becomes

    a = (a+b) % 2**32; d = roll(d^a, 16)

in Python. One way to implement roll would be to use the bitstring library:

    from bitstring import Bits

    def roll(x, n):
        bits = Bits(uint=x, length=32)
        return (bits[n:] + bits[:n]).uint

Another approach, a little harder to understand but not needing an external library, would be

    def roll2(x, n):
        return (x << n) % (2 << 31) + (x >> (32-n))

So here’s an implementation of the ChaCha quarter round:

    def quarter_round(a, b, c, d):
        a = (a+b) % 2**32; d = roll(d^a, 16)
        c = (c+d) % 2**32; b = roll(b^c, 12)
        a = (a+b) % 2**32; d = roll(d^a,  8)
        c = (c+d) % 2**32; b = roll(b^c,  7)
        return a, b, c, d

ChaCha has a state consisting of 16 unsigned integers. A “round” of ChaCha consists of four quarter rounds, operating on four of these integers at a time. All the details are in the RFC.

Incidentally, the inner workings of the BLAKE2 secure hash function are similar to those of ChaCha.

More RNG posts

Google Adiantum and the ChaCha RNG

The ChaCha cryptographic random number generator (CSPRNG) is in the news thanks to Google’s Adiantum project. I’ll discuss what’s going on, but first a little background.

Adiantum maidenhead fern

The name of the project comes from a genus of fern. More on that below as well.

One-time pads

The one-time pad is a provably unbreakable way to encrypt things. You create a sheet of random bits and give your counterpart an exact copy. Then when it comes time for you to send an encrypted message, you convert your message to a stream of bits, XOR your message with the random bits you exchanged previously, and send the result. The recipient then takes the XOR of the received message with the pad of random bits, and recovers the original message.

This is called a one-time pad because it’s a pad of bits that you can only use one time. If you reuse a pad, it’s no longer unbreakable.

One-time pads are impractical for a couple reasons. First, it’s hard to generate truly random bits, especially in bulk. Second, exchanging the pads is almost as difficult as exchanging messages.

Stream ciphers

So here’s a bright idea: we’ll get around both of the problems with one-time pads by using pseudorandom bits rather than random bits! Then both parties can generate their own random bits.

Many people have had this idea, and it’s not necessarily a bad one. It’s called a stream cipher. The problem is that most pseudorandom number generators are not up to the task. You need a cryptographically secure RNG, and most RNGs are far from secure. The ChaCha RNG, however, appears to be good enough to use in a stream cipher, given enough rounds of scrambling [1], and Google is using it for full disk encryption in Android devices.

Full disk encryption

If you forget your password to your computer, you may not be able to access your data, but a thief still could by removing the hard drive and accessing it from another computer. That is, unless the disk is encrypted.

Full disk encryption on a laptop, such as BitLocker on Windows or FileVault on OSX, is usually implemented via AES encryption with hardware acceleration. If you don’t have special hardware for encryption, AES can be too slow.

Adiantum: ChaCha encryption on Android

On low-end devices, ChaCha encryption can be around 5x faster than AES. So Google is using ChaCha for Android devices, using what it calls Adiantum.

You can read the technical details in [2], and you can read more about the ChaCha random number generator in [3].

So where does the name Adiantum come from? It’s a Victorian name for a genus of maidenhair ferns, symbolic of sincerity and discretion.

More on CSPRNGs

[1] Adiantum using ChaCha with 12 rounds. TLS 1.3 uses ChaCha with 20 rounds.

[2] Adiantum: length-preserving encryption for entry-level processors by Google employees Paul Crowley and Eric Biggers.

[3] IRTF RFC 8439: ChaCha20 and Poly1305 for IETF Protocols

Congress and the Equifax data breach

Dialog from a congressional hearing February 26, 2019.

Representative Katie Porter: My question for you is whether you would be willing to share today your social security, your birth date, and your address at this public hearing.

Equifax CEO Mark Begor: I would be a bit uncomfortable doing that, Congresswoman. If you’d so oblige me, I’d prefer not to.

KP: Could I ask you why you’re unwilling?

MB: Well that’s sensitive information. I think it’s sensitive information that I like to protect, and I think consumers should protect theirs.

KP: My question is then, if you agree that exposing this kind of information, information like that you have in your credit reports, creates harm, therefore you’re unwilling to share it, why are your lawyers arguing in federal court that there was no injury and no harm created by your data breach?

Related posts