Implementing the ChaCha RNG in Python

My previous post talked about the ChaCha random number generator and how Google is using it in a stream cipher for encryption on low-end devices. This post talks about how to implement ChaCha in pure Python.

First of all, the only reason to implement ChaCha in pure Python is to play with it. It would be more natural and more efficient to implement ChaCha in C.

RFC 8439 gives detailed, language-neutral directions for how to implement ChaCha, including test cases for intermediate results. At its core is the function that does a “quarter round” operation on four unsigned integers. This function depends on three operations:

  • addition mod 232, denoted +
  • bitwise XOR, denoted ^, and
  • bit rotation, denoted <<<=n.

In C, the += operator on unsigned integers would do what the RFC denotes by +=, but in Python working with (signed) integers we need to explicitly take remainders mod 232. The Python bitwise-or operator ^ can be used directly. We’ll write a function roll that corresponds to <<<=.

So the following line of pseudocode from the RFC

    a += b; d ^= a; d <<<= 16;

becomes

    a = (a+b) % 2**32; d = roll(d^a, 16)

in Python. One way to implement roll would be to use the bitstring library:

    from bitstring import Bits

    def roll(x, n):
        bits = Bits(uint=x, length=32)
        return (bits[n:] + bits[:n]).uint

Another approach, a little harder to understand but not needing an external library, would be

    def roll2(x, n):
        return (x << n) % (2 << 31) + (x >> (32-n))

So here’s an implementation of the ChaCha quarter round:

    def quarter_round(a, b, c, d):
        a = (a+b) % 2**32; d = roll(d^a, 16)
        c = (c+d) % 2**32; b = roll(b^c, 12)
        a = (a+b) % 2**32; d = roll(d^a,  8)
        c = (c+d) % 2**32; b = roll(b^c,  7)
        return a, b, c, d

ChaCha has a state consisting of 16 unsigned integers. A “round” of ChaCha consists of four quarter rounds, operating on four of these integers at a time. All the details are in the RFC.

Incidentally, the inner workings of the BLAKE2 secure hash function are similar to those of ChaCha.

Related posts

Random number generation posts

Random number generation is typically a two step process: first generate a uniformly distributed value, then transform that value to have the desired distribution. The former is the hard part, but also the part more likely to have been done for you in a library. The latter is relatively easy in principle, though some distributions are hard to (efficiently) sample from.

Here are some posts on testing a uniform RNG.

Here’s a book chapter I wrote on testing the transformation of a uniform RNG into some other distribution.

A few posts on manipulating a random number generator.

And finally, a couple posts on a cryptographically secure random number generators.

A cryptographically secure random number generator

A random number generator can have excellent statistical properties and yet not be suited for use in cryptography. I’ve written a few posts to demonstrate this. For example, this post shows how to discover the seed of an LCG random number generator.

This is not possible with a secure random number generator. Or more precisely, it is not practical. It may be theoretically possible, but doing so requires solving a problem currently believed to be extremely time-consuming. (Lots of weasel words here. That’s the nature of cryptography. Security often depends on the assumption that a problem is as hard to solve as experts generally believe it is.)

Blum Blum Shub algorithm

The Blum Blum Shub algorithm for generating random bits rests on the assumption that a certain number theory problem, the quadratic residuosity problem, is hard to solve. The algorithm is simple. Let M = pq where p and q are large primes, both congruent to 3 mod 4. Pick a seed x0 between 1 and M and relatively prime to M. Now for n > 0, set

xn+1 = xn² mod M

and return the least significant bit of xn+1. (Yes, that’s a lot of work for just one bit. If you don’t need cryptographic security, there are much faster random number generators. In fact, even if you do need cryptographic quality, there are faster generators.)

Python implementation

Here’s some Python code to illustrate using the generator. The code is intended to be clear, not efficient.

Given two large (not necessarily prime) numbers x and y, the code below finds primes p and q for the algorithm and checks that the seed is OK to use.

    import sympy

    # super secret large numbers
    x = 3*10**200
    y = 4*10**200
    seed = 5*10**300

    def next_usable_prime(x):
        # Find the next prime congruent to 3 mod 4 following x.
        p = sympy.nextprime(x)
        while (p % 4 != 3):
            p = sympy.nextprime(p)
        return p

    p = next_usable_prime(x)
    q = next_usable_prime(y)
    M = p*q

    assert(1 < seed < M)
    assert(seed % p != 0)
    assert(seed % q != 0)

There’s a little bit of a chicken-and-egg problem here: how do you pick x, y, and seed? Well, you could use a cryptographically secure random number generator ….

Now let’s generate a long string of bits:

# Number of random numbers to generate
N = 100000     

x = seed
bit_string = ""
for _ in range(N):
    x = x*x % M
    b = x % 2
    bit_string += str(b)

Testing

I did not test the output thoroughly; I didn’t use anything like DIEHARDER or PractRand as in previous posts, but just ran a couple simple tests described here.

First I look at the balance of 0’s and 1’s.

    Number of 1's: 50171
    Expected: 49683.7 to 50316.2

Then the longest run. (See this post for a discussion of expected run length.)

    Run length: 16
    Expected: 12.7 to 18.5

Nothing unusual here.

The Blums

The first two authors of Blum Blum Shub are Lenore and Manuel Blum. The third author is Michael Shub.

I had a chance to meet the Blums at the Heidelberg Laureate Forum in 2014. Manuel Blum gave a talk that year on mental cryptography that I blogged about here and followed up here. He and his wife Lenore were very pleasant to talk with.