RSA munitions T-shirt

Back when the US government classified strong encryption as “munitions,” RSA public key cryptography was illegal to export. In 1995, Adam Back protested this by creating a terse, obfuscated implementation of RSA in Perl code and used it as an email signature.

The code was also printed on T-shirts. The shirt was classified as munitions because it contained source code for strong encryption. More on the shirt here.

Adam Back's munitions T-shirt

This was the code:

#!/bin/perl -s-- -export-a-crypto-system-sig -RSA-3-lines-PERL
$m=unpack(H.$w,$m."\0"x$w),$_=`echo "16do$w 2+4Oi0$d*-^1[d2%Sa
2/d0<X+d*La1=z\U$n%0]SX$k"[$m*]\EszlXx++p|dc`,s/^.|\W//g,print
pack('H*',$_)while read(STDIN,$m,($w=2*$d-1+length$n&~1)/2)

My initial intention was to unpack the code, explaining each piece in detail. I don’t have the time or patience for that, and I imagine many readers don’t either. For more of a blow-by-blow explanation, see this commentary from 1995.

dc

In the middle of the code is

    echo ... | dc

This is the most dense and most important part of the code. Perl calls the dc calculator to do the arbitrary precision arithmetic that RSA encryption requires.

I’ve written about bc several times. bc (“basic calculator”) was a originally a more user-friendly wrapper around the reverse-Polish dc (“desktop calculator”). dc is still part of every Unix and Unix-like system, but I imagine bc is far more popular.

The important feature of dc for this post is that it is stack-based, meaning that users would push data and commands on to the stack and pop results off the stack. A sequence of commands that might be understandable when interactively using dc would look cryptic in a transcript. This is part of what makes the code so cryptic.

I’ll parse just a tiny bit of the dc code to give a flavor of what it does. The first four characters 16do instructs dc to push 16 on to the stack, duplicate it, and set the output radix to 16, i.e. these four characters tell dc to work in hexadecimal.

Believe it or not, the dc code is computing

mk mod n

using fast exponentiation, which is the key step in the RSA algorithm.

Textbook RSA

Note that Adam Back’s code is computing what we would now call textbook RSA, not RSA as it has been refined over the years and is currently implemented.

Related posts

Solving a chess puzzle with Claude and Prolog

Prolog is the original logic programming language. The name comes from programming in logic. More specifically, the name comes from programmation en logique because the inventor of the language, Philippe Roussel, is French.

Prolog has its advantages and disadvantages. One of the advantages is that the language represents logical problems directly. One of the disadvantages is that the syntax can be quirky. But if an LLM is writing the code, or at least helping to write the code, the syntax doesn’t matter so much.

I wanted to see how well Claude (Sonnet 4.6, medium effort) could solve a chess puzzle by Martin Gardner that I wrote about a little over a year ago. I chose a relatively obscure problem rather than something like the Eight Queens puzzle because an LLM could simply quote one of countless articles on the puzzle.

The puzzle

As I stated in the post last year, the task is to place two rooks, two bishops, and two knights on a 4 by 4 chessboard so that no piece attacks any other.

There are two basic solutions, twelve if you count rotations and reflections as different solutions.

Prolog results

Claude wrote an SWI-Prolog program that I ran with

swipl -g "run, halt" chess_placement.pl

and it gave the following output.

Total solutions (raw): 12
Solutions up to D4 symmetry: 2
Solutions up to D4 symmetry: 2

Solution 1:
B . . N
N . . B
. R . .
. . R .

Solution 2:
. B N .
. . . R
R . . .
. N B .

Claude made a nice image of the solutions.

Note that you can make eight variations of the first solution by a combination of rotations and flips. There are only four variations of the second solution because two rotations put you back where you started.

Prolog source

Here’s the Prolog code that Claude generated.

% ============================================================
%  Non-attacking placement of 2 Rooks, 2 Bishops, 2 Knights
%  on a 4x4 chessboard
%
%  Run with SWI-Prolog:
%    swipl -g "run, halt" chess_placement.pl
%
%  Or interactively:
%    swipl chess_placement.pl
%    ?- run.
%    ?- count_solutions(N).
%    ?- count_unique(N).
% ============================================================

:- use_module(library(lists)).

% ---- Board squares: (Row, Col) with Row, Col in 0..3 ----------------

square(R, C) :- member(R, [0,1,2,3]), member(C, [0,1,2,3]).

% ---- Attack predicates ----------------------------------------------

rook_attacks(R,_,R,_).          % same row
rook_attacks(_,C,_,C).          % same column

bishop_attacks(R1,C1,R2,C2) :-
    DR is abs(R1-R2), DC is abs(C1-C2),
    DR =:= DC, DR > 0.

knight_attacks(R1,C1,R2,C2) :-
    DR is abs(R1-R2), DC is abs(C1-C2),
    ( DR =:= 1, DC =:= 2 ; DR =:= 2, DC =:= 1 ).

% attacks(+Type1, +R1, +C1, +Type2, +R2, +C2)
% True if either piece threatens the other's square.
attacks(rook,   R1,C1, _,     R2,C2) :- rook_attacks(R1,C1,R2,C2).
attacks(_,      R1,C1, rook,  R2,C2) :- rook_attacks(R1,C1,R2,C2).
attacks(bishop, R1,C1, _,     R2,C2) :- bishop_attacks(R1,C1,R2,C2).
attacks(_,      R1,C1, bishop,R2,C2) :- bishop_attacks(R1,C1,R2,C2).
attacks(knight, R1,C1, _,     R2,C2) :- knight_attacks(R1,C1,R2,C2).
attacks(_,      R1,C1, knight,R2,C2) :- knight_attacks(R1,C1,R2,C2).

% ---- All pairs safe -------------------------------------------------

% Check all unordered pairs in a list.
no_attack_pair(_, []).
no_attack_pair(T1-(R1,C1), [T2-(R2,C2)|Rest]) :-
    \+ attacks(T1,R1,C1,T2,R2,C2),
    no_attack_pair(T1-(R1,C1), Rest).

all_pairs_safe([]).
all_pairs_safe([P|Rest]) :-
    no_attack_pair(P, Rest),
    all_pairs_safe(Rest).

% ---- Generate a placement -------------------------------------------
%
%  Placement = [rook-(R1,C1), rook-(R2,C2),
%               bishop-(R3,C3), bishop-(R4,C4),
%               knight-(R5,C5), knight-(R6,C6)]
%
%  We enforce an ordering within each pair to avoid counting
%  same-type duplicates (rook1 vs rook2 swapped).
%  Squares are represented canonically: (R,C) with R*4+C as index.

sq_index(R, C, I) :- I is R*4 + C.

placement(Placement) :-
    % --- Rooks ---
    square(R1r, C1r), square(R2r, C2r),
    sq_index(R1r,C1r,I1r), sq_index(R2r,C2r,I2r),
    I1r < I2r,          % canonical order for the two rooks

    % --- Bishops ---
    square(R1b, C1b), square(R2b, C2b),
    sq_index(R1b,C1b,I1b), sq_index(R2b,C2b,I2b),
    I1b < I2b,          % canonical order for the two bishops

    % --- Knights ---
    square(R1n, C1n), square(R2n, C2n),
    sq_index(R1n,C1n,I1n), sq_index(R2n,C2n,I2n),
    I1n < I2n, % canonical order for the two knights % --- All six squares distinct --- Squares = [(R1r,C1r),(R2r,C2r),(R1b,C1b),(R2b,C2b),(R1n,C1n),(R2n,C2n)], all_distinct_squares(Squares), % --- Build placement list --- Placement = [ rook-(R1r,C1r), rook-(R2r,C2r), bishop-(R1b,C1b), bishop-(R2b,C2b), knight-(R1n,C1n), knight-(R2n,C2n) ], % --- No piece attacks any other --- all_pairs_safe(Placement). all_distinct_squares([]). all_distinct_squares([S|Rest]) :- \+ member(S, Rest), all_distinct_squares(Rest). % ---- Count all solutions -------------------------------------------- count_solutions(N) :- findall(P, placement(P), Ps), length(Ps, N), format("Total solutions (raw): ~w~n", [N]). % ---- D4 symmetry transformations on a 4x4 board --------------------- % Squares (R,C) in 0..3 transform(identity, R, C, R, C). transform(rot90, R, C, C, Nr) :- Nr is 3-R. transform(rot180, R, C, Nr, Nc) :- Nr is 3-R, Nc is 3-C. transform(rot270, R, C, Nc, R) :- Nc is 3-C. transform(flipH, R, C, R, Nc) :- Nc is 3-C. transform(flipV, R, C, Nr, C) :- Nr is 3-R. transform(flipD1, R, C, C, R). transform(flipD2, R, C, Nr, Nc) :- Nr is 3-C, Nc is 3-R. apply_transform(_, [], []). apply_transform(T, [Type-(R,C)|Rest], [Type-(NR,NC)|TRest]) :- transform(T, R, C, NR, NC), apply_transform(T, Rest, TRest). % Canonical form: sort pieces within same-type pairs, then sort the % whole list to get a unique representative. canonical_placement(Placement, Canonical) :- findall(T, member(T,[identity,rot90,rot180,rot270, flipH,flipV,flipD1,flipD2]), Ts), maplist(transform_and_sort(Placement), Ts, AllForms), msort(AllForms, Sorted), Sorted = [Canonical|_]. transform_and_sort(Placement, T, Sorted) :- apply_transform(T, Placement, TPl), msort(TPl, Sorted). % ---- Count solutions up to D4 symmetry ------------------------------ count_unique(N) :- findall(P, placement(P), Ps), maplist(canonical_placement, Ps, Canonicals), list_to_set(Canonicals, Unique), length(Unique, N), format("Solutions up to D4 symmetry: ~w~n", [N]). % ---- Pretty-print a board ------------------------------------------- print_board(Placement) :- forall(member(R, [0,1,2,3]), print_row(R, Placement)), nl. print_row(R, Placement) :- forall(member(C, [0,1,2,3]), print_cell(R, C, Placement)), nl. print_cell(R, C, Placement) :- ( member(rook-(R,C), Placement) -> write('R ')
    ;   member(bishop-(R,C), Placement) -> write('B ')
    ;   member(knight-(R,C), Placement) -> write('N ')
    ;   write('. ')
    ).

% ---- Print all unique solutions -------------------------------------

print_unique_solutions :-
    findall(P, placement(P), Ps),
    maplist(canonical_placement, Ps, Canonicals),
    list_to_set(Canonicals, Unique),
    length(Unique, N),
    format("~nSolutions up to D4 symmetry: ~w~n~n", [N]),
    forall(nth1(I, Unique, Sol),
           ( format("Solution ~w:~n", [I]),
             print_board(Sol) )).

% ---- Top-level entry point ------------------------------------------

run :-
    count_solutions(Raw),
    count_unique(Sym),
    format("~n"),
    print_unique_solutions,
    format("Summary: ~w raw solutions, ~w up to D4 symmetry.~n",
           [Raw, Sym]).

Aitken acceleration before Aitken

Kepler solved his eponymous equation

ME − e sin(E)

by finding a fixed point of

E = M + e sin(E).

So guess a value of E and stick it into the right hand side. Then plug that value into the right hand side again. Kepler said a couple iterations should be enough. And a couple iterations are enough if the eccentricity e is small and you don’t need much accuracy.

The rate of convergence is determined by e. Kepler implicitly had in mind small values of e because he wasn’t aware of anything orbiting the sun in a highly elliptical orbit. Here’s an example with eccentricity 0.05, about the eccentricity of the orbits of Jupiter and Saturn.

from math import sin

M, E, e = 1, 1, 0.05
for _ in range(5):
    E = M + e*sin(E)
    residual = M - (E - e*sin(E))
    print(residual)

The residual after just two iterations is 2.77 × 10−5. If you change e to 0.2, the eccentricity of Mercury’s orbit, it takes three iterations to get comparable accuracy. Mercury has the most eccentric orbit of any object Kepler would have known about.

Now suppose you’d like to solve for E when M = 1 for Halley’s comet, and you’d like an error of less than 10−8. Now you need 16 iterations.

C. F. W. Peters discovered a faster algorithm in 1891.

E1 = M + e sin(E0)
E2 = M + e sin(E1)
E3 = (E2 E0E1²)/(E2 − 2E1 + E0)

Let’s look at the results of doing three iterations of Peters’ method for Halley’s comet.

M, E0, e = 1, 1, 0.967
for _ in range(3):
    E1 = M + e*sin(E0)
    E2 = M + e*sin(E1)
    E3 = (E2*E0 - E1**2)/(E2 - 2*E1 + E0)
    residual = M - (E - e*sin(E3))
    print(residual)
    E0 = E3 # for next iteration

This gives a residual of −7.23 × 10−10. Each iteration of Peters’ method requires a little more than twice as much work as an iteration of Kepler’s method, but 3 iterations of Peters’ method accomplished more than 16 iterations of Kepler’s method.

Peters’ algorithm from 1891 was a special case of Alexander Aitken’s series acceleration method published in 1926.

Related posts

The Latin of Linux

One reason people study Latin is that it is the ancestor of many modern languages. English derives from West Germanic languages, not from Latin, but much of English vocabulary, perhaps as much as 60%, derives from Latin, either directly or indirectly through French.

Knowing a bit of Latin makes sense of many things that would otherwise seem completely arbitrary, such as why the symbols for gold, silver, and lead are Au, Ag, and Pb respectively.

Similarly, ed(1) is the Latin of Linux [1]. Many conventions in command line utilities follow conventions that go back to the ed(1) line editor. They may go back even further. Just as Latin didn’t come out of nowhere, neither did ed(1), but you can’t go back indefinitely. It’s convenient to start history somewhere, and this post will start with ed(1) just as much discussion of Western linguistics starts with Latin.

The following are features of ed(1) that live on in sed, awk, grep, vi, perl, bash, etc.

  1. Using slashes to delimit regular expressions
  2. Using $ to indicate the end of a line or the end of a file
  3. The pattern of specifying address + action or address range + action
  4. Using regular expressions as address ranges
  5. Using \1, \2, etc to refer to regex captures
  6. Using & to refer to the entire matched text
  7. The g/regexp/command pattern
  8. Using p for printing lines, as in g/re/p
  9. The commands a, c, d, i, j, l, p, q, r, and w in vi
  10. ! for shell escape

 

[1] Because the name “ed” is so short, and looks so much like the name Ed, it’s convenient to use its full Unix name ed(1). The parenthesized number is used to disambiguate different things that have the same name, such as the user command kill(1) and the system call kill(2). There is no ed(2) or any other higher-numbered ed. The number is there to make the name stand out, not to disambiguate anything.

Naively summing an alternating series

Suppose you run across the power series for the exponential function and decide to code it up. Good idea: you’ll probably learn something, though maybe not what you expect.

Maybe you decide a tolerance of 10−12 is good enough, and so you sum the terms until the next term to add is below the tolerance.

from math import factorial, exp

def naive_exp(x):
    tolerance = 1e-12
    s = 0
    n = 0
    while True:
        delta = x**n / factorial(n)
        s += delta
        if abs(delta) < tolerance:
            return s
        n += 1

You want to try your program out, so you compute e by calling the function at 1. If you compare this to calling exp(1) you find that you got all the digits correct.

Now you try computing exp(-20). Calling naive_exp(-20) gives

    5.47893091802112e-10

but calling exp(-20) gives

    2.061153622438558e-09

Don’t brush things like this as flukes or compiler bugs [1]. This is your golden opportunity to learn something.

Maybe you add a print statement to see the intermediate values of the sum stored in the variable s. If you do, you’ll see that the partial sums oscillate wildly before settling down.

Maybe that seems wrong, but then you look more carefully at the series. The nth term is xn/n!. Since x is negative, the terms alternate in sign. And the absolute values of the term get bigger before they get smaller. When x = −20, each numerator is 20 times larger than the previous, and each denominator is n times larger than the previous. So the terms will get bigger until n > 20. So the wild oscillations are real, not a bug.

The largest partial sum is 21822593.77927747 in absolute value. You know that exp(−20) is a very small number, so there’s going to have to be a lot of cancellation before the partial sums settle down to a small number. Maybe you’ve heard that cancellation is where numerical calculations lose precision. If not, now you know!

Look again at the largest partial sum. There are eight figures to the right of the decimal point. The code is printing out results to as much precision as it has, so the error at this point is on the order of 10−8. We’re trying to compute a number on the order of 10−9, and if any digits in our result are correct, it would be a coincidence.

If you go back and try your code on x = −22, the result is even worse, giving a negative result for a quantity that for theoretical reasons cannot be negative. But you can see why: you’re asking the code to compute a number that is closer to zero than the accuracy of the code.

Computers don’t represent numbers in base 10 internally, but the argument above is sufficient in this case. If you want to dig deeper, look into the anatomy of a floating point number.

There is a simple way around the problem above, but discovering it sooner would short-circuit the learning process. You could calculate exp(−20) as 1/exp(20) and avoid all the cancellation because the series for exp(20) does not alternate.

 

[1] Compilers do have bugs occasionally, but it’s orders of magnitude more likely that something is wrong with your code.

Online (one-pass) algorithms

Canonical example

The sample variance of a set of numbers is defined in terms of the sum of the squared distances from each point to the mean.

s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i -\bar{x})^2

So it would seem that you first need to calculate the mean, then go back and compute the squared differences from the mean. And yet sample variance can be computed in one pass through the data.

You’ll find two equivalent equations in statistics books: the one described above and another based on the sum of the data points and the sum of the data points squared.

s^2 = \frac{1}{n(n-1)}\left(n\sum_{i=1}^n x_i^2 -\left(\sum_{i=1}^n x_i\right)^2\right)

While this equation is theoretically correct, it is numerically unstable. Code that directly implements this equation can return a negative value for a quantity that is theoretically positive. I’ve seen this happen with real data, causing a program to crash when taking the square root of the variance to get the standard deviation.

However, there is an algorithm that computes mean and variance in one pass that is accurate and numerically stable. This algorithm was developed by B. P. Welford in 1962. I discuss Welford’s algorithm and give code for implementing it here.

Online algorithms

Welford’s algorithm is known in computer science as an “online” algorithm. This term was coined well before the Internet. For example, see the paper [1] from 1965.

But of course now “online” means something else, and so the technical and colloquial uses of “online algorithm” have split. Technical literature uses the phrase to describe the kinds of algorithms in this post. Most people would take “online algorithm” to mean code that runs on a remote server. You may see “streaming algorithm” as a contemporary technical term, but I’d still search on “online algorithm” to find papers.

Computing higher moments online

Welford’s algorithm computes the first two moments, mean and variance, of a data set online. It is also possible to compute skewness and kurtosis online, as well as higher moments.

Online regression

Simple linear regression is closely related to calculating mean and variance, and it is possible to compute simple regression coefficients online. I have some old notes on this here.

This post was motivated by an email asking me about multiple regression. It is also possible to compute multiple regression coefficients online, but I haven’t done this. I found a couple references, [2] and [3], but I have not read them. There is a simple procedure for two predictor variables but I believe things get a little more complicated with three or more predictors, requiring a recursive least squares algorithm.

Related posts

The notion of online algorithms is closely related to the notion of a fold in functional programming. Here are several posts on computing things with folds.

[1] One-Tape, Off-Line Turing Machine Computations by F. C. Hennie. Information and Control. 8, 553-578 (1965). Available here. In this paper Hennie writes “In an on-line computation the input data are supplied to the machine, one symbol at a time, at a special input terminal. … In an off-line computation all of the input symbols are written on one of the machine’s tapes prior to the start of the computation.

[2] Arthur Albert and Robert W. Sittler, “A Method for Computing Least Squares Estimators that Keep Up with the Data,” Journal of the Society for Industrial and Applied Mathematics, Series A: Control, 3(3), 384–417, 1965. DOI: 10.1137/0303026.

[3] Petre Stoica and Per Ashgren. Exact initialization of the recursive least-squares algorithm. Int. J. Adapt. Control Signal Process. 2002; 16:219&ndashh;230.

Calculating the expected range of normal samples

The previous post looked at the expected IQ range in a jury of 12. This post will look more generally at computing the expected range of n samples from a N(0, 1) random variable. This will give the expected range in units of σ, i.e. multiply the results by σ if your σ isn’t 1.

As mentioned in the previous post, the expected range is given by

d_n = 2n \int_{-\infty}^\infty \Phi(x)^{n-1} \, x\,\phi(x) \, dx

where φ and Φ are the PDF and CDF of a standard normal. The integral can be calculated in closed form for n ≤ 5, but in general it requires numerical integration [1].

The following Python code can compute dn.

from scipy.stats import norm
from scipy.integrate import quad
import numpy as np

def d(n):
    integrand = lambda x: x*norm.pdf(x)*norm.cdf(x)**(n-1)
    res, info = quad(integrand, -np.inf, np.inf)
    return 2*n*res

For large n we have the asymptotic approximation

d_n = 2 \Phi^{-1}\left( \frac{n \,–\, 0.375}{ n + 0.25} \right)

which we could implement in Python by

def approx(n):
    return 2*norm.ppf((n - 0.375)/(n + 0.25))

For very large n the asymptotic expression may be more accurate than the integral due to numerical integration error.

Here are a few example values.

|-----+-------|
|   n |   d_n |
|-----+-------|
|   2 | 1.128 |
|   3 | 1.693 |
|   5 | 2.326 |
|  10 | 3.078 |
|  12 | 3.258 |
|  23 | 3.858 |
|  50 | 4.498 |
| 100 | 5.015 |
|-----+-------|

[1] Order Statistics by H. A. David. John Wiley & Sons. 1970.

Recovering the state of xorshift128

I’ve written a couple posts lately about reverse engineering the internal state of a random number generator, first Mersenne Twister then lehmer64. This post will look at xorshift128, implemented below.

import random

# Seed the generator state
a: int = random.getrandbits(32)
b: int = random.getrandbits(32)
c: int = random.getrandbits(32)
d: int = random.getrandbits(32)

MASK = 0xFFFFFFFF

def xorshift128() -> int:
    global a, b, c, d

    t = d
    s = a

    t ^= (t << 11) & MASK t ^= (t >>  8) & MASK
    s ^= (s >> 19) & MASK

    a, b, c, d = (t ^ s) & MASK, a, b, c

    return a

Recovering state

Recovering the internal state of the generator is simple: it’s the four latest outputs in reverse order. This is illustrated by the following chart.

 a b c d output 5081e5ca 79259a41 63e12955 651e537d c793d11c c793d11c 5081e5ca 79259a41 63e12955 ad52e33a ad52e33a c793d11c 5081e5ca 79259a41 f8f09343 f8f09343 ad52e33a c793d11c 5081e5ca a7009622 a7009622 f8f09343 ad52e33a c793d11c fe42a8ef

This means that once we’ve seen four outputs, we can predict the rest of the outputs. The following code demonstrates this.

Let’s generate five random values.

out = [xorshift128() for _ in range(5)]

Running

print(hex(out[4]))

shows the output 0xc3f4795d.

If we reset the state of the generator using the first four outputs

d, c, b, a, _ = out
print(hex(xorshift128()))

we get the same result.

Good stats, bad security

Mersenne Twister and lehmer64 have good statistical properties, despite being predictable. The xorshift128 generator is even easier to predict, but it also has good statistical properties. These generators would be fine for many applications, such as Monte Carlo simulation, but disastrous for use in cryptography.

The post on lehmer64 mentioned at the end that the internal state of PCG64 can also be recovered from its output. However, doing so requires far more sophisticated math and thousands of hours of compute time. Still, it’s not adequate for cryptography. For that you’d need a random number generator designed to be secure, such as ChaCha.

So why not just use a cryptographically secure random number generator (CSPRNG) for everything? You could, but the other generators mentioned in this post use less memory and are much faster. PCG64 occupies an interesting middle ground: simple and fast, but not easily reversible.

Initialize and print 128-bit integers in C

If you look very closely at my previous post, you’ll notice that I initialize a 128-bit integer with a 64-bit value. The 128-bit unsigned integer represents the internal state of a random number generator. Why not initialize it to a 128-bit value? I was trying to keep the code simple.

A surprising feature of C compilers, at least of GCC and Clang, is that you cannot initialize a 128-bit integer to a 128-bit integer literal. You can’t directly print a 128-bit integer either, which is why the previous post introduces a function print_u128.

The code

__uint128_t x = 0x00112233445566778899aabbccddeeff;

Produces the following error message.

error: integer literal is too large to be represented in any integer type

The problem isn’t initializing a 128-bit number to a 128-bit value; the problem is that the compiler cannot parse the literal expression

0x00112233445566778899aabbccddeeff

One solution to the problem is to introduce the macro

#define U128(hi, lo) (((__uint128_t)(hi) << 64) | (lo))

and use it to initialize the variable.

__uint128_t x = U128(0x0011223344556677, 0x8899aabbccddeeff);

You can verify that x has the intended state by calling print_u128 from the previous post.

void print_u128(__uint128_t n)
{
    printf("0x%016lx%016lx\n",
           (uint64_t)(n >> 64),      // upper 64 bits
           (uint64_t)n);             // lower 64 bits
}

Then

print_u128(x);

prints

0x00112233445566778899aabbccddeeff

Update. The code for print_u128 above compiles cleanly with gcc but clang gives the following warning.

warning: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]

You can suppress the warning by including the inttypes header and modifying the print_u128 function.

Here’s the final code. It compiles cleanly under gcc and clang.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#define U128(hi, lo) (((__uint128_t)(hi) << 64) | (lo))

void print_u128(__uint128_t n)
{
    printf("0x%016" PRIx64 "%016" PRIx64 "\n",
           (uint64_t)(n >> 64),
           (uint64_t)n);
}

int main(void)
{
    __uint128_t x = U128(0x0011223344556677, 0x8899aabbccddeeff);
    print_u128(x);
    return 0;
}

Hacking the lehmer64 RNG

A couple days ago I wrote about hacking the Mersenne Twister. I explained how to recover the random number generator’s internal state from a stream of 640 outputs.

This post will do something similar with the lehmer64 random number generator. This generator is very simple to implement. Daniel Lemire found it to be “the fastest conventional random number generator that can pass Big Crush,” a well-respected test for pseudorandom number generators.

Implementing lehmer64

The lehmer64 generator can be implemented in C by

__uint128_t g_lehmer64_state;

uint64_t lehmer64() {
    g_lehmer64_state *= 0xda942042e4dd58b5ULL;  
  return g_lehmer64_state >> 64;
}

The analogous code in Python would have to simulate the overflow behavior of a 128-bit integer by reducing the state mod 2128 after the multiplication.

Reverse engineering lehmer64 is easier than reverse engineering the Mersenne Twister because only three outputs are needed. However, the theory behind the exploit is more sophisticated. See [1].

The following code sets the state to an arbitrary initial seed value and generates three values.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    g_lehmer64_state = 0x4789499d78770934; // seed
    for (int i = 0; i < 3; i++) {
        printf("0x%016lx\n", lehmer64());
    }

    return 0;
}

The code prints the following.

0x3d144d12822bcc2e
0x85a67226191a568d
0x53e803dffc88e8f8

Exploiting lehmer64

Here is Python code for recovering the state of the lehmer64 generator given in [1].

def reconstruct(X):
    a = 0xda942042e4dd58b5
    r = round(2.64929081169728e-7 * X[0] + 3.51729342107376e-7 * X[1] + 3.89110109147656e-8 * X[2])
    s = round(3.12752538137199e-7 * X[0] - 1.00664345453760e-7 * X[1] - 2.16685184476959e-7 * X[2])
    t = round(3.54263598631140e-8 * X[0] - 2.05535734808162e-7 * X[1] + 2.73269247090513e-7 * X[2])
    u = r * 1556524 + s * 2249380 + t * 1561981
    v = r * 8429177212358078682 + s * 4111469003616164778 + t * 3562247178301810180
    state = (a*u + v) % (2**128)
    return state

Let’s call reconstruct with the output of our C code.

X = [0x3d144d12822bcc2e, 0x85a67226191a568d, 0x53e803dffc88e8f8]
print( hex( reconstruct(X) ) )

This prints

0x3d144d12822bcc2e1b81101c593761c4

Now for the confusing part: at what point is the number above the state of the generator? Is it the state before or after generating the three values? Neither! It is the state after generating the first value.

We can verify this by modifying the C code as follows and rerunning it.

void print_u128(__uint128_t n)
{
    printf("0x%016lx%016lx\n",
           (uint64_t)(n >> 64),      // upper 64 bits
           (uint64_t)n);             // lower 64 bits
}

int main(void)
{
    g_lehmer64_state = 0x4789499d78770934; // seed
    for (int i = 0; i < 3; i++) {
        printf("0x%016lx\n", lehmer64());
        printf("state: ");
        print_u128(g_lehmer64_state);
    }
 
    return 0;
}

The main goal of [1] is to recover the state of the PCG generator, not the lehmer64 generator. The latter was a side quest. Recovering the state of PCG64 is much harder; the authors estimate it takes about 20,000 CPU-hours. The paper shows that a technique used as part of pursuing their main goal can quickly recover the lehmer64 state.

Related posts

[1] Charles Bouillaguet, Florette Martinez, and Julia Sauvage. Practical seed-recovery for the PCG Pseudo-Random Number Generator. IACR Transactions on Symmetric Cryptology. ISSN 2519-173X, Vol. 2020, No. 3, pp. 175–196.