Viability of unpopular programming languages

I said something about Perl 6 the other day, and someone replied asking whether anyone actually uses Perl 6. My first thought was I bet more people use Perl 6 than Haskell, and it’s well known that people use Haskell. I looked at the TIOBE Index to see whether that’s true. I won’t argue how well the index measures popularity, but for this post I’ll assume it’s a good enough proxy.

TIOBE doesn’t separate out variations on Perl [1]. What it calls Perl is 16th on the list this year, while Haskell comes in at 42nd. A few of the more obscure languages that TIOBE ranks higher than Haskell are Scratch, D, ABAP, Apex, and PL/I. Haskell has better public relations than all these languages.

There’s a lot more to viability than just popularity, though popularity matters. More users means more people to find bugs, write libraries, develop tools, answer questions, write tutorials, etc. But the benefit of community size is not linear. It follows a sort of logistic S-curve. There’s some threshold size where the community is large enough for a language to be viable. And somewhere above that threshold you start hitting diminishing returns.

It’s interesting to look at some of the languages currently less popular than Haskell but more familiar: Common Lisp (63), Erlang (66), and F# (67). These show that popularity isn’t everything.

Common Lisp has been around since 1982, and it standardized a family of languages that had been in development since 1958. Erlang has been around since 1986. These languages have many of the benefits of popularity listed above, accumulated over time.

There is not a huge community devoted specifically to F#, but it shares tooling and libraries with C#, the 5th language on the list. (Maybe the number of F# developers is underestimated because F# is so closely related to C#, not syntactically but in terms of infrastructure.)

Common Lisp, Erlang, and F# would all be safer bets for a production software project than several more popular languages.


[1] At least I don’t think they do. TIOBE does separate out some versions of Lisp as separate languages. It’s possible they do consider Perl 6 a separate language that didn’t make the top rankings.

Larry Wall deliberately introduced many natural language principles in Perl. It seems that one feature that Perl has in common with natural languages is controversy over when two dialects of a language are sufficiently different to be considered separate languages. Advocates consider Perl 6 to be a separate language but outside observers, like TIOBE, may not.


Eight-bit floating point

Researchers have discovered that for some problems, deep neural networks (DNNs) can get by with low precision weights. Using fewer bits to represent weights means that more weights can fit in memory at once. This, as well as embedded systems, has renewed interest in low-precision floating point.

Microsoft mentioned its proprietary floating point formats ms-fp8 and ms-fp9 in connection with its Brainwave Project [1]. I haven’t been able to find any details about these formats, other than that they use two- and three-bit exponents (respectively?).

This post will look at what an 8-bit floating point number would look like if it followed the pattern of IEEE floats or posit numbers. In the notation of the previous post, we’ll look at ieee<8,2> and posit<8,0> numbers. (Update: Added a brief discussion of ieee<8,3>, ieee<8,4>, and posit<8,1> at the end.)

Eight-bit IEEE-like float

IEEE floating point reserves exponents of all 0’s and all 1’s for special purposes. That’s not much of a price to pay with large exponent fields, but with only four possible exponents, it seems very wasteful to devote half of them to special purposes. Maybe this is where Microsoft does something clever. But for this post, we’ll forge ahead with the analogy to larger IEEE floating point numbers.

There would be 191 representable finite numbers, counting the two representations of 0 as one number. There would be two infinities, positive and negative, and 62 ways to represent NaN.

The smallest non-zero number would be

2^-5 = 1/32 = 0.03125.

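The largest value would be 01011111. The exponent bits 10 give an exponent of 1 after subtracting the bias of 1, and the significand bits 11111 represent 1.11111 in binary, i.e. 2 – 2^-5. So the value is

2 (2 – 2^-5) = 63/16 = 3.9375.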

This makes the dynamic range just over two decades.
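The counts above are easy to check by brute force. Here’s a minimal sketch that decodes all 256 bit patterns, assuming the hypothetical ieee<8, 2> layout follows the usual IEEE conventions (bias, denormalized numbers, all-1’s exponent for infinities and NaNs). The function name decode_ieee is my own, not part of any library.

```python
from fractions import Fraction

def decode_ieee(bits, n=8, es=2):
    """Decode an n-bit pattern (an int) under a hypothetical ieee<n, es> layout.

    Returns a Fraction for finite values, or one of "inf", "-inf", "nan".
    """
    s = n - es - 1                          # number of significand bits
    sign = -1 if (bits >> (n - 1)) & 1 else 1
    exponent = (bits >> s) & ((1 << es) - 1)
    fraction = bits & ((1 << s) - 1)
    bias = 2**(es - 1) - 1

    if exponent == (1 << es) - 1:           # all 1's: infinity or NaN
        if fraction == 0:
            return "inf" if sign > 0 else "-inf"
        return "nan"
    if exponent == 0:                       # all 0's: zero or denormalized, 0.fff...
        return sign * Fraction(fraction, 2**s) * Fraction(2)**(1 - bias)
    # normalized: 1.fff... times 2^(exponent - bias)
    return sign * (1 + Fraction(fraction, 2**s)) * Fraction(2)**(exponent - bias)

values = [decode_ieee(b) for b in range(256)]
finite = {v for v in values if isinstance(v, Fraction)}   # the set merges +0 and -0
infinities = [v for v in values if v in ("inf", "-inf")]
nans = values.count("nan")
smallest = min(v for v in finite if v > 0)
largest = max(finite)

print(len(finite), len(infinities), nans)
print(smallest, largest)
```

Using exact fractions rather than floats avoids any rounding questions when comparing the smallest and largest values.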

Eight-bit posit

A posit<8, 0> has no exponent bits, just a sign bit, regime, and fraction. But in this case the useed value is 2, and so the regime acts like an exponent.

There are 255 representable finite numbers and one value corresponding to ±∞.

The smallest non-zero number would be 1/64 and the largest finite number would be 64. The dynamic range is 3.6 decades.

Update: Here is a list of all possible posit<8,0> numbers.

Distribution of values

The graphs below give the distribution of 8-bit IEEE-like numbers and 8-bit posits on a log scale.

eight bit IEEE distribution

eight bit posit distribution

The distribution of IEEE-like numbers is asymmetric because much of the dynamic range comes from denormalized numbers.

The distribution of posits is approximately symmetrical. If a power of 2 is representable as a posit, so is its reciprocal. But you don’t have perfect symmetry because, for example, 3/2 is representable while 2/3 is not.
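The symmetry claims can be checked by listing the positive posit<8, 0> values. The sketch below builds them regime by regime rather than decoding bit patterns: with es = 0 there are no exponent bits, so a value is 2^k times 1.fff…, where the regime determines k and the leftover bits form the fraction. The function name is hypothetical.

```python
from fractions import Fraction

def positive_posits_es0(n=8):
    """All positive values of a posit<n, 0>, built regime by regime."""
    values = set()
    for m in range(1, n - 1):            # regime run length, followed by a terminating bit
        nf = n - 2 - m                   # bits left over for the fraction
        for k in (-m, m - 1):            # run of 0's gives k = -m, run of 1's gives k = m - 1
            for f in range(2**nf):
                values.add(Fraction(2)**k * (1 + Fraction(f, 2**nf)))
    values.add(Fraction(2)**(n - 2))     # all 1's after the sign bit: the largest posit
    return values

posits = positive_posits_es0(8)
print(len(posits), min(posits), max(posits))
print(Fraction(3, 2) in posits, Fraction(2, 3) in posits)

# How many values have a reciprocal that is also representable?
print(sum(1 for v in posits if 1 / v in posits))
```

The 127 positive values, their negatives, and zero account for the 255 finite posits.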

Other eight-bit formats

I had originally considered a 2-bit exponent because Microsoft’s ms-fp8 format has a two-bit exponent. After this post was first published it was suggested in the comments that an ieee<8, 4> float might be better than ieee<8, 2>, so let’s look at that. Let’s look at ieee<8, 3> too while we’re at it. And a posit<8, 1> too.

An ieee<8, 3> floating point number would have a maximum value of 15.5 and a minimum value of 2^-6 = 1/64, a dynamic range of 3.0 decades. It would have 223 finite values, counting the two representations of zero as one, as well as 2 infinities and 30 NaNs.

An ieee<8, 4> floating point number would have a maximum value of 240 and a minimum value of 2^-9 = 1/512, a dynamic range of 5.1 decades. It would have 239 finite values, counting the two representations of zero as one, as well as 2 infinities and 14 NaNs.

A posit<8, 1> would have a maximum value of 2^12 = 4096 and a minimum value of 1/4096, a dynamic range of 7.2 decades. Any 8-bit posit, regardless of the maximum number of exponent bits, will have 255 finite values and one infinity.

Near 1, an ieee<8, 4> has 3 significand bits, an ieee<8, 3> has 4, and a posit<8,1> has 4.


[1] Chung et al. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave.

Comparing range and precision of IEEE and posit

The IEEE standard 754-2008 defines several sizes of floating point numbers—half precision (binary16), single precision (binary32), double precision (binary64), quadruple precision (binary128), etc.—each with its own specification. Posit numbers, on the other hand, can be defined for any number of bits. However, the IEEE specifications share common patterns so that you could consistently define theoretical IEEE numbers that haven’t actually been specified, making them easier to compare to posit numbers.

An earlier post goes into the specification of posit numbers in detail. To recap briefly, a posit<n, es> number has n bits, a maximum of es of which are devoted to the exponent. The bits are divided into a sign bit, regime bits, exponent bits, and fraction bits. The sign bit is of course one bit, but the other components have variable lengths. We’ll come back to posits later for comparison.

IEEE floating point range and precision

We will write ieee<n, es> for a (possibly hypothetical) IEEE floating point number with n total bits and (exactly) es exponent bits. Such a number has one sign bit and n – es – 1 significand bits. Actual specifications exist for ieee<16, 5>, ieee<32, 8>, ieee<64, 11>, and ieee<128, 15>.

The exponent of a posit number is simply represented as an unsigned integer. The exponent of an IEEE floating point number equals the exponent bits interpreted as an unsigned integer minus a bias.

\text{bias} = 2^{es -1} - 1.

So the biases for half, single, double, and quad precision floats are 15, 127, 1023, and 16383 respectively. We could use the formula above to define the bias for a hypothetical format not yet specified, assuming the new format is consistent with existing formats in this regard.

The largest exponent, emax, is 2^(es-1) – 1 (also equal to the bias), and the smallest (most negative) exponent is emin = 2 – 2^(es-1). This accounts for 2^es – 2 possible exponents. The two remaining possibilities consist of all 1’s and all 0’s, and are reserved for special use. They represent, in combination with sign and significand bits, special values ±0, ±∞, NaN, and denormalized numbers. (More on denormalized numbers shortly.)
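As a quick check on the bias and exponent range, here is a small sketch that evaluates these quantities for the four specified formats. The function name exponent_parameters is just for illustration.

```python
def exponent_parameters(es):
    """Bias, largest exponent, smallest exponent, and number of ordinary exponents."""
    bias = 2**(es - 1) - 1
    emax = bias
    emin = 2 - 2**(es - 1)
    count = emax - emin + 1        # the all-0's and all-1's patterns are excluded
    return bias, emax, emin, count

for name, es in [("half", 5), ("single", 8), ("double", 11), ("quad", 15)]:
    print(name, exponent_parameters(es))
```

The count always comes out two short of 2^es, the two missing patterns being the ones reserved for special values.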

The largest representable finite number has the maximum exponent and a significand of all 1’s, which represents 1.11…1 in binary, i.e. 2 – 2^-s. Its value is thus

2^{e_{\text{max}}} (2 - 2^{-s})

where s is the number of significand bits. And so the largest representable finite number is just slightly less than

2^{2^{es -1}}

We’ll use this as the largest representable value when calculating dynamic range below.

The smallest representable normalized number (normalized meaning the significand represents a number greater than or equal to 1) is

2^{e_{\text{min}}} = 2^{2 - 2^{es -1}}

However, it is possible to represent smaller values with denormalized numbers. Ordinarily the significand bits fff… represent a number 1.fff… But when the exponent bit pattern consists of all 0’s, the significand bits are interpreted as 0.fff… This means that the smallest denormalized number has a significand of all 0’s except for a 1 at the end. This represents a value of

2^{e_{\text{min}}} \cdot 2^{-s} = 2^{2 - 2^{es-1} - s}

where again s is the number of significand bits.
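For ordinary double precision, ieee<64, 11>, these extremes can be compared directly with what Python reports. A minimal sketch, taking the largest finite value to be 2^emax times the all-1’s significand 1.11…1 = 2 – 2^-s:

```python
import sys

es, s = 11, 52                       # IEEE double precision: ieee<64, 11>
emax = 2**(es - 1) - 1
emin = 2 - 2**(es - 1)

largest = 2.0**emax * (2 - 2.0**-s)  # maximum exponent, significand of all 1's
smallest_normal = 2.0**emin
smallest_denormal = 2.0**(emin - s)  # significand 0.00...01

print(largest == sys.float_info.max)
print(smallest_normal == sys.float_info.min)
print(smallest_denormal)
```

Note that sys.float_info.min is the smallest normalized double; the denormalized numbers go lower still.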

The dynamic range of an ieee<n, es> number is the log base 10 of the ratio of the largest to smallest representable numbers, smallest here including denormalized numbers.

\log_{10} \left(  \frac{2^{2^{es - 1} }}{2^{2 - 2^{es-1} - s}} \right) = \log_{10}2^{2^{es} - 2 + s} = (2^{es} - 2 + s) \log_{10}2

IEEE float and posit dynamic range at comparable precision

Which posit number should we compare with each IEEE number? We can’t simply compare ieee<n, es> with posit<n, es>. The value n means the same in both cases: the total number of bits. And although es does mean the number of exponent bits in both cases, they are not directly comparable because posits also have regime bits that are a special kind of exponent bits. In general a comparable posit number will have a smaller es value than its IEEE counterpart.

One way to compare IEEE floating point numbers and posit numbers is to choose a posit number format with comparable precision around 1. See the first post on posits for their dynamic range and precision near 1.

In the following table, the numeric headings are the number of bits in a number. The “sig” rows contain the number of significand bits in the representation of 1, and “DR” stands for dynamic range in decades.

|           | 16 |  32 |   64 |   128 |
| IEEE  es  |  5 |   8 |   11 |    15 |
| posit es  |  1 |   3 |    5 |     8 |
| IEEE  sig | 10 |  23 |   52 |   112 |
| posit sig | 12 |  26 |   56 |   117 |
| IEEE  DR  | 12 |  83 |  632 |  9897 |
| posit DR  | 17 | 144 | 1194 | 19420 |

Note that in each case the posit number has both more precision for numbers near 1 and a wider dynamic range.

It’s common to use a different set of posit es values that have a smaller dynamic range than their IEEE counterparts (except for 16 bits) but have more precision near 1.

|           | 16 |  32 |   64 |   128 |
| IEEE  es  |  5 |   8 |   11 |    15 |
| posit es  |  1 |   2 |    3 |     4 |
| IEEE  sig | 10 |  23 |   52 |   112 |
| posit sig | 12 |  27 |   58 |   121 |
| IEEE  DR  | 12 |  83 |  632 |  9897 |
| posit DR  | 17 |  72 |  299 |  1214 |

Python code

Here’s a little Python code if you’d like to experiment with other number formats.

    from math import log10

    def IEEE_dynamic_range(total_bits, exponent_bits):
        # number of significand bits
        s = total_bits - exponent_bits - 1
        return (2**exponent_bits + s - 2)*log10(2)

    def posit_dynamic_range(total_bits, max_exponent_bits):
        return (2*total_bits - 4) * 2**max_exponent_bits * log10(2)
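As a usage example, the sketch below repeats the two functions so that it runs on its own and prints the rows of the first table. The helper posit_sig_bits_near_one is my addition: it assumes that the representation of 1 spends one sign bit, one regime bit, and one regime-terminating bit before the es exponent bits, leaving the rest for the fraction.

```python
from math import log10

def IEEE_dynamic_range(total_bits, exponent_bits):
    s = total_bits - exponent_bits - 1      # number of significand bits
    return (2**exponent_bits + s - 2)*log10(2)

def posit_dynamic_range(total_bits, max_exponent_bits):
    return (2*total_bits - 4) * 2**max_exponent_bits * log10(2)

def posit_sig_bits_near_one(total_bits, max_exponent_bits):
    # sign bit + one regime bit + regime-terminating bit + es exponent bits
    return total_bits - 3 - max_exponent_bits

# (total bits, IEEE es, posit es) as in the first table
for n, ieee_es, posit_es in [(16, 5, 1), (32, 8, 3), (64, 11, 5), (128, 15, 8)]:
    print(n,
          n - ieee_es - 1,                              # IEEE sig
          posit_sig_bits_near_one(n, posit_es),         # posit sig
          round(IEEE_dynamic_range(n, ieee_es)),        # IEEE DR
          round(posit_dynamic_range(n, posit_es)))      # posit DR
```

Swapping in other es values lets you explore the trade-off between precision near 1 and dynamic range.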

Next: See the next post for a detailed look at eight bit posits and IEEE-like floating point numbers.

Anatomy of a posit number

This post will introduce posit numbers, explain the interpretation of their bits, and discuss their dynamic range and precision.

Posit numbers are a new way to represent real numbers for computers, an alternative to the standard IEEE floating point formats. The primary advantage of posits is the ability to get more precision or dynamic range out of a given number of bits. If an application can switch from using 64-bit IEEE floats to using 32-bit posits, for example, it can fit twice as many numbers in memory at a time. That can make a big difference in the performance of applications that process large amounts of data.

Let’s back up and say what a posit number is.

Unums and posits

John Gustafson introduced unums (universal numbers) as a different way to represent real numbers using a finite number of bits, an alternative to IEEE floating point. See, for example, his 2015 book The End of Error. Posits are a hardware-friendly version of unums.

A conventional floating point number (IEEE 754) has a sign bit, a set of bits to represent the exponent, and a set of bits called the significand (formerly called the mantissa). For details, see Anatomy of a floating point number. For a given size number, the lengths of the various parts are fixed. A 64-bit floating point number, for example, has 1 sign bit, 11 exponent bits, and 52 bits for the significand.

A posit adds an additional category of bits, known as the regime. A posit has four parts

  1. sign bit
  2. regime
  3. exponent
  4. fraction

while an IEEE floating point number has a sign bit, exponent, and significand, the latter corresponding to the fraction part of a posit. Unlike IEEE numbers, the exponent and fraction parts of a posit do not have fixed length. The sign and regime bits have first priority. Next, the remaining bits, if any, go into the exponent. If there are still bits left after the exponent, the rest go into the fraction.

The main reference for this post is [1].

Bit pattern of a posit

To understand posits in more detail, and why they have certain advantages over conventional floating point numbers, we need to unpack their bit representation. A posit number type is specified by two numbers: the total number of bits n, and the maximum number of bits devoted to the exponent, es. (Yes, it’s a little odd to use a two-letter variable name, but that’s conventional in this context.) Together we say we have a posit<n, es> number.

Sign bit

As with an IEEE floating point number, the first bit of a posit is the sign bit. If the sign bit is 1, representing a negative number, take the two’s complement of the rest of the bits before unpacking the regime, exponent, and fraction bits.

Regime bits

After the sign bit come the regime bits. The number of regime bits is variable. There could be anywhere from 1 to n-1 regime bits. How do you know when the regime bits stop? When a run of identical bits ends, either because you run out of bits or because you run into an opposite bit.

If the first bit after the sign bit is a 0, then the regime bits continue until you run out of bits or encounter a 1. Similarly, if the first bit after the sign bit is a 1, the regime bits continue until you run out of bits or encounter a 0. The bit that indicates the end of a run is not included in the regime; the regime is a string of all 0’s or all 1’s.

Exponent bits

The sign bit and regime bits get first priority. If there are any bits left, the exponent bits are next in line.  There may be no exponent bits. The maximum number of exponent bits is specified by the number es. If there are at least es bits after the sign bit, regime bits, and the regime terminating bit, the next es bits belong to the exponent. If there are fewer than es bits left, what bits remain belong to the exponent.

Fraction bits

If there are any bits left after the sign bit, regime bits, regime terminating bit, and the exponent bits, they all belong to the fraction.
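The priority rules above can be sketched as a function that splits a bit string into its fields. This is only an illustration: the function name is mine, and it assumes the sign bit is 0 (for a negative posit you would take the two’s complement first, as described under the sign bit).

```python
def split_posit(bit_string, es):
    """Split a posit bit string with sign bit 0 into (sign, regime, exponent, fraction)."""
    sign, rest = bit_string[0], bit_string[1:]
    # The regime is the run of bits identical to the first bit after the sign.
    m = len(rest) - len(rest.lstrip(rest[0]))
    regime = rest[:m]
    rest = rest[m + 1:]            # skip the regime-terminating bit, if there is one
    exponent, fraction = rest[:es], rest[es:]
    return sign, regime, exponent, fraction

print(split_posit("0000110111011101", 3))
print(split_posit("01000000", 1))
print(split_posit("01111111", 0))
```

The last example shows a regime that runs all the way to the end, leaving no terminating bit, exponent, or fraction.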

Interpreting the components of a posit

Next we look at how the components described above represent a real number.

Let b be the sign bit in a posit. The sign s of the number represented by the bit pattern is positive if this bit is 0 and negative otherwise.

s = (-1)^b

Let m be the number of bits in the regime, i.e. the length of the run of identical bits following the sign bit. Then let k = –m if the regime consists of all 0’s, and let k = m – 1 otherwise.

k = \left\{ \begin{array}{ll} -m & \text{ if regime has } m \text{ 0's} \\ m-1 & \text{ if regime has } m \text{ 1's} \end{array} \right.

The useed u of the posit is determined by es, the maximum exponent size.

u = 2^{2^{\text{\small\emph{es}}} }

The exponent e is simply the exponent bits interpreted as an unsigned integer.

The fraction f is 1 + the fraction bits interpreted as following a binary point. For example, if the fraction bits are 10011, then f = 1.10011 in binary.

Putting it all together, the value of the posit number is the product of the contributions from the sign bit, regime bits, exponent bits (if any), and fraction bits (if any).

x = s\, u^k\, 2^e f = (-1)^b \, f\, 2^{e + k2^{\text{\small\emph{es}}} }

Exceptional posits

There are two exceptional posits, both with all zeros after the sign bit. A string of n 0’s represents the number zero, and a 1 followed by n-1 0’s represents ±∞.

There’s only one zero for posit numbers, unlike IEEE floats that have two kinds of zero, one positive and one negative.

There’s also only one infinite posit number. For that reason you could say that posits represent projective real numbers rather than extended real numbers. IEEE floats have two kinds of infinities, positive and negative, as well as several kinds of non-numbers. Posits have only one entity that does not correspond to a real number, and that is ±∞.
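Putting the pieces together, here is a minimal sketch of a decoder that turns an n-bit pattern into its value using the formula above and the two exceptional cases. The function name is hypothetical. One assumption to flag: when fewer than es exponent bits fit, the code pads the missing low-order bits with zeros, which is my reading of [1] rather than something spelled out in this post.

```python
from fractions import Fraction

def decode_posit(bits, n, es):
    """Value of the n-bit pattern `bits` (an int) as a posit<n, es>.

    Returns a Fraction, or the string "±inf" for the one infinite posit.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return "±inf"

    sign = 1
    if bits >> (n - 1):                           # negative: take the two's complement first
        sign = -1
        bits = (-bits) & mask

    body = format(bits, f"0{n}b")[1:]             # the n-1 bits after the sign bit
    m = len(body) - len(body.lstrip(body[0]))     # length of the regime run
    k = m - 1 if body[0] == "1" else -m
    rest = body[m + 1:]                           # drop the regime-terminating bit
    exp_bits, frac_bits = rest[:es], rest[es:]

    # Assumption: truncated exponent bits are padded on the right with zeros.
    e = int(exp_bits.ljust(es, "0"), 2) if es > 0 else 0
    f = 1 + (Fraction(int(frac_bits, 2), 2**len(frac_bits)) if frac_bits else 0)
    return sign * f * Fraction(2)**(e + k * 2**es)

print(decode_posit(0b0000110111011101, 16, 3))
print(decode_posit(0b01111111, 8, 1), decode_posit(0b00000001, 8, 1))
```

Exact fractions make it easy to enumerate every bit pattern of a small format and compare values without rounding.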

Dynamic range and precision

The dynamic range and precision of a posit number depend on the value of es. The larger es is, the larger the contribution of the regime and exponent bits will be, and so the larger range of values one can represent. So increasing es increases dynamic range. Dynamic range, measured in decades, is the log base 10 of the ratio between the largest and smallest representable positive values.

However, increasing es means decreasing the number of bits available to the fraction, and so decreases precision. One of the benefits of posit numbers is this ability to pick es to adjust the trade-off between dynamic range and precision to meet your needs.

The largest representable finite posit is labeled maxpos. This value occurs when k is as large as possible, i.e. when all the bits after the sign bit are 1’s. In this case k = n – 2. So maxpos equals

u^{n-2} = \left( 2^{2^{\text{\small\emph{es}}} } \right)^{n-2}

The smallest representable positive number, minpos, occurs when k is as negative as possible, i.e. when the largest possible number of bits after the sign bit are 0’s. They can’t all be zeros or else we have the representation for the number 0, so there must be a 1 on the end. In this case m = n-2 and k = 2-n.

\mbox{minpos} = u^{2-n} = \left( 2^{2^{\text{\small\emph{es}}} } \right)^{2-n} = 1/\mbox{maxpos}

The dynamic range is given by the log base 10 of the ratio between maxpos and minpos.

\log_{10}\left( 2^{2^{\text{\small\emph{es}}} } \right)^{2n-4} = (2n-4)2^{es}\log_{10}2

For example, a 16-bit posit with es = 1 has a dynamic range of 17 decades, whereas a 16-bit IEEE floating point number has a dynamic range of 12 decades. The former has a fraction of 12 bits for numbers near 1, while the latter has a significand of 10 bits. So a posit<16,1> number has both a greater dynamic range and greater precision (near 1) than its IEEE counterpart.

[Update: See this post for more on the dynamic range and precision of IEEE floats of various sizes and how posits compare.]

Note that the precision of a posit number depends on its size. This is the sense in which posits have tapered precision. Numbers near 1 have more precision, while extremely big numbers and extremely small numbers have less. This is often what you want. Typically the vast majority of numbers in a computation are roughly on the order of 1, while with the largest and smallest numbers, you mostly want them to not overflow or underflow.
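To make tapered precision concrete, the sketch below counts how many fraction bits remain for a given regime value k, following the bit layout described earlier. The function name is mine.

```python
def fraction_bits(n, es, k):
    """Fraction bits left in a posit<n, es> whose regime value is k."""
    m = k + 1 if k >= 0 else -k        # regime run length for this k
    used = 1 + m + 1 + es              # sign bit, regime, terminating bit, exponent
    return max(0, n - used)

# posit<16, 1>: precision is greatest near 1 (k = 0 or -1) and tapers off
for k in range(-14, 15, 2):
    print(k, fraction_bits(16, 1, k))
```

For a posit<16, 1> this gives 12 fraction bits near 1 and none at the extremes of the range.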

Related post: Anatomy of a floating point number


[1] John L. Gustafson and Isaac Yonemoto. Beating Floating Point at its Own Game: Posit Arithmetic. DOI: 10.14529/jsfi170206


Random number generation posts

Random number generation is typically a two step process: first generate a uniformly distributed value, then transform that value to have the desired distribution. The former is the hard part, but also the part more likely to have been done for you in a library. The latter is relatively easy in principle, though some distributions are hard to (efficiently) sample from.
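As an illustration of the two steps, here is a minimal sketch that samples from an exponential distribution by pushing a uniform value through the inverse CDF. The exponential distribution is just a convenient example, and the function name and its uniform parameter are my own.

```python
import math
import random

def exponential_sample(rate, uniform=random.random):
    """Sample an exponential random variable with the given rate."""
    u = uniform()                          # step 1: uniform value in [0, 1)
    return -math.log(1.0 - u) / rate       # step 2: apply the inverse CDF

random.seed(20170816)
samples = [exponential_sample(2.0) for _ in range(100000)]
print(sum(samples) / len(samples))         # should be near the mean, 1/rate = 0.5
```

Passing the uniform generator in as an argument makes it easy to swap in a different uniform RNG, or a fixed value when testing the transformation by itself.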

Here are some posts on testing a uniform RNG.

Here’s a book chapter I wrote on testing the transformation of a uniform RNG into some other distribution.

A few posts on manipulating a random number generator.

And finally, a post on a cryptographically secure random number generator.

Quaint supercomputers

The latest episode of Star Trek Discovery (S1E4) uses the word “supercomputer” a few times. This sounds jarring. The word has become less common in contemporary usage, and seems even more out of place in a work of fiction set more than two centuries in the future.

According to Google’s Ngram Viewer, the term “supercomputer” peaked in 1990.

Google Ngram Viewer results for supercomputer

(The term “cluster” is far more common, but it is mostly a non-technical term. It’s used in connection with grapes, for example, much more often than with computers.)

Years ago you’d hear someone say a problem would take a “big computer,” but these days you’re more likely to hear someone say a problem is going to take “a lot of computing power.” Hearing that a problem is going to require a “big computer” sounds as dated as saying something would take “a big generator” rather than saying it would take a lot of electricity.

Like electricity, computing power has been commoditized. We tend to think in terms of the total amount of computing resources needed, measured in, say, CPU-hours or number of low-level operations. We don’t think first about what configuration of machines would deliver these resources any more than we’d first think about what machines it would take to deliver a certain quantity of electrical power.

There are still supercomputers and problems that require them, though an increasing number of computationally intense projects do not require such specialized hardware.

Cellular automata with random initial conditions

The previous post looked at a particular cellular automaton, the so-called Rule 90. When started with a single pixel turned on, it draws a Sierpinski triangle. With random starting pixels, it draws a semi-random pattern that retains features like the Sierpinski triangle.

There are only 256 possible elementary cellular automata, so it’s practical to plot them all. I won’t list all the images here—you can find them all here—but I will give a few examples to show the variety of patterns they produce. As in the previous post, we imagine our grid rolled up into a cylinder, i.e. we’ll wrap around if necessary to find pixels diagonally up to the left and right.

rule 8 with random initial conditions
rule 18 with random initial conditions
rule 29 with random initial conditions
rule 30 with random initial conditions
rule 108 with random initial conditions
rule 129 with random initial conditions

As we discussed in the previous post, the number of a rule comes from what value it assigns to each of eight possible cellular states, turned into a binary number. So it’s plausible that binary numbers with more 1’s correspond to more black pixels. This is roughly true, though the graph below shows that the situation is more complex than that.

automata pixel density as a function of 1 bits in rule
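Here’s a minimal sketch of how such images and densities could be computed: each new cell looks up the bit of the rule number indexed by its three-cell neighborhood, wrapping around at the edges. The function names, grid size, and seed are assumptions for illustration, not the code used to make the plots.

```python
import random

def step(cells, rule):
    """Apply an elementary cellular automaton rule once, wrapping around at the edges."""
    n = len(cells)
    new = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = 4*left + 2*center + right     # neighborhood read as a 3-bit number
        new.append((rule >> index) & 1)       # bit number `index` of the rule
    return new

def run(rule, width=100, steps=100, seed=0):
    """Rows of the automaton, starting from random initial conditions."""
    rng = random.Random(seed)
    row = [rng.randint(0, 1) for _ in range(width)]
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows

def density(rows):
    """Fraction of cells that are turned on."""
    return sum(map(sum, rows)) / (len(rows) * len(rows[0]))

for rule in [8, 18, 29, 30, 108, 129]:
    print(rule, bin(rule).count("1"), round(density(run(rule)), 3))
```

Looping over all 256 rules and plotting density against the number of 1 bits would give a graph along the lines of the one above.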

A cryptographically secure random number generator

A random number generator can have excellent statistical properties and yet not be suited for use in cryptography. I’ve written a few posts to demonstrate this. For example, this post shows how to discover the seed of an LCG random number generator.

This is not possible with a secure random number generator. Or more precisely, it is not practical. It may be theoretically possible, but doing so requires solving a problem currently believed to be extremely time-consuming. (Lots of weasel words here. That’s the nature of cryptography. Security often depends on the assumption that a problem is as hard to solve as experts generally believe it is.)

Blum Blum Shub algorithm

The Blum Blum Shub algorithm for generating random bits rests on the assumption that a certain number theory problem, the quadratic residuosity problem, is hard to solve. The algorithm is simple. Let M = pq where p and q are large primes, both congruent to 3 mod 4. Pick a seed x_0 between 1 and M and relatively prime to M. Now for n ≥ 0, set

x_{n+1} = x_n² mod M

and return the least significant bit of x_{n+1}. (Yes, that’s a lot of work for just one bit. If you don’t need cryptographic security, there are much faster random number generators.)
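To see the recurrence in action with numbers small enough to check by hand, here is a toy sketch. The primes 11 and 23 and the seed 3 are illustrative values only and far too small to be secure.

```python
from itertools import islice

def blum_blum_shub(p, q, seed):
    """Yield the bits of a Blum Blum Shub generator with modulus p*q."""
    M = p * q
    x = seed
    while True:
        x = x * x % M        # x_{n+1} = x_n^2 mod M
        yield x % 2          # least significant bit

# Toy parameters: both primes are congruent to 3 mod 4, and 3 is relatively prime to 253.
bits = list(islice(blum_blum_shub(11, 23, 3), 6))
print(bits)
```

The successive states are 9, 81, 236, 36, 31, 202, so the first six bits are 1, 1, 0, 0, 1, 0.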

Python implementation

Here’s some Python code to illustrate using the generator. The code is intended to be clear, not efficient.

Given two large (not necessarily prime) numbers x and y, the code below finds primes p and q for the algorithm and checks that the seed is OK to use.

    import sympy

    # super secret large numbers
    x = 3*10**200
    y = 4*10**200
    seed = 5*10**300

    def next_usable_prime(x):
        # Find the next prime congruent to 3 mod 4 following x.
        p = sympy.nextprime(x)
        while (p % 4 != 3):
            p = sympy.nextprime(p)
        return p

    p = next_usable_prime(x)
    q = next_usable_prime(y)
    M = p*q

    assert(1 < seed < M)
    assert(seed % p != 0)
    assert(seed % q != 0)

There’s a little bit of a chicken-and-egg problem here: how do you pick x, y, and seed? Well, you could use a cryptographically secure random number generator ….

Now let’s generate a long string of bits:

    # Number of random bits to generate
    N = 100000

    x = seed
    bit_string = ""
    for _ in range(N):
        x = x*x % M
        b = x % 2
        bit_string += str(b)


I did not test the output thoroughly; I didn’t use anything like DIEHARDER or PractRand as in previous posts, but just ran a couple simple tests described here.

First I look at the balance of 0’s and 1’s.

    Number of 1's: 50171
    Expected: 49683.7 to 50316.2

Then the longest run. (See this post for a discussion of expected run length.)

    Run length: 16
    Expected: 12.7 to 18.5

Nothing unusual here.
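The code for these two checks isn’t shown above, so here is a sketch of what they might look like. The helper names are mine. I assume the balance interval is two standard deviations either side of N/2, which matches the numbers reported, and that the run in question is a run of 1’s. The expected range for the longest run comes from the post linked above and is not reproduced here.

```python
from math import sqrt

def count_ones(bit_string):
    return bit_string.count("1")

def balance_interval(n):
    """Two standard deviations either side of the mean for n fair bits."""
    # n fair bits: mean n/2, standard deviation sqrt(n)/2
    half_width = sqrt(n)
    return n/2 - half_width, n/2 + half_width

def longest_run(bit_string, symbol="1"):
    """Length of the longest run of `symbol` in a string of 0's and 1's."""
    other = "0" if symbol == "1" else "1"
    return max(len(run) for run in bit_string.split(other))

example = "0011101111"
print(count_ones(example), longest_run(example))
print(balance_interval(100000))
```

With bit_string from the generator above, you would call count_ones(bit_string) and longest_run(bit_string) and compare against the expected ranges.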

The Blums

The first two authors of Blum Blum Shub are Lenore and Manuel Blum. The third author is Michael Shub.

I had a chance to meet the Blums at the Heidelberg Laureate Forum in 2014. Manuel Blum gave a talk that year on mental cryptography that I blogged about here and followed up here. He and his wife Lenore were very pleasant to talk with.

Programming language life expectancy

The Lindy effect says that what’s been around the longest is likely to remain around the longest. It applies to creative artifacts, not living things. A puppy is likely to live longer than an elderly dog, but a book that has been in print for a century is likely to be in print for another century.

In a previous post I go into the mathematical detail of the Lindy effect: power law distributions etc. The key fact we need for this blog post is that if something has the kind of survival distribution described by the Lindy effect, then the expected future lifetime equals the current age. For example, the 100 year old book in the opening paragraph is expected to be around for another 100 years.

Note that this is all framed in terms of probability distributions. It’s not saying, for example, that everything new will go away soon. Everything was once new. Someone watching Hamlet on opening night would be right to speculate that nobody would care about Hamlet in a few years. But now that we know Hamlet has been around for four centuries and is going strong, the Lindy effect would predict that people will be performing Hamlet in the 25th century.

Note that Lindy takes nothing into account except survival to date. Someone might have been more bullish on Hamlet after opening night based on other information such as how well the play was received, but that’s beyond the scope of the Lindy effect.

If we apply the Lindy effect to programming languages, we only consider how long they’ve been around and whether they’re thriving today. You might think that Go, for example, will be around for a long time based on Google’s enormous influence, but the Lindy effect does not take such information into account.

So, if we assume the Lindy effect holds, here are the expected years when programming languages would die. (How exactly would you define the time of death for a programming language? Doesn’t really matter. This isn’t that precise or that serious.)

| Language | Born | Expected death |
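The calculation behind such a table is simple: the expected future lifetime equals the current age, so the expected year of death is the current year plus the current age. Here’s a sketch. The birth years are commonly cited first-release dates that I’ve filled in for illustration, and the value of now is an assumption you should adjust.

```python
def expected_death(born, now):
    """Expected year of death under the Lindy effect: current year plus current age."""
    return now + (now - born)

now = 2017    # assumed current year; change as needed

# Illustrative birth years, not taken from the table above
languages = {"Fortran": 1957, "Lisp": 1958, "C": 1972, "Python": 1991, "Go": 2009}

for name, born in languages.items():
    print(name, born, expected_death(born, now))
```

Note that the expected death year moves two years further out for every year a language survives.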

You can debate what it even means for a language to survive. I’d consider Lisp to be alive and well if in the future people are programming in Clojure but not Common Lisp, for example, but others might disagree.

“We don’t know what language engineers will be coding in in the year 2100. However, we do know that it will be called FORTRAN.” — C.A.R. Hoare

Reverse engineering the seed of a linear congruential generator

The previous post gave an example of manipulating the seed of a random number generator to produce a desired result. This post will do something similar for a different generator.

A couple times I’ve used the following LCG (linear congruential random number generator) in examples. An LCG starts with an initial value of z and updates z at each step by multiplying by a constant a and taking the remainder by m. The particular LCG I’ve used has a = 742938285 and m = 2^31 – 1 = 2147483647.

(I used these parameters because they have maximum range, i.e. every positive integer less than m appears eventually, and because it was one of the LCGs recommended in an article years ago. That is, it’s very good as far as 32-bit LCGs go, which isn’t very far. An earlier post shows how it quickly fails the PractRand test suite.)

Let’s pick the seed so that the 100th output of the generator is today’s date in ISO form: 20170816.

We need to solve

a^100 z = 20170816 mod 2147483647.

By reducing a^100 mod 2147483647 we find we need to solve

160159497 z = 20170816 mod 2147483647

and the solution is z = 1898888478. (See How to solve linear congruences.)
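One way to carry out this computation is with Python’s built-in pow, which does modular exponentiation and, with an exponent of -1 in Python 3.8 or later, modular inverses. This is a sketch of that approach rather than necessarily how the numbers above were found. The function name is mine.

```python
def solve_seed(a, m, target, n):
    """Find z such that a^n * z = target (mod m), assuming a^n is invertible mod m."""
    coefficient = pow(a, n, m)               # a^n reduced mod m
    inverse = pow(coefficient, -1, m)        # modular inverse, Python 3.8+
    return target * inverse % m

a = 742938285
m = 2**31 - 1
z = solve_seed(a, m, 20170816, 100)
print(z)
```

Since m is prime and a is not a multiple of m, the inverse exists and the solution is unique mod m.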

The following Python code verifies that the solution works.

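    a = 742938285
    z = 1898888478
    m = 2**31 - 1

    # Run the generator 100 steps from the proposed seed
    for _ in range(100):
        z = a*z % m

    # Should print 20170816 if the seed is correct
    print(z)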

Update: In this post, I kept n = 100 fixed and solved for the seed to give a specified output n steps later. In a follow up post I do the opposite, fixing the seed and solving for n.

Manipulating a random number generator

With some random number generators, it’s possible to select the seed carefully to manipulate the output. Sometimes this is easy to do. Sometimes it’s hard but doable. Sometimes it’s theoretically possible but practically impossible.

In my recent correspondence with Melissa O’Neill, she gave me an example that seeds a random number generator so that the 9th and 10th outputs produce the ASCII code for my name.

Here’s the code. The function next is the xoroshiro128+ (XOR/rotate/shift/rotate) random number generator. The global array s contains the state of the random number generator. Its initial values are the seeds of the generator.

#include <cstdio>
#include <cstdint>

// xoroshiro generator taken from

uint64_t s[2];

static inline uint64_t rotl(const uint64_t x, int k) {
	return (x << k) | (x >> (64 - k));
}

uint64_t next(void) {
	const uint64_t s0 = s[0];
	uint64_t s1 = s[1];
	const uint64_t result = s0 + s1;

	s1 ^= s0;
	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
	s[1] = rotl(s1, 36); // c

	return result;
}

int main() {
    freopen(NULL, "wb", stdout);

    s[0] = 0x084f31240ed2ec3f;
    s[1] = 0x0aa0d69470975eb8;

    while (1) {
        uint64_t value = next();
        fwrite((void*) &value, sizeof(value), 1, stdout);
    }
}

Compile this code then look at a hex dump of the first few outputs. Here’s what you get:

cook@mac> gcc xoroshiro.cpp
cook@mac> ./a.out | hexdump -C | head
f7 4a 6a 7f b8 07 f0 12  f8 96 e1 af 29 08 e3 c8  |.Jj.........)...|
15 0e b0 38 01 ef b2 a7  bb e9 6f 4d 77 55 d7 a0  |...8......oMwU..|
49 08 f2 fc 0c b2 ea e8  48 c2 89 1b 31 3f d7 3d  |I.......H...1?.=|
11 eb bc 5e ee 98 b6 3b  d9 d1 cc 15 ae 00 fc 2f  |...^...;......./|
3e 20 4a 6f 68 6e 20 44  2e 20 43 6f 6f 6b 20 3c  |> John D. Cook <| 
d1 80 49 27 3e 25 c2 4b  2a e3 78 71 9c 9e f7 18  |..I'>%.K*.xq....|
0b bb 1f c6 1c 71 79 29  d6 45 81 47 3b 88 4f 42  |.....qy).E.G;.OB|
7c 1c 19 c4 22 58 51 2d  d7 74 69 ac 36 6f e0 3f  ||..."XQ-.ti.6o.?|
78 7f a4 14 1c bc a8 de  17 e3 f7 d8 0c de 2c ea  |x.............,.|
a2 37 83 f9 38 e4 14 77  e3 6a c8 52 d1 0c 29 01  |.7..8..w.j.R..).|

(I cut out the line numbers on the left side to make the output fit better horizontally on the page.)

Not only did one pair of seed values put my name in the output, another pair would work too. If you change the seed values to

s[0] = 0x820e8a6c1baf5b13;
s[1] = 0x5c51f1c4e2e64253;

you’ll also see my name in the output:

66 9d 95 fe 30 7c 60 de 7c 89 0c 6f cd d6 05 1e |f...0|`.|..o....|
2b e9 9c cc cd 3d c5 ec 3f e3 88 6c a6 cd 78 84 |+....=..?..l..x.|
20 12 ac f2 2b 3c 89 73 60 09 8d c3 85 68 9e eb | ...+<.s`....h..|
15 3e c1 0b 07 68 63 e5 73 a7 a8 f2 4b 8b dd d0 |.>...hc.s...K...|
3e 20 4a 6f 68 6e 20 44 2e 20 43 6f 6f 6b 20 3c |> John D. Cook <|
3f 71 cf d7 5f a6 ab cf 9c 81 93 d1 3d 4c 9e 41 |?q.._.......=L.A|
0d b5 48 9c aa fc 84 d8 c6 64 1d c4 91 79 b4 f8 |..H......d...y..|
0c ac 35 48 fd 38 01 dd cc a4 e4 43 c6 5b e8 c7 |..5H.8.....C.[..|
e1 2e 76 30 0f 9d 41 ff b0 99 84 8d c1 72 5a 91 |..v0..A......rZ.|
06 ea f2 8e d7 87 e5 2c 53 a9 0c a0 f4 cd d1 9b |.......,S.......|

Note that there are limits to how much you can manipulate the output of an RNG. In this case the seed was selected to produce two particular output values, but the rest of the sequence behaves as if it were random. (See Random is as random does.)
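For readers who would rather experiment without a C++ compiler, here is a rough Python port of the generator above. It is a sketch, not a reference implementation: Python integers are unbounded, so the 64-bit wraparound of uint64_t is emulated with a mask, and the outputs are laid out as little-endian bytes on the assumption that the hex dump above came from a little-endian machine.

```python
from itertools import islice

MASK64 = (1 << 64) - 1  # emulate uint64_t wraparound

def rotl(x, k):
    """Rotate a 64-bit value left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64

def xoroshiro128plus(s0, s1):
    """Yield outputs of the xoroshiro128+ variant shown above (55, 14, 36)."""
    while True:
        yield (s0 + s1) & MASK64
        s1 ^= s0
        s0 = rotl(s0, 55) ^ s1 ^ ((s1 << 14) & MASK64)
        s1 = rotl(s1, 36)

outputs = list(islice(xoroshiro128plus(0x084f31240ed2ec3f,
                                       0x0aa0d69470975eb8), 10))

# Bytes of the 9th and 10th outputs, in the order hexdump would show them
tail = b"".join(v.to_bytes(8, "little") for v in outputs[8:10])
print(tail)
```

Swapping in the second pair of seeds lets you check the other hex dump the same way.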

Selecting things in Emacs

You can select blocks of text in Emacs just as you would in most other environments. You could, for example, drag your mouse over a region. You could also hold down the Shift key and use arrow keys. But Emacs also has a number of commands that let you work in larger semantic units. That is, instead of working with an undifferentiated set of characters, you can select meaningful chunks of text, the meaning depending on context.

When you’re editing English prose, the semantic units you are concerned with might be words, sentences, or paragraphs. When you’re editing programming language source code, you care about functions or various “balanced expressions” such as the content between two parentheses or two curly brackets.

The following table gives some of the selection commands built into Emacs.

Unit                 Command            Key binding
page                 mark-page          C-x C-p
buffer               mark-whole-buffer  C-x h
balanced expression  mark-sexp          C-M-@

The expand-region package offers an alternative to several of these commands. More on that later.

The command for selecting a word does just what you expect. Likewise, the commands for selecting a page or a buffer require little explanation. But the meaning of a “paragraph” depends on context (i.e. editing mode), as do the meanings of “function” and “balanced expression.”

When editing source code, a “paragraph” is typically a block of code without blank lines. However, each language implements its own editing mode and could interpret editing units differently. Function definition syntax varies across languages, so mark-defun has to be implemented differently in each language mode.

Balanced expressions could have different meanings in different contexts, but they’re fairly consistent. Content between matching delimiters (quotation marks, parentheses, square brackets, curly braces, etc.) is generally considered a balanced expression.

Here’s where expand-region comes in. It’s typically bound to C-=. It can be used as a substitute for mark-word and mark-sexp. And if you use it repeatedly, it can replace mark-defun.

Each time you call expand-region it takes in more context. For example, suppose you’re in text mode with your cursor in the middle of a word. The first call to expand-region selects to the end of the word. The second call selects the whole word, i.e. expanding backward to the beginning. The next call selects the enclosing sentence and the next call the enclosing paragraph.

The expand-region function works analogously when editing source code. Suppose you’re editing the bit of Emacs Lisp below and have your cursor on the slash between eshell and pwd.

(setq eshell-prompt-function
  (lambda nil
    (concat
     (eshell/pwd)
     " $ ")))

Here’s what sequential invocations of expand-region will select.

  1. /pwd
  2. /pwd/)
  3. (eshell/pwd)
  4. (eshell/pwd) " $ ")
  5. (concat (eshell/pwd) " $ ")
  6. (concat (eshell/pwd) " $ "))
  7. (lambda nil (concat (eshell/pwd) " $ "))
  8. (lambda nil (concat (eshell/pwd) " $ ")))
  9. (setq eshell-prompt-function (lambda nil (concat (eshell/pwd) " $ ")))

This is kinda tedious in this particular context because there are a lot of delimiters in a small region. In less dense code you’ll select larger blocks of code with each invocation of expand-region. Since each invocation requires only a single key (i.e. hold down Control and repeatedly type =) it’s easy to call expand-region over and over until you select the region you’d like.


DIEHARDER random number generator test results for PCG and MWC

A few days ago I wrote about testing the PCG random number generator using the DIEHARDER test suite. In this post I’ll go into a little more background on this random number generator test suite. I’ll also show that like M. E. O’Neill’s PCG (“permuted congruential generator”), George Marsaglia’s MWC (“multiply with carry”) generator does quite well.

This is not to say that MWC is the best generator for every purpose, but any shortcomings of MWC are not apparent from DIEHARDER. The PCG family of generators, for example, is apparently superior to MWC, but you couldn’t necessarily conclude that from these tests.

Unless your application demands more of a random number generator than these tests do, MWC is probably adequate for your application. I wouldn’t recommend it for cryptography or for high-dimensional integration by darts, but it would be fine for many common applications.

DIEHARDER test suite

George Marsaglia developed the DIEHARD battery of tests in 1995. Physics professor Robert G. Brown later refined and extended Marsaglia’s original test suite to create the DIEHARDER suite. (The name of Marsaglia’s battery of tests was a pun on the Diehard car battery. Brown continued the pun tradition by naming his suite after Die Harder, the sequel to the movie Die Hard.) The DIEHARDER suite is open source. It is designed to be at least as rigorous as the original DIEHARD suite and in some cases more rigorous.

There are 31 distinct kinds of tests in the DIEHARDER suite, but some of these are run multiple times. In total, 114 tests are run.

Marsaglia’s MWC

The main strength of Marsaglia’s MWC algorithm is that it is very short. The heart of the code is only three lines:

    m_z = 36969 * (m_z & 65535) + (m_z >> 16);
    m_w = 18000 * (m_w & 65535) + (m_w >> 16);
    return (m_z << 16) + m_w;

You can find a full implementation of a random number generator class based on MWC here.
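To try the three lines outside of C, here is a minimal Python sketch. It assumes m_z and m_w are unsigned 32-bit integers, so each result is masked to 32 bits to mimic the overflow. The generator name mwc and the seeds (1, 1) are placeholders for this illustration only.

```python
from itertools import islice

MASK32 = 0xFFFFFFFF  # emulate unsigned 32-bit arithmetic

def mwc(z, w):
    """Yield 32-bit outputs from the two multiply-with-carry streams above."""
    while True:
        z = (36969 * (z & 65535) + (z >> 16)) & MASK32
        w = (18000 * (w & 65535) + (w >> 16)) & MASK32
        yield ((z << 16) + w) & MASK32

sample = list(islice(mwc(1, 1), 5))
print(sample)
```

The two state words z and w together make up the 64 bits of state.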

The heart of PCG is also very short, but a bit more mysterious.

    rng->state = oldstate * 6364136223846793005ULL + (rng->inc | 1);
    uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
    uint32_t rot = oldstate >> 59u;
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));

(These are the 64-bit state versions of MWC and PCG. Both have versions based on larger state.)
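The PCG step can be sketched in Python the same way, with masks standing in for the uint64_t state and uint32_t output. This mirrors only the four lines quoted above. It does not include the seeding routine that a full implementation would run first, and the name pcg32_step_stream is made up for this example.

```python
from itertools import islice

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULTIPLIER = 6364136223846793005

def pcg32_step_stream(state, inc):
    """Yield 32-bit outputs; state and inc are used as given, with no seeding step."""
    while True:
        old = state
        # LCG step on the 64-bit state
        state = (old * MULTIPLIER + (inc | 1)) & MASK64
        # Output permutation: xorshift, then a rotation chosen by the top 5 bits
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59
        yield ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32

sample = list(islice(pcg32_step_stream(20170707105851, 42), 5))
print(sample)
```

Because the output is computed from the old state, the first value depends only on the starting state, not on inc.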

Because these generators require little code, they’d be relatively easy to step into with a debugger, compared to other RNGs such as Mersenne Twister that require more code and more state.

Test results

Out of the 114 DIEHARDER tests run on MWC, 111 returned a pass and the remaining three returned a weak pass.

A few weak passes are to be expected. The difference between pass, weak pass, and failure is whether a p-value falls below a certain threshold. Theory says that ideally p-values would be uniformly distributed, and so one would fall outside the region for a strong pass now and then.
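A rough calculation shows how often weak passes should turn up. The sketch below assumes a result is flagged weak when its p-value is below 0.005 or above 0.995, so each test has a 1% chance of a weak result under uniform p-values, and it treats the 114 tests as independent. Both are simplifying assumptions made for illustration, not facts taken from the DIEHARDER documentation.

```python
from math import comb

N_TESTS = 114
P_WEAK = 0.01   # assumed chance of a weak result per test (see lead-in)

def prob_at_least(k, n=N_TESTS, p=P_WEAK):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1 - sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k))

expected_weak = N_TESTS * P_WEAK
print(expected_weak, prob_at_least(1), prob_at_least(3))
```

Under these assumptions about one weak result is expected per run, and seeing at least one is more likely than not.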

Rather than counting strong and weak passes, let’s look at the p-values themselves. We’d expect these to be uniformly distributed. Let’s see if they are.

Here are the p-values reported by the DIEHARDER tests for MWC:

[Figure: histogram of p-values for MWC]

Here are the corresponding values for PCG:

[Figure: histogram of p-values for PCG]

Neither set of results has more small p-values than we’d expect. This is normally what we’re concerned about. Too many small p-values would indicate that the generated samples are showing behavior that would be rare for truly random input.

But both sets of test results have a surprising number of large p-values. Not sure what to make of that. I suspect it says more about the DIEHARDER test suite than the random number generators being tested.

Update: I went back to look at some results from Mersenne Twister to see if this pattern of large p-values persists there. It does, and in fact the p-values are even more biased toward the high end for Mersenne Twister.

[Figure: histogram of p-values for Mersenne Twister]

One thing I noticed is that the large p-values are disproportionately coming from some of the same tests each time. In particular, the repetitions of the sts_serial test have an unexpectedly high number of large p-values for each generator.

Testing the PCG random number generator

M. E. O’Neill’s PCG family of random number generators looks very promising. It appears to have excellent statistical and cryptographic properties. And it takes remarkably little code to implement. (PCG stands for Permuted Congruential Generator.)

The journal article announcing PCG gives the results of testing it with the TestU01 test suite. I wanted to try it out by testing it with the DIEHARDER test suite (Robert G. Brown’s extension of George Marsaglia’s DIEHARD test suite) and the NIST Statistical Test Suite. I used what the generator’s website calls the “minimal C implementation.”

The preprint of the journal article is dated 2015 but apparently hasn’t been published yet.

Update: See the very informative note by the author of PCG in the comments below.

For the NIST test suite, I generated 10,000,000 bits and divided them into 10 streams.

For the DIEHARDER test suite, I generated 800,000,000 unsigned 32-bit integers. (DIEHARDER requires a lot of random numbers as input.)

For both test suites I used the seed (state) 20170707105851 and sequence constant (inc) 42.

The PCG generator did well on all the NIST tests. For every test, at least 9 out of 10 streams passed. The test authors say you should expect at least 8 out of 10 streams to pass.

Here’s an excerpt from the results. You can find the full results here.

  2   0   2        0  0.213309     10/10   Frequency
  0   0   1        3  0.534146     10/10   BlockFrequency
  3   0   0        0  0.350485     10/10   CumulativeSums
  1   1   0        2  0.350485     10/10   CumulativeSums
  0   2   2        1  0.911413     10/10   Runs
  0   0   1        1  0.534146     10/10   LongestRun
  0   1   2        0  0.739918     10/10   Rank
  0   4   0        0  0.122325     10/10   FFT
  1   0   0        1  0.000439     10/10   NonOverlappingTemplate
  2   1   0        0  0.350485      9/10   NonOverlappingTemplate
  0   2   1        0  0.739918     10/10   OverlappingTemplate
  1   1   0        2  0.911413     10/10   Universal
  1   1   0        0  0.017912     10/10   ApproximateEntropy
  1   0   1        1     ----       3/4    RandomExcursions
  0   0   0        1     ----       4/4    RandomExcursions
  2   0   0        0     ----       4/4    RandomExcursionsVariant
  0   0   3        0     ----       4/4    RandomExcursionsVariant
  1   2   3        0  0.350485      9/10   Serial
  1   1   1        0  0.739918     10/10   Serial
  1   2   0        0  0.911413     10/10   LinearComplexity


The DIEHARDER suite has 31 kinds of tests, some of which are run many times, making a total of 114 tests. Out of the 114 tests, two returned a weak pass for the PCG input and all the rest passed. A few weak passes are to be expected from running so many tests and so this isn’t a strike against the generator. In fact, it might be suspicious if no tests returned a weak pass.

Here’s an edited version of the results. The full results are here.

        test_name   |ntup| tsamples |psamples|  p-value |Assessment
   diehard_birthdays|   0|       100|     100|0.46682782|  PASSED
      diehard_operm5|   0|   1000000|     100|0.83602120|  PASSED
  diehard_rank_32x32|   0|     40000|     100|0.11092547|  PASSED
    diehard_rank_6x8|   0|    100000|     100|0.78938803|  PASSED
   diehard_bitstream|   0|   2097152|     100|0.81624396|  PASSED
        diehard_opso|   0|   2097152|     100|0.95589325|  PASSED
        diehard_oqso|   0|   2097152|     100|0.86171368|  PASSED
         diehard_dna|   0|   2097152|     100|0.24812341|  PASSED
diehard_count_1s_str|   0|    256000|     100|0.75417270|  PASSED
diehard_count_1s_byt|   0|    256000|     100|0.25725000|  PASSED
 diehard_parking_lot|   0|     12000|     100|0.59288414|  PASSED
    diehard_2dsphere|   2|      8000|     100|0.79652706|  PASSED
    diehard_3dsphere|   3|      4000|     100|0.14978100|  PASSED
     diehard_squeeze|   0|    100000|     100|0.35356584|  PASSED
        diehard_sums|   0|       100|     100|0.04522121|  PASSED
        diehard_runs|   0|    100000|     100|0.39739835|  PASSED
        diehard_runs|   0|    100000|     100|0.99128296|  PASSED
       diehard_craps|   0|    200000|     100|0.64934221|  PASSED
       diehard_craps|   0|    200000|     100|0.27352733|  PASSED
 marsaglia_tsang_gcd|   0|  10000000|     100|0.10570816|  PASSED
 marsaglia_tsang_gcd|   0|  10000000|     100|0.00267789|   WEAK
         sts_monobit|   1|    100000|     100|0.98166534|  PASSED
            sts_runs|   2|    100000|     100|0.05017630|  PASSED
          sts_serial|   1|    100000|     100|0.95153782|  PASSED
          sts_serial|  16|    100000|     100|0.59342390|  PASSED
         rgb_bitdist|   1|    100000|     100|0.50763759|  PASSED
         rgb_bitdist|  12|    100000|     100|0.98576422|  PASSED
rgb_minimum_distance|   2|     10000|    1000|0.23378443|  PASSED
rgb_minimum_distance|   5|     10000|    1000|0.13215367|  PASSED
    rgb_permutations|   2|    100000|     100|0.54142546|  PASSED
    rgb_permutations|   5|    100000|     100|0.96040216|  PASSED
      rgb_lagged_sum|   0|   1000000|     100|0.66587166|  PASSED
      rgb_lagged_sum|  31|   1000000|     100|0.00183752|   WEAK
      rgb_lagged_sum|  32|   1000000|     100|0.13582393|  PASSED
     rgb_kstest_test|   0|     10000|    1000|0.74708548|  PASSED
     dab_bytedistrib|   0|  51200000|       1|0.30789191|  PASSED
             dab_dct| 256|     50000|       1|0.89665788|  PASSED
        dab_filltree|  32|  15000000|       1|0.67278231|  PASSED
        dab_filltree|  32|  15000000|       1|0.35348003|  PASSED
       dab_filltree2|   0|   5000000|       1|0.18749029|  PASSED
       dab_filltree2|   1|   5000000|       1|0.92600020|  PASSED