Looking back at Martin Gardner’s RSA article

Public key cryptography came to the world’s attention via Martin Gardner’s Scientific American article from August 1977 on RSA encryption.

The article’s opening paragraph illustrates what a different world 1977 was in regard to computation and communication.

… in a few decades … the transfer of information will probably be much faster and much cheaper by “electronic mail” than by conventional postal systems.

Gardner quotes Ron Rivest [1] saying that breaking RSA encryption by factoring the product of two 63-digit primes would take about 40 quadrillion years. The article included a challenge, a message encrypted using a 129-digit key, the product of a 64-digit prime and a 65-digit prime. Rivest offered a $100 prize for decrypting the message.

Note the tension between Rivest’s estimate and his bet. It’s as if he were saying “Based on the factoring algorithms and computational hardware now available, it would take forever to decrypt this message. But I’m only willing to bet $100 that that estimate will remain valid for long.”

The message was decrypted 17 years later. Unbeknownst to Gardner’s readers in 1977, the challenge message was

THE MAGIC WORDS ARE SQUEAMISH OSSIFRAGE

encoded using 00 for space, 01 for A, 02 for B, etc. It was decrypted in 1994 by a group of around 600 people using around 1600 computers. Here is a paper describing the effort. In 2015 Nat McHugh factored the key in 47 minutes using 8 CPUs on Google Cloud.
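The character code is simple enough to check in a couple lines of Python. Here is a sketch of the decoding scheme just described; the digit string is the decrypted challenge plaintext written in that code.

```python
def decode(digits: str) -> str:
    # 00 = space, 01 = A, 02 = B, ..., 26 = Z
    pairs = (digits[i:i+2] for i in range(0, len(digits), 2))
    return "".join(" " if p == "00" else chr(ord("A") + int(p) - 1)
                   for p in pairs)

plaintext = ("2008050013010709030023151804190001180500"
             "19172105011309190800151919090618010705")
print(decode(plaintext))  # THE MAGIC WORDS ARE SQUEAMISH OSSIFRAGE
```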

The RSA algorithm presented in Gardner’s article is much simpler than its current implementation, though the core idea remains unchanged. Now we use much larger public keys, the product of two 1024-bit (308-digit) primes or larger. Also, RSA isn’t used to encrypt messages per se; RSA is used to exchange symmetric encryption keys, such as AES keys, which are then used to encrypt messages.

RSA is still widely used, though elliptic curve cryptography (ECC) is taking its place, and eventually both RSA and ECC will presumably be replaced with post-quantum methods.

More RSA posts

[1] I met Ron Rivest at the Heidelberg Laureate Forum in 2013. When he introduced himself I said something like “So you’re the ‘R’ in RSA?” He’s probably tired of hearing that, but if so he was too gracious to show it.

Factoring RSA100

Earlier today I wrote about factoring four 255-bit numbers that I needed for a post. Just out of curiosity, I wanted to see how long it would take to factor RSA 100, the smallest of the factoring challenges posed by RSA Laboratories in 1991. This is a 100-digit (330-bit) number that is the product of two primes.

I used the CADO-NFS software. The software was developed in France, and CADO is a French acronym for Crible Algébrique: Distribution, Optimisation. NFS stands for number field sieve, the fastest algorithm for factoring numbers with over 100 digits.

RSA 100 was first factored in 1991 using a few days of compute time on a MasPar MP-1 computer, a machine that cost $500,000 at the time, equivalent to around $1,250,000 today.

My effort took about 23 minutes (1376 seconds) on a System 76 Meerkat mini that I paid $600 for in 2022.

The MP-1 was about the size of a refrigerator. The Meerkat is about 3″ × 3″ × 1.5″.

Most legible font for WIF

Bitcoin’s Wallet Import Format (WIF) is essentially Base58 encoding with a checksum. (See the next post for details.) It is meant to be a human-friendly way to display cryptographic private keys. It’s not that friendly, but it could be worse.

The checksum (the first four bytes of SHA256 applied twice) is appended before the conversion to Base58, so the final result consists of only Base58 characters.
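The checksum-then-Base58 step can be sketched in a few lines. This is a minimal illustration, not a full WIF implementation: real WIF prepends a version byte (0x80 for mainnet) to the raw private key and appends an extra 0x01 byte for keys corresponding to compressed public keys; the 32-byte key below is a made-up placeholder.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(payload: bytes) -> str:
    # Checksum: first four bytes of SHA256 applied twice
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    # Base58 encode the payload-plus-checksum as one big number
    n = int.from_bytes(data, "big")
    chars = []
    while n:
        n, r = divmod(n, 58)
        chars.append(ALPHABET[r])
    # Leading zero bytes are conventionally encoded as '1'
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))

# Hypothetical 32-byte private key with mainnet version byte 0x80
wif = base58check(b"\x80" + bytes(range(32)))
print(wif)
```

Because the checksum is appended before the conversion, the output consists only of Base58 characters.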

The Base58 alphabet is

    123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

and so some easily-confused characters have been removed. For example, the lower case letter o is included but the upper case O and the numeral 0 are not. The lower case letter l has been removed so that it won’t be confused with the numeral 1.

But there are still a few letters that could be confused:

    1ij 2Zz Cc Kk 5Ss Uu Vv Ww Xx Yy

I was curious what font might make these letters the most distinct, and the best I found was IBM Plex Mono Italic.

Similar letters in IBM Plex Mono italic

The pairs Cc and Ss are still similar, but the rest of the upper and lower case pairs are distinct. (Note the serif on the lower case u, for example.)

Without the italic, lower case v, x, and z are simply smaller versions of their upper case counterparts.

Here’s the whole Base58 alphabet in IBM Plex Mono italic. Note the “holes” in the alphabet where some letters were removed.

Related posts

Retrofitting error detection

Bitcoin addresses include a checksum. The Base58Check algorithm uses the first four bytes of a hash function as a checksum before applying Base58 encoding.

Ethereum addresses did not include a checksum, but it became apparent later that a checksum was needed. How can you retroactively fit a checksum into an address without breaking backward compatibility? Here’s what Ethereum did in adopting the EIP-55 proposal.

Ethereum addresses are the last 20 bytes (40 hexadecimal characters) of the Keccak-256 hash of the public key. The protocol allowed the letters in hexadecimal addresses to be either upper or lower case. This option provided the wiggle room to retrofit a checksum. You could mix upper and lower case letters deliberately to encode an extra message (i.e. a checksum) on top of the key. This is sorta like steganography, except that it’s out in the open rather than hidden.

Hexadecimal notation uses 10 digits and 6 letters, and so the probability that a hexadecimal character is a letter is 6/16 = 3/8. On average a string of 40 hexadecimal characters will contain 15 letters. This means you could add 15 bits of information to an Ethereum address, on average, by adjusting the case of the letters.

It’s conceivable that an address won’t contain any letters, or will consist entirely of letters. The number of letters in a random string of 40 hexadecimal characters is a binomial random variable with parameters n = 40 and p = 3/8, and this is approximately a normal random variable with mean 15 and standard deviation 3. This says the number of letters in an Ethereum address will be fairly tightly clustered around 15, rarely more than 21 or less than 9.
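The normal curve is only an approximation, but the claim is easy to check exactly against the binomial distribution:

```python
from math import comb

n, p = 40, 3 / 8
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Probability the letter count falls between 9 and 21 inclusive
print(sum(pmf[9:22]))
```

The sum comes out to roughly 0.97, so an address with fewer than 9 or more than 21 letters is indeed rare.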

OK, but how do you take advantage of an average of 15 bits of freedom? You can’t encode a 15-bit number because you might not have 15 bits at your disposal.

Here’s what EIP-55 did. Left-align the address and the Keccak-256 hash of the address (the address was itself derived from a hash, so there are two hashes going on) and capitalize all letters in the address that align with a character in the hash greater than or equal to 8.

As an example, let’s suppose our address is

    7341e5e972fc677286384f802f8ef42a5ec5f03b

This address contains 13 letters, which would not be unusual. Now let’s compute its hash.

    >>> from Crypto.Hash import keccak
    >>> kh = keccak.new(digest_bits=256)
    >>> kh.update(b"7341e5e972fc677286384f802f8ef42a5ec5f03b").hexdigest()
    'd8e8fcb225fb835fdb89a5918e736820ec75b3efaf3cbb229f03cdc41bf9f254'

Now we line up the address and its hash.

    7341e5e972fc677286384f802f8ef42a5ec5f03b
    d8e8fcb225fb835fdb89a5918e736820ec75b3ef...

The digits in the address are left alone, but the letters that line up with a hash character of 0x8 through 0xf are turned to upper case.

    7341E5E972FC677286384f802F8ef42a5Ec5F03B
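The rule fits in a few lines of Python. This is a minimal sketch: the hash’s hex digest is passed in as an argument so it can be paired with any Keccak-256 implementation (such as `Crypto.Hash.keccak` used above), and the function name is chosen here for illustration.

```python
def checksum_address(addr: str, keccak_hex: str) -> str:
    # addr: 40 lowercase hex characters
    # keccak_hex: Keccak-256 hex digest of the address string
    # Letters aligned with a hash nibble of 8..f are upper-cased;
    # digits pass through unchanged.
    return "".join(c.upper() if c.isalpha() and int(h, 16) >= 8 else c
                   for c, h in zip(addr, keccak_hex))
```

Applying it to the address and hash above reproduces the mixed-case address shown.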

Related posts

Base58 versus Base85 encoding

Base58 encoding and Base85 encoding are used to represent binary data in a human-friendly way. Base58 uses a smaller character set and so is more conservative. Base85 uses a larger character set and so is more efficient.

There is a gotcha in that “base” means something different in Base58 compared to Base85. More on that below.

Base58

Base58 encoding is primarily used as part of the Bitcoin system. It is part of the Base58Check protocol used for encoding addresses and keys.

Base58 encoding is essentially the same as mathematical base 58 encoding, with a specific character set. The symbols for the “digits” 0 through 57 are chosen to avoid typographically similar letters. We’ll give that character set in the examples below.

There is only one version of Base58 in common use as far as I know, unlike Base85.

Base85

Base85 is a more compact alternative to Base64 encoding. The former encodes 4 bytes in 5 characters while the latter requires 6 characters (ignoring padding). Base85 is used inside the PDF format. It is also used in the patch encoding for git.

Base85 encoding is analogous to binary-coded decimal (BCD). In some early computer systems, integers were not expressed in binary per se. Instead, each decimal digit was represented by four bits. So to represent a number like 427, you’d express 4, 2, and 7 in binary: 0100 0010 0111. If you were to express 427 in binary you’d get 110101011.

Base85 breaks the data into 32-bit words, then expresses each word in base 85. So you might say it’s base-85-encoded 32-bit words, by analogy to binary-coded decimal.

There are variations on Base85 encoding that use different alphabets, and so two software packages that say they do Base85 encoding might produce different results.

Base85 is more efficient than Base58 in the sense that it represents data using fewer symbols. It is also more computationally efficient because each 32-bit word is encoded independently.

Examples

We give four examples below: Base58 and Base85 applied to four bytes of data and eight bytes of data. The data length matters for Base85.

Base58, four bytes

Let n = CAFEBABE₁₆ = 3405691582₁₀. This is the “magic number” at the beginning of Java class files, a pun on “java” as slang for coffee.

In base 58 this number would be

5:10:55:3:26:22

We can verify this as follows:

    >>> 5*58**5 + 10*58**4 + 55*58**3 + 3*58**2 + 26*58 + 22
    3405691582
    >>> hex(_)
    '0xcafebabe'

The Base58 alphabet is

    123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

and so the Base58 encoding of 0xCAFEBABE would be elements 5, 10, 55, 3, 26, and 22 of this alphabet (zero-based), which results in 6Bx4TP.

Note that the Base58 alphabet contains the digit 1 but not the letter l. It contains the lower case letter o but not the capital letter 0 or the digit 0. Some of the remaining characters are visibly similar, depending on your font. This post shows how one font makes the Base58 characters more distinct.

Base85, four bytes

Now suppose we want to encode n using Base85. We would get

65:20:50:84:67

If we use the alphabet

    !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu

then the “digits” above become b5Sud.

Note that the Base85 alphabet contains characters that could be confused, such as 0 (zero), O (capital letter), o (lower case letter). The characters were chosen to be printable ASCII characters, not necessarily visually distinct.

Base58, eight bytes

Now suppose n = CAFEBABECAFEBABE₁₆ = 14627333968358193854₁₀.

We convert n to base 58 to get

33:55:17:43:49:44:3:47:49:44:26

which becomes axJkrm4prmT in the Base58 alphabet.

Base85, eight bytes

To encode CAFEBABECAFEBABE₁₆ in Base85 we do not convert the number to base 85. Instead, we convert each 4-byte word to base 85. So we get two copies of CAFEBABE₁₆ and so the encoding is b5Sudb5Sud.

If we were to wrongly convert n to base 85, we’d get

63:13:1:27:77:35:57:62:38:49

which becomes `."<nDZ_GR, which is not the correct encoding.
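The examples above can be reproduced in Python. The Base58 encoder below is a bare sketch (it ignores the convention of encoding leading zero bytes as 1s); for Base85, the standard library’s `base64.a85encode` uses exactly the `!` through `u` alphabet shown earlier.

```python
import base64

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58(data: bytes) -> str:
    # Treat the entire byte string as a single number in base 58
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = ALPHABET[r] + out
    return out or "1"

print(base58(bytes.fromhex("CAFEBABE")))                    # 6Bx4TP
print(base58(bytes.fromhex("CAFEBABECAFEBABE")))            # axJkrm4prmT
print(base64.a85encode(bytes.fromhex("CAFEBABE")))          # b'b5Sud'
print(base64.a85encode(bytes.fromhex("CAFEBABECAFEBABE")))  # b'b5Sudb5Sud'
```

Doubling the input doubles the Ascii85 output exactly, since each 4-byte word is encoded independently, while the Base58 output is a completely different string.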

Related posts

Experiences with Nvidia

Our team started working with Nvidia in early 2009, at the beginning of the ORNL Titan project. Our Nvidia contacts dealt with applications, libraries, the programming environment and performance optimization. First impressions were that their technical stance on issues was very reasonable. One obscure example: we asked whether you were allowed to use enums in a C++ CUDA kernel, and the answer was, of course, yes, we would allow that. This was unlike some other companies that had odd and cumbersome restrictions in their parallel programming models (though by now this has become a harder problem for Nvidia, since there are so many software products a user might want to interoperate with).

Another example: I mentioned to a colleague at Nvidia on the C++ standards committee that it might be too early to lock a certain feature design into the standard, since hardware designs were still rapidly changing. His response was, oh, yes, we think exactly the same thing. In short, their software judgments and decisions generally seem to be well thought out, reasonable and well informed. It sounds simple, but it is amazing how many companies have gotten this wrong.

Nvidia has made good strategic decisions. In the 2013 time frame, Intel was becoming a competitive threat with the Xeon Phi processor. Intel was several times larger than Nvidia and had huge market dominance. In response, Nvidia formed a partnership with IBM, itself several times larger than Intel at the time. This came to fruition in the ORNL Summit system in 2018. In the meantime, the Xeon Phi’s OpenMP programming model, though standards-based, turned out to be difficult to write optimized code for, and Nvidia’s CUDA captured a dominant share of accelerated user software. Intel eventually dropped the Xeon Phi product line.

In the early 2000s, Nvidia went all-in on CUDA. I’ve heard some project teams say they would never use CUDA because it is nonstandard and too low-level. Many have since walked back that decision. Of course, it is often possible to write an abstraction layer on top of CUDA to make it easier to use and maintain. Newer programming models like Kokkos can also help.

Nvidia also made a prescient decision early on to bet big on AI. A little later they decided to go all in on developing a huge number of software libraries to enable access to many new markets. A huge moat. AMD is trying hard to improve its software processes and catch up.

On the downside, Nvidia’s high prices are upsetting to many, from gamers to buyers of the world’s largest HPC systems. Competition from AMD and others is a good thing.

And Nvidia’s marketing speak is sometimes confusing. A comparison was once made claiming that a four-GPU system was more powerful than one of the world’s top CPU-only supercomputers on a very specific science problem. I’d like to see the details of that comparison. Also, different figures have been given for how long it took to stand up xAI’s Colossus supercomputer, from 19 days to 122 days; one has to dig a little to find out what these figures mean. And it was widely reported last year that the GB200 NVL72 was “30 times faster” than the H100, but this refers only to certain operations, not to key performance measures like flops per watt.

Those are my takes. For more perspectives, see Tae Kim’s excellent book, The Nvidia Way, or this interview.

Thoughts on Nvidia? Please leave in the comments.


Most ints are not floats

All integers are real numbers, but most computer representations of integers do not equal computer representations of real numbers.

To make the statement above precise, we have to be more specific about what we mean by computer integers and floating point numbers. I’ll use int32 and int64 to refer to 32-bit and 64-bit signed integers. I’ll use float32 and float64 to refer to IEEE 754 single precision and double precision numbers, what C calls float and double.

Most int32 numbers cannot be represented exactly by a float32. All int32 numbers can be represented approximately as float32 numbers, but usually not exactly. The previous statements remain true if you replace “32” everywhere by “64.”

32 bit details

The int32 data type represents integers −2³¹ through 2³¹ − 1. The float32 data type represents numbers of the form

± 1.f × 2ᵉ

where 1 bit represents the sign, 23 bits represent f, and 8 bits represent e.

The number n = 2²⁴ can be represented by setting the fractional part f to 0 and setting the exponent e to 24. But the number n + 1 cannot be represented as a float32 because its last bit would fall off the end of f:

2²⁴ + 1 = (1 + 2⁻²⁴) 2²⁴ = 1.000000000000000000000001₂ × 2²⁴

The bits in f fill up representing the 23 zeros after the binary point. There’s no 24th bit to store the final 1.

For each value of e, there are 2²³ possible values of f. So for each of e = 24, 25, …, 30 there are 2²³ representable integers, and all 2²⁴ = 2 × 2²³ integers below 2²⁴ are representable as well.

So of the 2³¹ non-negative integers that can be represented by an int32 data type, only 9 × 2²³ can also be represented exactly as a float32 data type. This means about 3.5% of 32-bit integers can be represented exactly by a 32-bit float.
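You can check the boundary directly in Python by round-tripping an integer through a single precision value with the `struct` module:

```python
import struct

def fits_float32(n: int) -> bool:
    # Round-trip through an IEEE 754 single precision value
    return struct.unpack("f", struct.pack("f", float(n)))[0] == n

print(fits_float32(2**24))      # True
print(fits_float32(2**24 + 1))  # False: the final bit falls off the end of f
print(9 * 2**23 / 2**31)        # 0.03515625, i.e. about 3.5%
```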

64 bit details

The calculations for 64-bit integers and 64-bit floating point numbers are analogous. Numbers represented by float64 data types have the form

± 1.f × 2ᵉ

where now f has 52 bits and e has 11.

Of the 2⁶³ non-negative integers that can be represented by an int64 data type, only 12 × 2⁵² can also be represented exactly as a float64 data type: 2 × 2⁵² integers below 2⁵³ and 2⁵² for each of e = 53, 54, …, 62. This means about 0.6% of 64-bit integers can be represented exactly by a 64-bit float.

A note on Python

Python’s integers have unlimited range, while its floating point numbers correspond to float64. So there are two reasons an integer might not be representable as a float: it may be larger than the largest float, or it may require more than 53 bits of precision.

Related posts

Test whether a large integer is a square

Years ago I wrote about a fast way to test whether an integer n is a square. The algorithm rules out a few numbers that cannot be squares based on their last (hexadecimal) digit. If the integer passes through this initial screening, the algorithm takes the square root of n as a floating point number, rounds it to an integer, and tests whether the square of the result equals n.

What’s wrong with the old algorithm

The algorithm described above works fine if n is not too large. However, it implicitly assumes that the integer can be represented exactly as a floating point number. This is two assumptions: (1) it’s not too large to represent, and (2) the representation is exact.

A standard 64-bit floating point number has 1 bit for the sign, 11 bits for the exponent, and 52 bits for the significand. (More on that here.) Numbers will overflow if they run out of exponent bits, and they’ll lose precision if they run out of significand bits. So, for example, 2¹⁰²⁴ will overflow [1]. And although 2⁵³ + 1 will not overflow, it cannot be represented exactly.

These are illustrated in Python below.

    >>> float(2**1024)
    OverflowError: int too large to convert to float
    >>> n = 2**53
    >>> float(n) == float(n + 1)
    True

A better algorithm

The following algorithm [2] uses only integer operations, and so it will work exactly for arbitrarily large integers in Python. It’s a discrete analog of Newton’s square root algorithm.

    def intsqrt(N):
        # Discrete Newton iteration; converges to floor(sqrt(N))
        n = N
        while n**2 > N:
            n = (n + N//n) // 2
        return n

    def issquare(N):
        # N is a square iff the square of its integer square root is N
        n = intsqrt(N)
        return n**2 == N

The function issquare tests whether N is a square. The function intsqrt is broken out as a separate function because it is independently useful: it returns ⌊√N⌋.

The code above correctly handles examples that the original algorithm could not.

    >>> issquare(2**1024)
    True
    >>> issquare(2**54)
    True
    >>> issquare(2**54 + 1)
    False

You could speed up issquare by quickly returning False for numbers that cannot be squares on account of their final digits (in some base—maybe you’d like to use a base larger than 16) as in the original post.

[1] The largest allowed exponent is 1023 for reasons explained here.

[2] Bob Johnson. A Route to Roots. The Mathematical Gazette, Vol. 74, No. 469 (Oct., 1990), pp. 284–285

Why use hash puzzles for proof-of-work?

A couple days ago I wrote about the problem that Bitcoin requires miners to solve as proof-of-work. In a nutshell, you need to tweak a block of transactions until the SHA256 double hash of its header is below a target value [1]. Not all cryptocurrencies use proof of work, but those that do mostly use hash-based puzzles.

Other cryptocurrencies use different hashing problems, but they still use hashes. Litecoin and Dogecoin use a proof-of-work problem similar to the one Bitcoin uses, but with the scrypt (pronounced S-crypt) hash function. Several cryptocurrencies use a hashing problem based on Equihash. Monero uses its RandomX algorithm for proof-of-work, and although this algorithm has multiple components, it ultimately solves a hashing problem. [2]

Why hash puzzles?

Why do cryptocurrencies use hashing problems for proof of work? In principle they could use any computational problem that is hard to solve but easy to verify, such as numerous problems in computational number theory.

One reason is that computer scientists are confident that quantum computing would not reduce the difficulty of solving hash puzzles, even though it would reduce the difficulty of factoring-based puzzles. Also, there is general agreement that it’s unlikely a mathematical breakthrough will find a weakness in hashing functions.

Ian Cassels said “Cryptography is a mixture of mathematics and muddle, and without the muddle the mathematics can be used against you.” Hashing is much more muddle than mathematics.

Why not do something useful?

Hash puzzles work well for demonstrating work done, but they’re otherwise useless. They keep the wheels of cryptocurrencies turning, but the solutions themselves are intrinsically worthless.

Wouldn’t it be nice if crypto miners were solving useful problems like protein folding? You could do that. In fact there is a cryptocurrency FoldingCoin that does just that. But FoldingCoin has a market cap seven orders of magnitude smaller than Bitcoin, on the order of $200,000 compared to Bitcoin’s market cap of $2T.

Cryptocurrencies that use proof of useful work have not taken off. This may necessarily be the case: requiring practical work creates divergent incentives. If you base a currency on the difficulty of protein folding computations, for example, it would cause major disruption if a pharmaceutical company decided to get into mining protein folding cryptocurrency at a loss because it values the results.

Going back to Cassels’ remark about mathematics and muddle, practical real-world problems often have a mathematical structure. Which is a huge blessing, except when you’re designing problems to be hard. Hash-based problems have gradually become easier to solve over time, and cryptocurrencies have adjusted. But a mathematical breakthrough for solving a practical problem would have the potential to disrupt a currency faster than the market could adapt.

Related posts

[1] You don’t change the transaction amounts, but you may change the order in which the transactions are arranged into a Merkle tree so that you get different hash values. You can also change a 32-bit nonce and a timestamp, but most of the degrees of freedom you need in order to find an acceptable hash come from rearranging the tree.

[2] Both scrypt and Equihash were designed to be memory intensive and to thwart the advantage of custom ASIC mining hardware. However, people have found ways to use ASIC hardware to solve scrypt and Equihash problems. RandomX requires running a randomly generated program before hashing the output in an attempt to frustrate efforts to develop specialized mining hardware.

What is the Bitcoin proof-of-work problem?

In order to prevent fraud, anyone wanting to add a block to the Bitcoin blockchain must prove that they’ve put in a certain amount of computational work. This post will focus on what problem must be solved in order to produce proof of work.

You’ll see the proof of work function described as finding strings whose SHA256 hash value begins with a specified number of 0s. That’s sort of a zeroth-level approximation of the problem.

The string s you’re trying to find has the form data + nonce, where the data comes from the block you want to add and the nonce is a value you concatenate on the end. You try different values until you get an acceptable hash value.

You’re not computing SHA256(s) but rather the double hash:

SHA256²(s) = SHA256(SHA256(s))

The only way to find such a string s is by brute force [1], and applying the hash function twice doubles the amount of brute force work needed.

And you’re not exactly trying to produce leading zeros; you’re trying to produce a value less than a target T. This is roughly the same thing, but not quite.

To illustrate this, suppose you have a 2FA fob that generates six-digit random numbers. You’ve been asked to wait until your fob generates a number less than 2025. Waiting until you have three leading zeros would be sufficient, but that would be making the task harder than it needs to be. You’d be waiting for a number less than 1000 when you’re only asked to wait for a number less than 2025.

A SHA256 hash value is a 256-bit number. If your target T is a power of 2

T = 2²⁵⁶⁻ⁿ

then finding a value of s such that

SHA256²(s) < T

really is finding an s whose (double) hash begins with n zeros, though T is not required to be a power of 2.

Finding a value of s with

SHA256²(s) < 2²⁵⁶⁻ⁿ

will require, on average, testing 2ⁿ values of s.

The value of T is adjusted over time in order to keep the amount of necessary work roughly constant. As miners have become more efficient, the value of T has gone down and the amount of work has gone up. But the value of T can go up as well. It is currently fluctuating around 2¹⁷⁶, i.e. hashes must have around 80 leading zero bits.

Now here’s where things get a little more complicated. I said at the beginning that the string s has the form

s = data + nonce

where the data comes from the block you’re trying to add and the nonce is a number you twiddle in order to get the desired hash value. But the nonce is a 32-bit integer. If you need to hash on the order of 2⁸⁰ strings in order to find one with 80 leading zeros, you can’t do that just by adjusting a 32-bit nonce.

In practice you’re going to have to twiddle the contents of what I’ve called data. The data contains a Merkle tree of transactions, and you can change the hash values by adjusting the order in which transactions are arranged in the tree, in addition to adjusting the nonce.
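The puzzle itself can be sketched in a few lines of Python. This is a toy model with a small difficulty n so it runs quickly; real mining hashes an 80-byte block header against a target with around 80 leading zero bits.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(data: bytes, n: int) -> int:
    # Find a nonce so that the double hash, read as a 256-bit
    # integer, falls below the target 2**(256 - n), i.e. has
    # roughly n leading zero bits
    target = 1 << (256 - n)
    nonce = 0
    while True:
        h = double_sha256(data + nonce.to_bytes(4, "little"))
        if int.from_bytes(h, "big") < target:
            return nonce
        nonce += 1

# Expect on the order of 2**12 = 4096 attempts
print(mine(b"block header bytes", 12))
```

Verifying a solution takes just two hash evaluations, which is the asymmetry proof-of-work depends on.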

Related posts

[1] Unless someone finds a flaw in SHA256, which cryptographers have been trying to do for years and have not been able to do. And if a significant weakness is found in SHA256, it may not translate into a significant flaw in SHA256².