Encryption in groups of unknown order

One way of looking at RSA encryption, a way that generalizes to new methods, is that the method is based on group operations inside a group of unknown order, i.e. unknown to most people. Another way of putting it is that RSA encryption takes place in a group where everybody knows how to multiply but not everyone knows how to divide. This will be explained below.

RSA encryption starts by finding two large primes p and q. You compute the product

n = pq

and make it public, but keep the factors p and q secret. The RSA method lets anyone send you encrypted messages by doing certain operations in the multiplicative group of the integers mod n. (Details here.) Let’s call this group G.

The elements of G are the integers from 1 to n-1 that are relative prime to n. The group operation is multiplication mod n, i.e. to multiply two elements of G, multiply them first as ordinary integers, then divide the result by n and keep the remainder.

The order of G, the number of elements in G, is

φ(n) = (p – 1)(q – 1).

You know p and q, and so you know φ(n), but the public does not. The security of RSA depends on the assumption that the public cannot compute φ(n). If someone could factor n, they could compute φ(n), but it is assumed that for large enough p and q it is not computationally feasible to factor n.

The public knows how to multiply in G but not how to divide. That is, anyone can carry out multiplication, but they cannot compute multiplicative inverses. Only you know how to divide in G, because you know φ(n) and Euler’s theorem.

In some sense the public knows everything there is to know about G. It’s the multiplicative group of integers mod n, and you tell them what n is. And that does tell them all they need to know to send you messages, but in practice it doesn’t tell them enough to decrypt messages anyone else sends you.

When you’re using RSA for public key cryptography, you’re telling the world “Here’s an n. To communicate securely with me, carry out certain algorithms with the integers relatively prime to n.”

Someone might object “But how do we know whether an integer is relatively prime to n? You haven’t told us its factors.”

You could reply “Don’t worry about it. It’s extremely unlikely that you’d run into a number that isn’t relatively prime to n. In fact, if you did, you’d break my system wide open. But if you’re worried about it, you can efficiently confirm that your numbers are relatively prime to n.”

Let’s unpack that last statement. We’ve already said that the number of positive integers less and n and relatively prime to n is (p – 1)(q – 1). So the number that are not relatively prime is

pq – (p – 1)(q – 1) = p + q – 1

and the probability of accidentally running into a one of these numbers is

(p + q – 1)/pq

Now if p and q are 300 digit numbers, for example, then this probability is on the order of one chance in 10300.

The Euclidean algorithm lets you find the greatest common factor of enormous numbers quickly. If you have a number k and you want to test whether it’s relatively prime to n, you can compute gcd(k, n). If k was chosen at random, the gcd is extremely likely to be 1, i.e. relatively prime to n. But if the answer is not 1, then it’s either p or q, in which case you’ve broken the encryption.


If you can factor large numbers, you can break RSA encryption. But it’s conceivable that you may be able to break RSA without being able to factor large numbers. That is in fact the case when RSA is implemented poorly. But aside from implementation flaws, nobody knows whether breaking RSA is as hard as factoring. Michael Rabin came up with a variation on RSA that is provably as hard as factoring, though I don’t know whether it has ever been used in practice.

More cryptography posts here.

Also a crypto library

The home page for the OpenSSL project says

OpenSSL is a robust, commercial-grade, and full-featured toolkit for the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols. It is also a general-purpose cryptography library. …

If you’ve never heard of the project before, you would rightly suppose that OpenSSL implements SSL (and its successor TLS). But you might not realize that OpenSSL “is also a general-purpose cryptography library.”

After thinking about it a bit, you might realize that software implementing SSL must have some encryption capability, but it doesn’t follow that this capability would necessarily be readily accessible. In fact, OpenSSL has implements a lot of cryptography algorithms and makes them easy to use from the command line. For example, this post shows how to compute hash functions using the openssl command.

Earlier today I wrote a thread on @CompSciFact about the famous example of encrypting an image of the Linux mascot Tux using ECB (Electronic Code Book) mode. As the saying goes, you should never use ECB “because you can see the penguin.”

Original encrypted Tux image

I wanted to try reproducing the example, and my first thought was to use Python. But setting up encryption libraries is a fairly lengthy process, while AES encryption using openssl is a one-liner.

My encrypted Tux image

You can still see the outline of Tux, but my penguin looks quite different from the famous example for a variety of reasons. For starters, I don’t know what key was used in the original image. Also, there are a variety of ways to extract the data from an image, encrypt it, and put it back. I basically followed Filippo Valsorda’s post The ECB Penguin but I had to make a few changes to get it to work due to changes in GIMP since that post was written.

Using cryptography broken 50 years ago

Old cryptography never dies. After a method is broken, its use declines, but never goes to zero.

And when I say “broken,” I do not mean no longer recommended, but broken to the point of being trivial to decrypt. I recently ran across an anecdote from World War I showing this is nothing new. The Vigenère cipher had been broken decades before the war broke out but was widely used anyway.

In this post I’ll explain what the Vigenère cipher is, give a little history of its rise and fall, and explain how it can be broken.

Vigenère cipher

A simple substitution cipher replaces one letter with another. For example, maybe you replace A with X, B with J, C with B, etc. Simple substitution ciphers are so easy to break that they’re included in pulp puzzle books.

The Vigenère cipher is a step up from simple substitution. It combines a key of length n with the clear text in such a way that you effectively have n different simple substitution ciphers. It’s not hard to break—I’ll explain how to break it below—but it’s less vulnerable that simple substitution. It makes it harder to spot high-frequency letters like E because not all E’s are encrypted the same way.

In its simplest form Vigenère essentially adds a key and a message mod 26. For example, if your message is “Attack the bridge at dawn” and your key is “ossifrage” the encryption would work like this.


Starting from the left, O is the 14th letter of the alphabet (counting from 0), and so A is moved ahead 14 places to 0. Since the clear text and the key are involved symmetrically, you could say that the letter O is moved head zero places since A is the 0th letter of the alphabet.

The next two letters of the clear text and the key happen to be repeated [1]. T and S are the 19th and 18th letters of the alphabet, and 19 + 18 = 37, which is congruent to 11 mod 16. The 11th letter of the alphabet is L.

Here’s a little Python code to carry out the encryption.

    key   = "OSSIFRAGE"

    A = ord('A')
    for i in range(len(clear)):
        c = ord(clear[i]) - A
        k = ord(key[i % len(key)]) - A
        e = (c + k) % 26
        print(chr(e + A), end="")

A more secure version of Vigenère moves the clear text through scrambled alphabets. The simplified version above is a particularly insecure special case. However, using scrambled alphabets (“polyalphabetic substitution”) doesn’t make the method that much stronger.

World War I

According to [2],

[The Vigenère cipher] was commonly regarded as unbreakable and was widely used up through World War I, even though the Prussian cryptographer Friedrich Wilhelm Kasiski had published a method for breaking it in 1863.

David Kahn [3] gives more detail about Kasiski’s book:

Die Geheimschriften und die Dechiffrir-kunst concentrates on answering the problem that had vexed cryptanalysts for more than 300 years: how to achieve a general solution for polyalphabetic ciphers with repeating keywords. … But the 95-page volume seems to have stirred almost no comment at the time.

So armies were depending on the security of Vigenère over 50 years after it had been broken. This was worse than using DES today, around 50 years after it came out. Weaknesses have been found in DES, and it’s 56-bit keys are too short to resist a brute force attack from contemporary computers. But DES has not been broken as thoroughly as Vigenère had been broken by the time of WWI.

Breaking Vigenère

So how would you go about attacking Vigenère? At first it seems hard to break. You can’t naively look at letter frequencies. In the example above, the clear text has four A’s, but they are encrypted three different ways.

However, Vigenère with a key of length n is just a set of n different simple substitution ciphers. You can chop the cipher text into n pieces and break each one separately.

How would you know the key length n? You could just try brute force. Try 1, then 2, then 3, etc. For example, to see whether the example above might have a key of length 2, you could analyze the letters in even positions and the letters in odd positions separately. If n isn’t too large (and in practice it often wasn’t large) brute force is efficient.

A more sophisticated approach would be to try different alignments and see which one results in statistical properties most consistent with English text. (Or French text if you suspect your clear text was in French etc.) This was the motivation for William Friedman developing his index of coincidence.

In hindsight this seems fairly obvious, but it was not obvious to anyone for three centuries before Kasiski, nor to many for decades after.

Why study obsolete cryptography?

Classical encryption methods are completely obsolete. The methods described in David Kahn’s book The Codebreakers can now be broken almost instantly. So why study old cryptography?

My interest in the subject was renewed recently because I had use for it in new projects. Not directly, though I have seen obsolete encryption in use, but indirectly. Some of the ideas from classical cryptography are relevant to searching for patterns in data, even though nothing was encrypted.

There are still lessons to learn from classical cryptography. For example, even a subtle lack of randomness in keys can be exploited. This was done during WWII, and flaws in random number generators are still causing security failures today.

Related posts

[1] This is a little foreshadowing of what can go wrong even with non-repeating keys: If the clear text and the key are English prose, coincidences like this will happen, and they are an exploitable weakness.

[2] James S. Craft and Lawrence C. Washington. An Introduction to Number Theory with Cryptography, 2nd edition. CRC Press.

[3] David Kahn. The Codebreakers: The Story of Secret Writing. Scribner, 1967.

Index of coincidence

Index of coincidence is a statistic developed by William Friedman for use in cryptanalysis. It measures how unevenly symbols are distributed in a message.

It’s a kind of signature that could be used, for example, to infer the language of a text, even if the text has been encrypted with a simple substitution cipher. It also has more sophisticated uses, such as inferring key length in text that has been encrypted with a Vigenère cipher.

If a discrete random variable has n possible values, each occurring with probability pi, then its index of coincidence is

I = \sum_{i=1}^n p_i^2

In cryptanalysis, the probabilities are the letter frequencies in the language of the message. The sum above is often multiplied by some constant, but this makes no difference in application because you would compare index of coincidence values, not interpret the index in the abstract.

Often the sum above is multiplied by n. This amounts to dividing the sum above by the corresponding sum for a uniform distribution.

Index of coincidence is occasionally use in applications, though not in cryptography anymore.


The letter frequencies in Google’s English language corpus are given here. In this corpus the probabilities range from 0.1249 for e down to 0.0009 for z. The index of coincidence would be

0.1249² + 0.0928² + … + 0.0009² = 0.06612

You’re more likely to see this value multiplied by 26, which gives 1.719.

Relation to entropy

Index of coincidence seems a lot like entropy, and in fact it is a simple transformation of an entropy, though not Shannon entropy. Index of coincidence is closely related to Rényi entropy.

Rényi entropy of order α is defined for a random variable X to be

H_\alpha(X) = \frac{1}{1 - \alpha} \log_2 \left(\sum_{i=1}^n p_i^\alpha \right)

and so Rényi entropy of order 2 is the negative log of the index of coincidence. That is, if H is the Rényi entropy of order 2 and I is the index of coincidence, then

\begin{align*} H &= -\log I \\ I &= \exp(-H) \end{align*}

Index of coicidence and Rényi entropy are sometimes defined with multiplicative constants that would slightly change the equations above.

Since exp(-x) is a decreasing function, increasing index of coincidence corresponds to decreasing Rényi entropy. Sorting in increasing order by index of coincidence is the same as sorting in decreasing order by Rényi entropy.


What does RIPEMD stand for?

The RIPEMD-160 secure hash function may be best known these days for its role as part of the implementation of Bitcoin. I’ve wondered what “RIPEMD” stands for, and today I stumbled on an explanation [1]:

“RIPEMD” stands for “RIPE Message Digest,” where “RIPE” stands for “RACE Integrity Primitives Evaluation” and where “RACE” stands for “Research and Development in Advanced Communications Technologies in Europe”—a nice example of a recursive abbreviation.

This deserves a diagram:

I created the diagram above with DITAA.

Related posts

[1] Introduction to Cryptography with Open-Source Software by Alasdair McAndrew. CRC Press. 2011

Hashing phone numbers

A cryptographic hash is also known as a one-way function because given an input x, one can quickly compute the hash h(x), but it is extremely time-consuming to try to recover x if you only know h(x).

Even if the hashing algorithm is considered “broken,” it may take an enormous effort to break it. Google demonstrated that they could break a SHA-1 hash, but they used a GPU-century of compute power to do so. Attacks have become more efficient since then, but it still takes many orders of magnitude less work to compute a hash than to attempt to invert it. [1]

However, if you know that the hash value comes from a small set of possible inputs, brute force can discover which one. I wrote about this a couple years ago in the post Hashing names does not protect privacy. You could, for example, easily create a table of the hashes of all nine-digit social security numbers.

I often explain this to clients who have been told that hashed data is “encrypted.” This is subtle, because the data is encrypted, in a way, but not in the way they think.

A paper came out a few weeks ago that hashed 118 billion phone numbers.

The limited amount of possible mobile phone numbers combined with the rapid increase in affordable storage capacity makes it feasible to create key-value databases of phone numbers indexed by their hashes and then to perform constant-time lookups for each given hash value. We demonstrate this by using a high-performance cluster to create an in-memory database of all 118 billion possible mobile phone numbers from [reference] (i.e., mobile phone numbers allowed by Google’s libphonenumber and the WhatsApp registration API) paired with their SHA-1 hashes.

The authors were able to use this data base to query

10% of US mobile phone numbers for WhatsApp and 100% for Signal. For Telegram we find that its API exposes a wide range of sensitive information, even about numbers not registered with the service.

Related posts

[1] Hash functions are not invertible, even in theory, in the sense of a unique x leading to a hash value h(x). Suppose you’re computing a 256-bit hash on files that are one kilobyte (8192 bits). If you’re mapping a space of 28192 possible files into a space of 2256 possible hash values, the mapping cannot be one-to-one. However, if you know the inputs are not random bits but German prose, and you find a file of German prose that has a matching hash value, you’ve almost certainly recovered the file that led to the hash value.

Perfectly nonlinear functions

The other day I heard someone suggest that a good cocktail party definition of cryptography is puzzle making. Cryptographers create puzzles that are easy to solve given the key, but ideally impossible without the key.

Linearity is very useful in solving puzzles, and so a puzzle maker would like to create functions that are as far from linear as possible. That is why cryptographers are interested in perfectly nonlinear functions (PNs) and almost perfect nonlinear functions (APNs), which we will explore here.

The functions we are most interested in map a set of bits to a set of bits. To formalize things, we will look at maps between two finite Abelian groups A and B. If we look at lists of bits with addition defined as component-wise XOR [1], we have an Abelian group, but the ideas below apply to other kinds of Abelian groups as well.

We will measure the degree of nonlinearity of a function F from A to B by the size of the sets

\delta(a, b) = \{x \mid F(a + x) - F(x) = b\}

where a and x are in A and b is in B [2].

If F is a linear function, F(a + x) = F(a) + F(x), and so δ(a, b) simplifies to

\delta(a, b) = \{x \mid F(a) = b\}

which equals all of the domain A, if F(a) = b. So for linear functions, the set δ(a, b) is potentially as large as possible, i.e. all of A. Perfectly nonlinear functions will be functions where the sets δ(a, b) are as small as possible, and almost perfectly nonlinear functions will be functions were δ(a, b) is as small as possible in a given context.

The uniformity of a function F is defined by

\underset{a \ne 0}{\max_{a \in A, \, b\in B}} \,\, |\delta(a,b)|

The restriction a ≠ 0 is necessary since otherwise the uniformity of every function would be the same.

For linear functions, the uniformity is as large as possible, i.e. |A|. As we will show below, a nonlinear function can have maximum uniformity as well, but some nonlinear functions have smaller uniformity. So some nonlinear functions are more nonlinear than others.

We will compute the uniformity of a couple functions, mapping 4 bits to 1 bit, with the following Python code.

    A = range(16)
    B = range(2)
    nonZeroA = range(1, 16)

    def delta(f, a, b):
        return {x for x in A if f(x^a) ^ f(x) == b}

    def uniformity(f):
        return max(len(delta(f, a, b)) for a in nonZeroA for b in B)

We could look at binary [3] functions with domains and ranges of different sizes by changing the definitions of A, B, and nonZeroA.

Note that in our groups, addition is component-wise XOR, denoted by ^ in the code above, and that subtraction is the same as addition since XOR is its own inverse. Of course in general addition would be defined differently and subtraction would not be the same as addition. When working over a binary field, things are simpler, sometimes confusingly simpler.

We want to compute the uniformity of two functions. Given bits x0, x1, x2, and x3, we define

F(x0, x1, x2, x3) = x0 x1


G(x0, x1, x2, x3) = x0 x1 + x2 x3.

We can implement these in Python as follows.

    def F(x):
        x = [c == '1' for c in format(x, "04b")]
        return x[0] and x[1]

    def G(x):
        x = [c == '1' for c in format(x, "04b")]
        return (x[0] and x[1]) ^ (x[2] and x[3])

Next we compute the uniformity of the two functions.

    print( uniformity(F) )
    print( uniformity(G) )

This shows that the uniformity of F is 16 and the uniformity of G is 8. Remember that functions with smaller uniformity are considered more nonlinear. Both functions F and G are nonlinear, but G is more nonlinear in the sense defined here.

To see that F is nonlinear, let x = (1, 0, 0, 0) and y = (0, 1, 0, 0).

1 = F(x + y) ≠ F(x) + F(y) = 0.

It turns out that 8 is the smallest possible uniformity for a function from 4 bits to 1 bit, and the function G is said to be perfectly nonlinear.

I haven’t defined perfect nonlinearity or almost perfect nonlinearaity precisely, only saying that they have something to do with maximum nonlinearity (minimum uniformity) in some context. I may say more about that context in a future post.

Related posts

[1] XOR stands for exclusive or. For Boolean variables p and q, p XOR q is true if either p is true or q is true, but not both. The component-wise XOR of two binary numbers sets each bit of the result to the XOR of the corresponding input bits. For example, 1001 XOR 1010 = 0011.

[2] This definition comes from Perfect nonlinear functions and cryptography by Céline Blondeau and Kaisa Nyberg.

[3] The next post replaces binary numbers with base 5 and other bases.

Overview of NIST post-quantum encryption finalists

If and when large-scale quantum computing becomes practical, most public key encryption algorithms currently in use would be breakable. Cryptographers have known this since Peter Shor published his quantum factoring algorithm in 1994.

In 2017 researchers submitted 69 algorithms to the NIST Post-Quantum Cryptography Standardization Process. In 2019 NIST chose 26 of these algorithms to advance to the second round of competition. Yesterday NIST announced the finalists for the third round of its post-quantum cryptography competition.

The four finalists for public key encryption and key establishment management (KEM) are

  • Classic McEliece
  • NTRU

The three finalists for digital signatures are

  • Rainbow

There were five alternates for public key encryption/KEM

  • BIKE
  • FrodoKEM
  • HQC
  • NTRU Prime
  • SIKE

and three alternates for digital signatures

  • GeMSS
  • Picnic

Classic McEliece is a code-based encryption method. This terminology makes no sense if you think of codes and encryption as synonymous. Here “code” is being used in the sense of codes as in error-correcting codes.

Rainbow is an “unbalanced oil and vinegar” (UOV) algorithm. I wrote an introduction to UOV here.

The rest of the finalist algorithms (CRYSTALS-KYBER, NTRU, SABER, CRYSTALS-DILITHIUM, and FALCON) are all variations on the theme of learning with errors: RLWE (ring learning with errors), MLWE (module learning with errors), MLWR (module learning with rounding), etc. These are all based on the assumption that the analog of linear regression over a discrete ring or module is a hard problem, and would remain hard for quantum computers.

Among the alternates, SIKE is the only one involving elliptic curves. Shor’s algorithm makes it practical to solve the discrete logarithm problem for elliptic curves, and quantum computers could break traditional elliptic curve cryptography. But SIKE uses isogeny-based encryption. It doesn’t depend on the inner workings of individual elliptic curves but rather the isogenies between different elliptic curves.

The NIST report says that SIKE didn’t make the list of finalists because it was an order of magnitude slower than its competitors. But on the plus side, SIKE uses smaller keys and produces smaller cypher texts than the other methods. If researches find ways to speed up SIKE significantly, and if other researchers don’t find weaknesses in the method, it could be widely adopted in the future.


More posts on encryption.

How probable is a probable prime?

A probable prime is a number that passes a test that all primes pass and that most composite numbers fail. Specifically, a Fermat probable prime is a number that passes Fermat’s primality test. Fermat’s test is the most commonly used, so that’s nearly always what anyone means by probable prime unless they’re more specific.

A number n is a Fermat probable prime for base b if

bn-1 = 1 (mod n).

This test isn’t conclusive, but it can be implemented efficiently and it weeds out most composite numbers. To read more on probable primes, see this post.

If a number is a probable prime, how probable is it that it really is prime? This post will briefly summarize some results from a paper [1] that makes this precise. From that paper:

… let P(x) denote the probability that an integer n is composite given that

    1. n is chosen at random with 1 < nx, n odd,
    2. b is chosen at random with 1 < b < n − 1, and
    3. n is a probable prime to the base b.

The authors give upper bounds on P(x) for x equal to various powers of 2. In particular they report

P(2250) ≤ 5.876 × 10−6


P(2260) ≤ 3.412 × 10−6

and so the chances that a 256-bit probable prime is composite are in the neighborhood of 4 in a million. In practice, one would test with multiple b‘s. The tests for various b‘s aren’t entirely independent, but running the tests for multiple bases does mean that fewer composite numbers slip through. There are a few pesky numbers, the Carmichael numbers, that are Fermat probable primes for nearly all bases (see footnote [2] here for more details), but these are rare.

I looked through the paper for results for larger powers of 2 to get results that would be applicable to, for example, RSA keys. The largest result explicit in the paper is

P(2330) ≤ 8.713 × 10−8

though I would like to know P(21024) and P(21536) since RSA keys are the products of (probable) primes this large. Presumably the results in [1] could be used to compute P(x) for larger values of x, but I haven’t read the paper closely enough to how much personal effort or computational effort that would require. I imagine it would be difficult or else the authors would have reported results for probable primes of the size frequently used in applications.

Related posts

[1] Jared Ducker Lichtman and Carl Pomerance. Improved error bounds for the Fermat primality test on random inputs. Mathematics of Computation. Volume 87, Number 314, November 2018, pages 2871–2890.

Making an invertible function out of non-invertible parts

How can you make an invertible function out of non-invertable parts? Why would you want to?

Encryption functions must be invertible. If the intended recipient can’t decrypt the message then the encryption method is useless.

Of course you want an encryption function to be really hard to invert without the key. It’s hard to think all at once of a function that’s really hard to invert. It’s easier to think of small components that are kinda hard to invert. Ideally you can iterate steps that are kinda hard to invert and create a composite that’s really hard to invert.

So how do we come up with components that are kinda hard to invert? One way is to make small components that are non-linear, and that are in fact impossible to invert [1]. But how can you use functions that are impossible to invert to create functions that are possible to invert? It doesn’t seem like this could be done, but it can. Feistel networks, named after cryptographer Horst Feistel, provide a framework for doing just that.

Many block encryption schemes are based a Feistel network or a modified Feistel network: DES, Lucifer, GOST, Blowfish, LOKI, etc.

The basic idea of Feistel networks is so simple that it may go by too fast the first time you see it.

You take a block of an even number bits and split it into two sub-blocks, the left half L and the right half R. The nth round of a Feistel cipher creates new left and right blocks from the left and right blocks of the previous round by

\begin{align*} L_n =& R_{n-1} \\ R_n =& L_{n-1} \oplus f(R_{n-1}, K_n) \end{align*}

Here ⊕ is bitwise XOR (exclusive or) and f(Rn-1, Kn) is any function of the previous right sub-block and the key for the nth round. The function f need not be invertible. It could be a hash function. It could even be a constant, crushing all input down to a single value. It is one of the non-invertible parts that the system is made of.

Why is this invertible? Suppose you have Ln and Rn. How could you recover Ln-1 and Rn-1?

Recovering Rn-1 is trivial: it’s just Ln. How do you recover Ln-1? You know Rn-1 and the key Kn and so you can compute

R_n \oplus f(R_{n-1}, K_n) = L_{n-1} \oplus f(R_{n-1}, K_n)\oplus f(R_{n-1}, K_n) = L_{n-1}

The main idea is that XOR is its own inverse. No matter what f(Rn-1, Kn) is, if you XOR it with anything twice, you get that thing back.

At each round, only one sub-block from the previous round is encrypted. But since the roles of left and right alternate each time, the block that was left alone at one round will be encrypted the next round.

When you apply several rounds of a Feistel network, the output of the last round is the encrypted block. To decrypt the block, the receiver reverses each of the rounds in the reverse order.

A sketch of DES

The DES (Data Encryption Standard) algorithm may be the best-known application of Feistel networks. It operates on 64-bit blocks of data and carries out 16 rounds. It takes a 56-bit key [2] and derives from it different 48-bit keys for each of the 16 rounds. In the context of DES, the function f described above takes 32 bits of data and a 48-bit key and returns 32 bits. This function has four steps.

  1. Expand the 32 bits of input to 48 bits by duplicating some of the bits.
  2. XOR with the key for that round.
  3. Divide the 48 bits into eight groups of 6 bits and apply an S box to each group.
  4. Permute the result.

The S boxes are nonlinear functions that map 6 bits to 4 bits. The criteria for designing the S boxes was classified when DES became a standard, and there was speculation that the NSA has tweaked the boxes to make them less secure. In fact, the NSA tweaked the boxes to make them more secure. The S boxes were modified to make them more resistant to differential cryptanalysis, a technique that was not publicly know at the time.

More cryptography posts

[1] These functions are impossible to invert in the sense that two inputs may correspond to the same output; there’s no unique inverse. But they’re also computationally difficult to invert relative to their size: for a given output, it’s time consuming to find any or all corresponding inputs.

[2] When DES was designed in the 1970’s researchers objected that 56-bit keys were too small. That’s certainly the case now, and so DES is no longer secure. DES lives on as a component of Triple DES, which uses three 56-bit keys to produce 112-bit security. (Triple DES does not give 168 bits of security because it is vulnerable to a kind of meet-in-the-middle attack.)