Security by obscurity

Security-by-obscurity is a bad idea in general. It’s better, for example, to have a login page than to give your site an obscure URL. It’s better to encrypt a file than to hide it in some odd directory. It’s better to use a well-vetted encryption algorithm than to roll your own.

There are people whose knee-jerk reaction to any form of obscurity is to shout “That’s security-by-obscurity!” but obscurity can be subtle.

All else being equal, adding a layer of obscurity doesn’t hurt. For example, you can literally make a public encryption key public, as I’ve done here. But for extra security, why distribute your encryption key more widely than necessary? And if your message is adequately encrypted, you could in principle publish it for the world to see. But why not just give it to the intended recipient?

The public key on my site is there for strangers to contact me, but if I were really concerned about secure communication between colleagues, I’d just circulate the key among those colleagues. That may not be much more secure, but surely it’s no less secure. And I’d share messages privately, even though they are encrypted.

It’s good to look closely at any argument that begins “all else being equal” to see whether all else is indeed equal. A more nuanced objection to security-by-obscurity is that it can create a false sense of security.

One could argue, for example, that making your public key available to the world forces you to be more careful about your encryption. Maybe you’ve been using an RSA key for years, and you really should use a longer key, but you don’t because you tell yourself that not many people have your public key anyway. But if your key is too short, obscuring your public key doesn’t help.

And while it’s better to deliver encrypted messages privately, it helps to not count on this, to assume that the encrypted message might be made public. That’s the basic premise behind encryption.

The principle behind no-security-by-obscurity is that you want to concentrate your security where it can be quantified. You can, for example, quantify how much more effort it would take to break a 64-bit key (like Blowfish) than a 56-bit key (like DES). Or even better, a 128-bit key (like AES). But you can’t quantify the level of protection that comes from obscurity.
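For example, the comparison between key sizes is just exponent arithmetic:

print(2**64 // 2**56)    # a 64-bit key space is 256 times larger than a 56-bit one
print(2**128 // 2**56)   # a 128-bit key space is larger by a factor of about 4.7 × 10²¹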

Is it more secure to give someone a 56-bit DES key on a flash drive in a dark alley than to send them a 64-bit Blowfish key over SMS? You can’t calculate an answer to that question.

In some sense all security is by obscurity. Cryptography literally means hidden writing. But all else being equal—there’s that phrase again—you want to minimize the surface area of what you have to obscure, e.g. limiting your secret to your key and not your algorithm, and it’s better to have quantified risks than unquantified risks. But all else is often not equal, and there are difficult trade-offs.


Brute force cryptanalysis

A naive view of simple substitution ciphers is that they are secure because there are 26! ways to permute the English alphabet, and so an attacker would have to try 26! ≈ 4 × 10²⁶ permutations. However, such brute force is not required. In practice, simple substitution ciphers are breakable by hand in a few minutes, and you can find software that automates the process.

However, for modern encryption, apparently brute force is required. If you encrypt a message using AES with a 128-bit key, for example, you can’t do much better than try 2¹²⁸ keys. You might be able to do a little better, but as far as is openly known, you can’t do orders of magnitude better.

Even for obsolete encryption methods such as DES it still takes a lot more effort to break encryption than to apply encryption. The basic problem with DES is that it used 56-bit keys, and trying 2⁵⁶ keys is feasible [1]. You won’t be able to do it on your laptop, but it can be done using many processors in parallel [2]. Still, you’d need more than a passing curiosity about a DES-encrypted message before you’d go to the time and expense of breaking it.
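To get a feel for the scale, here is the key-search rate implied by the nine-day figure in footnote [2] below:

keys = 2**56
seconds = 9 * 24 * 60 * 60
print(keys / seconds)    # roughly 9 × 10¹⁰ keys per second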

If breaking a simple substitution cipher really did require brute force, it would offer 88-bit security. That is, 26! roughly equals 2⁸⁸. So any cipher offering b-bit security for b > 88 is more secure in practice than breaking simple substitution ciphers would be in naive theory. This would include AES, as well as many of its competitors that weren’t chosen for the standard, such as Twofish.
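A quick check of those figures in Python:

import math

print(math.factorial(26))              # about 4 × 10²⁶
print(math.log2(math.factorial(26)))   # about 88.4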

For all the block ciphers mentioned here, the number of bits of security they offer is equal to the size of the key in bits. This isn’t always the case. For example, the security level of an RSA key is much less than the size of the key, and the relation between key size and security level is nonlinear.

A 1024-bit RSA modulus is believed to offer on the order of 87 bits of security, which incidentally is comparable to the 26! ≈ 2⁸⁸ figure mentioned above. NIST FIPS 186-5 recommends 2048 bits as the minimum RSA modulus size. This gives about 117 bits of security.

The security of RSA depends on the difficulty of factoring the product of large primes [3], and so you can compute the security level of a key based on the efficiency of the best known factoring algorithm, which is currently the General Number Field Sieve. More on this here.
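Here is a rough sketch of that calculation in Python, using only the leading term of the GNFS running time estimate (the usual o(1) correction is ignored, so take the output as ballpark figures):

from math import log

def rsa_security_bits(modulus_bits):
    # Leading term of the GNFS work estimate:
    # exp((64/9)^(1/3) * (ln N)^(1/3) * (ln ln N)^(2/3)),
    # converted from natural log to bits.
    ln_N = modulus_bits * log(2)
    work = (64/9) ** (1/3) * ln_N ** (1/3) * log(ln_N) ** (2/3)
    return work / log(2)

print(round(rsa_security_bits(1024)))   # roughly 87
print(round(rsa_security_bits(2048)))   # roughly 117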


[1] There are ways to do better than brute force against DES, if you have an enormous number of messages all encrypted with the same key.

[2] In 1998, the EFF built a machine called Deep Crack with 1,856 custom processors that could crack DES-encrypted messages in at most nine days, and in four and a half days on average.

[3] Nobody has proved that breaking RSA requires factoring. There is a variation on RSA that is provably as hard as factoring but as far as I know it has never been widely used.

Straddling checkerboard encryption

Introduction

Computers fundamentally changed cryptography, opening up new possibilities for making and breaking codes. At first it may not have been clear which side benefited most, but now it’s clear that computers gave more power to code makers than code breakers.

We now have cryptographic primitives that cannot be attacked more efficiently than by brute force, as far as we know. The weak link is how these primitives are implemented and combined, not the primitives themselves.

Before computers there was more of a cat and mouse game between encryption and cryptanalysis. Encryption schemes that were convenient to carry out by hand could usually be broken by hand eventually. But if you only needed secrecy briefly, a simple scheme might provide that secrecy for long enough. This post will look at one such scheme, the straddling checkerboard.

Checkerboards

Perhaps the most obvious way to conveniently turn letters into numbers is to arrange the letters into a 5 × 5 grid. This has to leave out one letter, and in practice this meant combining I and J. Or if you needed digits, you could use a 6 × 6 grid and put J back in. You’d scramble the alphabet in the grid according to some key, then encrypt each letter by its coordinates.

       12345
      +-----
     1|EBISP
     2|XWLVN
     3|AOYZQ
     4|MDCKH
     5|RTUFG

This is no better than a simple substitution cipher because someone intercepting a message encrypted this way would easily guess that pairs of digits represent letters. However, if you then permuted the digits with a transposition cipher, you’d have something more formidable. This is essentially what the ADFGVX cipher did, which stumped cryptanalysts for a while.

The straddling checkerboard is a variation on the method above. Letters would be arranged in a 3 × 10 grid rather than 5 × 5. Some letters would be encrypted as a single digit and some as a pair of digits.

       1234567890
      +----------
      |  EBISPXWL
     1|VNAOYZQMDC
     2|KHRTUFGJ./

In the example above, E would be encrypted as 3, N would be encrypted as 12, and so on. This is an instance of a prefix code. In order to be able to decode the digits unambiguously, no letter could be encoded as 1 or 2; these digits always signaled the beginning of a pair.

Prefix codes are often used in non-secret codes, such as country codes for telephone numbers. More examples of prefix codes in this post.

Because 1 and 2 could not be used to encode single letters, there were 28 slots to fill. These could be filled with other symbols, and in practice period and slash were added [1].
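Here is a minimal Python sketch of the checkerboard above, just to make the encoding and decoding concrete. (This is my own illustration of the scheme, not a historical procedure; the example message is arbitrary.)

top  = "  EBISPXWL"   # single-digit letters under columns 3 through 0; 1 and 2 are blank
row1 = "VNAOYZQMDC"   # letters encoded as 1 followed by a column digit
row2 = "KHRTUFGJ./"   # letters encoded as 2 followed by a column digit
cols = "1234567890"

table = {}
for c, ch in zip(cols, top):
    if ch != " ":
        table[ch] = c
for c, ch in zip(cols, row1):
    table[ch] = "1" + c
for c, ch in zip(cols, row2):
    table[ch] = "2" + c

def encode(text):
    return "".join(table[ch] for ch in text.upper() if ch in table)

def decode(digits):
    out, i = [], 0
    while i < len(digits):
        if digits[i] in "12":                      # 1 or 2 signals the start of a pair
            row = row1 if digits[i] == "1" else row2
            out.append(row[cols.index(digits[i + 1])])
            i += 2
        else:
            out.append(top[cols.index(digits[i])])
            i += 1
    return "".join(out)

print(encode("ENIGMA"))           # 3125271813
print(decode(encode("ENIGMA")))   # ENIGMA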

Efficiency

The straddling checkerboard gives a more efficient encoding than does the checkerboard since typically fewer digits will be required. If efficiency were the only concern, we’d put the eight most frequent letters on the top row, something like the following [2].

       1234567890
      +----------
      |  ETAOINSR
     1|BCDFGHJKLM
     2|PQUVWXYZ./

This would be more efficient but less secure since the arrangement of the letters would be more predictable.

Security

The straddling checkerboard presents a bit of a challenge to the cryptanalyst since it’s not known a priori whether a digit is part of a pair (the digits that begin a pair need not be 1 and 2).

The straddling checkerboard didn’t offer much security even in its day. It would have been better if there had been some further processing done on the digits, such as the way the ADFGVX cipher permuted its coordinates.

The message, interpreted as a number N, could have been further encrypted as aN + b, where a and b were randomly chosen numbers that were part of the key. As far as I know, nothing like this was ever done. It would have provided more security, but it would also have required more effort and increased the chance of introducing errors.


[1] David Kahn. The Codebreakers. Chapter 18.

[2] You may have expected the last letter on the first row to be H, going by the printer’s order ETAOIN SHRDLU. Peter Norvig discovered a slightly different order of letter frequencies based on the Google corpus.

Binary to text to binary

GNU Privacy Guard includes a way to encode binary files as plain ASCII text files and turn those text files back into binary. This is intended as a way to transmit encrypted data, but it can be used to convert any kind of file from binary to text and back to binary.

To illustrate this, I’ll use Albrecht Dürer’s Melencolia I as an example. (More on this image and its mathematical significance here.)

Albrecht Dürer’s engraving Melencolia I

This image is saved in the file Melencolia.jpg.

Binary to text

If we run

   gpg --enarmor Melencolia.jpg

at a command line, it produces a file Melencolia.jpg.asc, the asc suffix indicating an ASCII file.

We can look inside this file if we’d like.

-----BEGIN PGP ARMORED FILE-----
Comment: Use "gpg --dearmor" for unpacking

/9j/4AAQSkZJRgABAQEBLAEsAAD/4QBoRXhpZgAATU0AKgAAAAgABAEaAAUAAAAB
AAAAPgEbAAUAAAABAAAARgEoAAMAAAABAAIAAAExAAIAAAARAAAATgAAAAAAAAEs
AAAAAQAAASwAAAABUGFpbnQuTkVUIHYzLjUuOAAA/9sAQwACAQECAQECAgICAgIC
...
wJ/wkzRf8IN4LCCVVwNCtM8sDnPl5zkk565NbcH7Lnw01K0gtrn4feCriB4QfLl0
S2dV+bdgApwMknAwMk+tFFcktjqif//Z
=rpd1
-----END PGP ARMORED FILE-----

The cryptic text /9j/4A…//Z is a base 64 encoded representation of the binary file. Think of the file as one big binary number. Write that number in base 64, i.e. partition the bits into groups of six. Then represent the base 64 “digits” as alphanumeric characters, plus the symbols + and /. More on the details here.

The line =rpd1 is a 24-bit CRC checksum. The equal sign is a separator, and rpd1 is a base 64 encoding of the checksum.

The JPG file is 91,272 bytes and the ASCII file is 123,712 bytes. The ASCII file is about 1/3 larger because every six bits in the binary file corresponds to an eight-bit ASCII character. The ASCII file is a little bit more than 1/3 larger because of the human-friendly text above and below the base 64 encoding, the newline characters, and the checksum.
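You can check this arithmetic without gpg by base 64 encoding the same file with Python. (This assumes Melencolia.jpg is in the working directory; the exact numbers depend on the file.)

import base64

with open('Melencolia.jpg', 'rb') as f:
    data = f.read()

text = base64.b64encode(data)
print(len(data), len(text), len(text) / len(data))   # the ratio is roughly 4/3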

Text to binary

If we run

    gpg --dearmor Melencolia.jpg.asc

at a command line, it produces a file Melencolia.jpg.asc.gpg. This file is bit-for-bit identical to the original file, which we could confirm by running

    diff Melencolia.jpg Melencolia.jpg.asc.gpg


Previous digital signature standard expires next month

The Digital Signature Standard (DSS) FIPS 186-4, first published in 2013, expires a few days from now, on February 3, 2024. It is superseded by NIST FIPS 186-5. The new version was published on February 3, 2023, giving everyone a year to adopt the new standard before it became required.

The differences between the two standards are summarized in Appendix E of the new standard. Here I’ll point out three differences, then expand a little on the third difference.

First of all, the Digital Signature Algorithm (DSA) from earlier versions of FIPS 186 is withdrawn.

Second, RSA keys have gotten longer. The previous minimum key length was 1024 bits. Now it is 2048.

Third, elliptic curve cryptography has matured quite a bit since 2013.

The 2013 version of the standard gave users a lot more freedom in choosing elliptic curves and base points on those curves. Now it appears that much of this freedom hasn’t turned out so well.

The binary curves and Koblitz curves that were approved in 2013 are now deprecated. These are the curves whose names begin with B- or K-, as described in this post on elliptic curve naming conventions. But the P- curves P-224, P-256, P-384, and P-521 are still recommended.

While the binary and Koblitz curves have been removed, Edwards curves have been added. Specifically, Ed25519 and Ed448 have been added for the Edwards-curve Digital Signature Algorithm (EdDSA).


Weak encryption and surveillance

Two of the first things you learn in cryptography are that simple substitution ciphers are very easy to break, and that security by obscurity is a bad idea. This post will revisit both of these ideas.

Security depends on your threat model. If the threat you want to protect against is a human reading your encrypted messages, then simple substitution ciphers are indeed weak. Anyone capable of working a cryptogram in a puzzle book is capable of breaking your encryption.

But if your threat model is impersonal surveillance, things are different. It might not take much to thwart a corporation scanning your email for ideas of what to advertise to you. Even something as simple as rot13 might be enough.

The point of this article is not to recommend rot13, or any other simple substitution cipher, but to elaborate on the idea that evading commercial surveillance is different from evading intelligence agencies. It’s easier to evade bots than spooks.

If you’re the target of a federal investigation, the most sophisticated encryption at your disposal may be inadequate. But if you’re the target of an advertising company, things are much easier.

Rot13

Rot13 works by moving letters ahead 13 positions in the alphabet. The nth letter goes to the (n + 13)th letter mod 26. So A’s become N’s, and N’s become A’s, etc. Rot13 offers no security at all, but it is used to prevent text from being immediately recognizable. A common application is to publish the punchline of jokes.

Rot13 converts the word “pyrex” to “clerk.” If you use the word “pyrex” in an email, but rot13 encrypt your message, you’re unlikely to see advertisements for glassware unless you also talk about a clerk.
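You can check this with Python’s built-in rot13 codec:

import codecs

print(codecs.encode("pyrex", "rot13"))   # clerk
print(codecs.encode("clerk", "rot13"))   # pyrex, since rot13 is its own inverse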

It is conceivable, but doubtful, that impersonal surveillance software would try to detect rot13 encoding even though it would be easy to do. But detecting and decrypting simple substitution ciphers in general, while certainly possible, would not be worth the effort. Security-by-obscurity could protect you from surveillance because it’s not profitable for mass surveillance to pursue anything obscure. For example, the Playfair cipher, broken over a century ago, would presumably throw off bots.

Modern but deprecated encryption

Simple substitution ciphers are ancient. Modern encryption methods, even deprecated methods like DES, are far more secure. For example, DES (Data Encryption Standard) is considered obsolete because it can be broken by a multiprocessor machine in under 24 hours. However, commercial surveillance is unwilling to spend processor-days to decrypt one person’s communication.

This is not to recommend DES. If it’s just as easy to use much better algorithms like AES (Advanced Encryption Standard) then why not do that? My purpose in bringing up DES is to say that squabbles over the merits of various modern encryption methods are beside the point if your goal is to evade impersonal surveillance.

Everyone agrees DES should be put out to pasture, but it doesn’t matter if you’re just trying to avoid surveillance. Things that not everyone agrees on matter even less.

What matters more than the algorithms is who holds the keys. A system in which you alone hold a DES key may give you more privacy than a system that holds an AES key for you. Companies may want to protect your private data from competitors but have no qualms about using it themselves.

Why surveillance matters

Why go to any effort to evade corporate surveillance?

Companies share data, and companies get hacked. One innocuous piece of data may be the link that connects two databases together. Things that you don’t mind revealing may unlock things you don’t want to reveal.

Related: A statistical problem with nothing to hide

Example of memorizing a 256-bit private key

There are techniques that can enable anyone to memorize much more than may seem possible. This post will show how I generated and memorized a 256-bit encryption key this morning using the approach explained here.

TANSTAAFL

There ain’t no such thing as a free lunch. This saying is abbreviated TANSTAAFL in Heinlein’s novel The Moon Is a Harsh Mistress. It takes effort to save effort.

Memorization techniques make it easier to remember new things if you invest effort into the techniques. The more you invest into the techniques, the easier memorizing new things is.

Whether this investment is practical depends on the person. I find it more interesting than practical; I rarely need to memorize anything. But I know of people who have used these techniques to great advantage.

Generating a key

First, I’ll generate a 256-bit number using Python.

>>> import secrets
>>> s = secrets.randbits(256)

This produced the following:

14769232028620221959695310712392700269168526908419649910136349315042507303581

If you were to run the same code you would not get the same number, which is the point of the secrets module. It seeds a random number generator with entropy extracted from the state of the computer it is running on.

Parsing digits

The number above has 77 digits, so I split it into a two-digit number, 14, and 25 three-digit numbers: 769, 232, …, 581. Then, using the Major mnemonic system, I encode each number as a word and associate it with a letter of the NATO phonetic alphabet, which keeps the 26 words in order.
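Here is the splitting step in Python, using the number generated above:

s = "14769232028620221959695310712392700269168526908419649910136349315042507303581"
groups = [s[:2]] + [s[i:i+3] for i in range(2, len(s), 3)]
print(len(groups))                        # 26 groups, one per NATO letter
print(groups[0], groups[1], groups[-1])   # 14 769 581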

I encode 14 as “tar”, and the NATO word for the letter A is “alpha,” so I imagine an alpha male covered in tar.

I encode 769 as “ketchup,” and the NATO word for B is “bravo,” so I imagine a brave bottle of ketchup, arms akimbo, and wearing a superhero cape.

I encode 581 as “elevator” [1], and the phonetic word for Z is Zulu, and so I imagine Shaka Zulu riding in an elevator.

Using this process I memorized the random number above in a few minutes.

Just for fun I asked DALL-E 2 to produce an image of “Shaka Zulu, spear in hand, riding in an elevator.” The image below is one of the ones it created.

Shaka Zulu, spear in hand, riding in an elevator


[1] Strictly speaking, “elevator” decodes as 5814. But I use a common convention of only considering the first three consonants in a word. This is a good trade-off because it’s not likely you could encode a 4-digit number in a single word.

What is the point of a public key fingerprint?

Public key cryptography uses two keys: a private key and a public key. The nature of these keys depends on the encryption scheme, such as whether one is using RSA, ECC, or some other method, but you can think of a key as a long number. A key may be a short list of numbers, but let’s just say a key is a number.

When you generate a key pair, you get two numbers: one that you keep secret and one that you publicize. Anyone can use your public key to encrypt a message that only you can decrypt, assuming only you have your private key.

If I give you my public key, say I post it on my website, how can you be sure that it’s really my key? Maybe my site has been hacked and I don’t even know it. Or maybe someone tampers with the site content between the time it leaves my server and arrives at your browser (though TLS is supposed to address that).

We could get on a phone call and you could read the key back to me. The problem with this approach is that keys are long. A 4096-bit RSA key, for example, encoded in hexadecimal, is 1024 characters long.

You could read off the last so many characters of the key, maybe telling me that what you received ends in 9F0D 38FA. That would probably be sufficient to tell one key from another, since it’s very unlikely you’d have two keys in a keychain that share the same last 32 bits, but it doesn’t mean the rest of the key hasn’t been tampered with.

What you’d really like is a cryptographic hash of the public key, short enough to conveniently compare, but long enough that it would be infeasible for someone to alter the public key in such a way as to produce the same hash. And that’s exactly what a fingerprint is.

In GnuPG, the GNU implementation of PGP, a fingerprint is a 160-bit hash, which means 40 hexadecimal characters. Manually verifying 40 characters is tedious but feasible; manually verifying 1024 characters is not.
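For example, gpg will display the fingerprint of a key in your keyring; the email address here is just a placeholder.

    gpg --fingerprint alice@example.com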

ASCII armor: send encrypted data in plain text

GnuPG ASCII armor is a way to encode binary data as text and decode the text back into binary. It could be used with any binary data, but most often it is used with cryptographic data: public keys, encrypted messages, etc.

Secure communication over an insecure channel

When people ask whether some medium “supports encryption,” or more specifically whether the medium “supports end-to-end encryption,” they’re asking how convenient it is to use the medium with encrypted data. Any medium can convey encrypted text, but the process may be more or less manual.

You can use Gmail for end-to-end encryption, for example, though I’m sure Google would rather you not. Just encrypt the data on your computer, convert the output to ASCII, paste it into an email, and send. The receiver then copies the ASCII text, converts it back to binary, and decrypts it.

ASCII armor is a way of handling the conversion to and from ASCII, with some niceties such as a checksum and human-friendly formatting, as friendly as formatting can be under the circumstances.

Aside: Coding versus Encryption

There are two kinds of encoding in this context: secret and non-secret. Coding theory is often misunderstood: despite the name, it is primarily concerned with non-secret encoding, such as error-correcting codes and transmission protocols. ASCII armor belongs to coding theory, not encryption, though it is usually deployed in service of encryption.

ASCII armor is not providing protection from prying eyes. It is providing a relatively convenient way to convey naturally binary data, including a checksum to detect errors in transmission.

How ASCII armor works

At its heart, ASCII armor does base-64 encoding. But there’s more to it than that. There are optional fields, formatting rules, and a checksum.

For an example, we’ll take the first 240 binary digits of π after the “decimal” point.

Here’s some Python code to write the bits to a file. We start with the bits in hexadecimal since that’s more compact. In hex notation, π = 3.243f6a8885a3… For more details, see this post on hexadecimal floating point.

import binascii

data = '243F6A8885A308D313198A2E03707344A4093822299F31D0082EFA98EC4E'
with open('pibits', 'wb') as f:
    f.write(binascii.unhexlify(data))

We can inspect pibits in a hex editor to verify that the bits are correct, say by running xxd.

Enarmor

We can convert our binary file to ASCII using gpg, which stands for GNU Privacy Guard (GnuPG), the GNU implementation of Pretty Good Privacy. Running

gpg --enarmor pibits

produces a file pibits.asc with the following contents.

-----BEGIN PGP ARMORED FILE-----
Comment: Use "gpg --dearmor" for unpacking

JD9qiIWjCNMTGYouA3BzRKQJOCIpnzHQCC76mOxO
=0G7A
-----END PGP ARMORED FILE-----

Dearmor

If we run

gpg --dearmor pibits.asc

we get a file pibits.asc.gpg identical to the original file pibits, which we could verify using diff.

Base 64 encoding

How does the string JD9… above correspond to bits?

A hexadecimal character represents 4 bits, and a base 64 character represents 6 bits. So every three hexadecimal characters correspond to 12 bits, or two base 64 characters.

The first three hexadecimal characters in our data, 243, correspond to the first two characters in our ASCII armor data, JD, as follows. First of all,

243₁₆ = 001001000011₂

and if we split the bits into groups of six we have 001001₂ = 9₁₀ and 000011₂ = 3₁₀. ASCII armor represents base 64 numbers the same way MIME encoding does: the numbers 0 through 63 are represented by

A, B, C, … Z, a, b, c, …, z, 0, 1, 2, …, 9, +, /

So 9 is encoded as J, and 3 is encoded as D.

The 240 bits of π are represented by the 60 hexadecimal characters 243…C4E, or by the 40 base 64 characters JD…xO. Note that the last character is the letter O, not the digit 0. (Base 58 encoding avoids this problem by not using typographically similar symbols.)
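As a sanity check, Python’s base64 module reproduces the body of the armored file from the hex string above. (This is just a verification of the encoding; it has nothing to do with gpg per se.)

import base64, binascii

data = '243F6A8885A308D313198A2E03707344A4093822299F31D0082EFA98EC4E'
print(base64.b64encode(binascii.unhexlify(data)).decode())
# JD9qiIWjCNMTGYouA3BzRKQJOCIpnzHQCC76mOxO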

In this example we had 240 bits, which is a multiple of 6. What if the number of bits is not a multiple of 6? See this post for an answer.

Checksum

After the 40 characters encoding our data follow five more characters, =0G7A. The equal sign is a separator, and 0G7A is a base 64 encoding of a 24-bit CRC checksum. The details of the checksum, and C code for implementing it, are given in RFC 4880 Section 6.1.
