ASCII armor: send encrypted data in plain text

GnuPG ASCII armor is a way to encode binary data as text and decode the text back into binary. It could be used with any binary data, but most often it is used with cryptographic data: public keys, encrypted messages, etc.

Secure communication over an insecure channel

When people ask whether some medium “supports encryption,” or more specifically whether the medium “supports end-to-end encryption,” they’re asking how convenient it is to use the medium with encrypted data. Any medium can convey encrypted text, but the process may be more or less manual.

You can use Gmail for end-to-end encryption, for example, though I’m sure Google would rather you not. Just encrypt the data on your computer, convert the output to ASCII, paste it into an email, and send. The receiver then copies the ASCII text, converts it to binary, and then decodes it.

ASCII armor is a way of handling the conversion to and from ASCII, with some niceties such as a checksum and human-friendly formatting, as friendly as formatting can be under the circumstances.

Aside: Coding versus Encryption

There are two kinds of encoding in this context: secret and non-secret. Coding theory is often misunderstood because it is primarily concerned with non-secret encoding, such as error-correcting codes and transmission protocols. ASCII armor belongs to coding theory, not encryption, though it is usually deployed in service of encryption.

ASCII armor is not providing protection from prying eyes. It is providing a relatively convenient way to convey naturally binary data, including a checksum to detect errors in transmission.

How ASCII armor works

At its heart, ASCII armor does base-64 encoding. But there’s more to it than that. There are optional fields, formatting rules, and a checksum.

For an example, we’ll take the first 120 binary digits of π after the “decimal” point.

Here’s some Python code to write the bits to a file. We start with the bits in hexadecimal since that’s more compact. In hex notation, π = 3.243f6a8885a3… For more details, see this post on hexadecimal floating point.

import binascii

data = '243F6A8885A308D313198A2E03707344A4093822299F31D0082EFA98EC4E6C89'
with open('pibits', 'wb') as f:
    f.write(binascii.unhexlify(data))

We can inspect pibits in a hex editor to verify that the bits are correct, say by running xxd.

Enarmor

We can convert our binary file to ASCII using gpg, which stands for GNU Privacy Guard (GnuPG), the GNU implementation of Pretty Good Privacy. Running

gpg --enarmor pibits

produces a file pibits.ascii with the following contents.

-----BEGIN PGP ARMORED FILE-----
Comment: Use "gpg --dearmor" for unpacking

JD9qiIWjCNMTGYouA3BzRKQJOCIpnzHQCC76mOxO
=0G7A
-----END PGP ARMORED FILE-----

Dearmor

If we run

gpg --dearmor pibits.asc

we get a file pibits.asc.gpg identical to the original file pibits, which we could verify using diff.

Base 64 encoding

How does the string JD9… above correspond to bits?

A hexadecimal character represents 4 bits, and a base 64 character represents 6 bits. So every 3 hexadecimal character corresponds to 12 bits or 2 base 64 characters.

The first three hexadecimal characters in our data, 243, corresponds to the first two characters in our ASCII armor data, JD, as follows. First of all,

243hex = 001001000011two

and if we split the bits into groups of six we have 001001two = 9ten and 000011two = 3ten. ASCII armor represents base-64 numbers the same way MIME encoding does: the numbers 0 through 63 are represented by

A, B, C, … Z, a, b, c, …, z, 0, 1, 2, …, 9, +, /

So 9 is encoded as J, and 3 is encoded as D.

The 120 bits of π represented by the 30 hexadecimal characters 243…C89 or by the 20 base 64 characters JD…xO in base 64. Note that the last character is the letter O, not the decimal 0. (Base 58 avoids this problem by not using typographically similar symbols.)

In this example we had 120 bits, which is a multiple of 6. What if the number of bits is not a multiple of 6? See this post for an answer.

Checksum

After the 20 characters encoding our data follows five more characters, =0G7A. The equal sign is a separator, and 0G7A is a base 64 encoding of a 24-bit CRC checksum. The details of the checksum, and C code for implementing it, are given in RFC 4880 Section 6.1.

Related posts