Binary to text to binary

Gnu Privacy Guard includes a way to encode binary files as plain ASCII text files, and turn these text files back into binary. This is intended as a way to transmit encrypted data, but it can be used to convert any kind of file from binary to text and back to binary.

To illustrate this, I’ll use Albrecht Dürer’s Melencolia I as an example. (More on this image and its mathematical significance here.)

Albrecht Dürer’s engraving Melencolia I

This image is saved in the file Melancholia.jpg.

Binary to text

If we run

   gpg --enarmor Melencolia.jpg

at a command line, it produces a file Melancholia.jpg.asc, the asc suffix indicating an ASCII file.

We can look inside this file if we’d like.

-----BEGIN PGP ARMORED FILE-----
Comment: Use "gpg --dearmor" for unpacking

/9j/4AAQSkZJRgABAQEBLAEsAAD/4QBoRXhpZgAATU0AKgAAAAgABAEaAAUAAAAB
AAAAPgEbAAUAAAABAAAARgEoAAMAAAABAAIAAAExAAIAAAARAAAATgAAAAAAAAEs
AAAAAQAAASwAAAABUGFpbnQuTkVUIHYzLjUuOAAA/9sAQwACAQECAQECAgICAgIC
...
wJ/wkzRf8IN4LCCVVwNCtM8sDnPl5zkk565NbcH7Lnw01K0gtrn4feCriB4QfLl0
S2dV+bdgApwMknAwMk+tFFcktjqif//Z
=rpd1
-----END PGP ARMORED FILE-----

The cryptic text /9j/4A…//Z is a base 64 encoded representation of the binary file. Think of the file as one big binary number. Write that number in base 64, i.e. partition the bits into groups of six. Then represent the base 64 “digits” as alphanumeric characters, plus the symbols + and /. More on the details here.

The line =rpd1 is a 24-bit CRC checksum. The equal sign is a separator, and rpd1 is a base 64 encoding the checksum.

The JPG file is 91,272 bytes and the ASCII file is 123,712 bytes. The ASCII file is about 1/3 larger because every six bits in the binary file corresponds to an eight-bit ASCII character. The ASCII file is a little bit more than 1/3 larger because of the human-friendly text above and below the base 64 encoding, the newline characters, and the checksum.

Text to binary

If we run

    gpg --dearmor Melencolia.jpg.asc

at a command line, it produces a file Melancholia.jpg.asc.gpg. This file is bit-for-bit exactly the same as the original file, which we could confirm by running

    diff Melencolia.jpg Melencolia.jpg.asc.gpg

Related posts