Base85, Ascii85, and Z85

I wrote a while back about Base32 and Base64 encoding, and yesterday I wrote about Bitcoin’s Base58 encoding. For completeness I wanted to mention Base85 encoding, also known as Ascii85. Adobe uses it in PostScript and PDF files, and git uses it for encoding binary patches.

Like Base64, the goal of Base85 encoding is to encode binary data as printable ASCII characters. But it uses a larger set of characters, and so it can be a little more efficient. Specifically, it can encode 4 bytes (32 bits) in 5 characters.

Why 85?

There are 95 printable ASCII characters, and

log₉₅(2³²) = 4.87

and so it would take 5 characters to encode 4 bytes if you use all possible printable ASCII characters. Given that you have to use 5 characters, what’s the smallest base that will still work? It’s 85 because

log₈₅(2³²) = 4.993

and

log₈₄(2³²) = 5.006.

(If you’re not comfortable with logarithms, see an alternate explanation in the footnote [1].)
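The same conclusion can be checked with exact integer arithmetic rather than logarithms. Here is a minimal sketch in Python; the helper name chars_needed is my own and does not come from any library.

```python
import math

TARGET = 2 ** 32  # number of distinct 4-byte words


def chars_needed(base: int) -> int:
    """Return the smallest n such that base**n >= TARGET (exact integers)."""
    n, power = 0, 1
    while power < TARGET:
        power *= base
        n += 1
    return n


# Smallest alphabet size for which five characters cover every 32-bit word.
smallest_base = next(b for b in range(2, 96) if chars_needed(b) <= 5)

for base in (95, 85, 84):
    # The logarithm is the fractional "number of characters" quoted above.
    print(base, chars_needed(base), math.log(TARGET, base))
print("smallest base:", smallest_base)
```

The integer loop avoids any floating point doubt near the boundary, which matters here because the logarithms for 84 and 85 differ from 5 only in the third decimal place.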

Now Base85 is different from the other bases I’ve written about because it only works on 4 bytes at a time. That is, if your data is longer than 4 bytes, you break it into words of 4 bytes and convert each word to Base85 separately.
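As a rough illustration of the word-at-a-time idea, the sketch below splits a byte string into 32-bit words. The function name split_into_words is hypothetical, the big-endian byte order is an assumption on my part, and input whose length is not a multiple of 4 is simply rejected rather than padded.

```python
from typing import List


def split_into_words(data: bytes) -> List[int]:
    """Split data into 32-bit words, most significant byte first (assumed)."""
    if len(data) % 4 != 0:
        raise ValueError("this sketch only handles whole 4-byte words")
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


words = split_into_words(bytes.fromhex("0123456789abcdef"))
print([hex(w) for w in words])
```

Each word in the resulting list would then be converted to five characters independently of its neighbors.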

Character set

The 95 printable ASCII characters are 32 through 126. Base85 uses characters 33 (‘!’) through 117 (‘u’). ASCII character 32 is a space, so it makes sense you’d want to avoid that one. Since Base85 uses a consecutive range of characters, you can first convert a number to a pure mathematical radix 85 form, then add 33 to each digit to find its Base85 character.
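Because the alphabet is a consecutive range, the mapping between digits and characters is just an offset. A small sketch, with names (ALPHABET, OFFSET, digit_to_char, char_to_digit) chosen here for illustration:

```python
OFFSET = 33  # ASCII code of '!'
ALPHABET = "".join(chr(code) for code in range(OFFSET, OFFSET + 85))  # '!' .. 'u'


def digit_to_char(digit: int) -> str:
    """Map a radix 85 digit (0..84) to its character by adding 33."""
    if not 0 <= digit < 85:
        raise ValueError("digit out of range")
    return chr(digit + OFFSET)


def char_to_digit(ch: str) -> int:
    """Inverse mapping: subtract 33 from the character code."""
    digit = ord(ch) - OFFSET
    if not 0 <= digit < 85:
        raise ValueError("character is not in the alphabet")
    return digit


print(ALPHABET)
```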

Example

Suppose we start with the word 0x89255d9, equal to 143807961 in decimal.

143807961 = 2×85⁴ + 64×85³ + 14×85² + 18×85 + 31

and so the radix 85 representation is (2, 64, 14, 18, 31). Adding 33 to each we find that the ASCII values of the characters in the Base85 representation are (35, 97, 47, 51, 64), or (‘#’, ‘a’, ‘/’, ‘3’, ‘@’) and so #a/3@ is the Base85 encoding of 0x89255d9.
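The arithmetic in this example can be reproduced in a few lines of Python. The function encode_word is a sketch written for this post, not a library routine. For comparison it also calls base64.a85encode from the Python standard library, assuming the word is laid out most significant byte first.

```python
import base64


def encode_word(word: int) -> str:
    """Encode one 32-bit word as five characters: radix 85 digits plus 33."""
    if not 0 <= word < 2 ** 32:
        raise ValueError("word must fit in 32 bits")
    digits = []
    for _ in range(5):
        word, digit = divmod(word, 85)  # peel off the least significant digit
        digits.append(digit)
    digits.reverse()  # most significant digit first
    return "".join(chr(d + 33) for d in digits)


word = 0x89255d9
encoded = encode_word(word)
reference = base64.a85encode(word.to_bytes(4, "big")).decode("ascii")
print(encoded, reference)
```

Both values should come out as #a/3@, matching the hand calculation.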

Z85

The Z85 encoding method is also based on a radix 85 representation, but it chose to use a different subset of the 95 printable characters. Compared to Base85, Z85 adds seven characters

v w x y z { }

and removes seven characters

` \ " ' _ , ;

to make the encoding work more easily with programming languages. For example, you can quote Z85 strings with single or double quotes because neither kind of quote is a valid Z85 character. And you don’t have to worry about escape sequences since the backslash character is not part of a Z85 representation.
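The character swap can be checked by comparing the two alphabets as sets. In the sketch below the Z85 string is my transcription of the alphabet from the Z85 specification, so treat it as an assumption to verify against the spec. The helper z85_encode_word is a hypothetical name. Since the Z85 alphabet is not a consecutive ASCII range, the digit-to-character step is a table lookup rather than an offset.

```python
ASCII85 = "".join(chr(code) for code in range(33, 118))  # '!' through 'u'

# Z85 alphabet as transcribed from the specification (assumption: check the spec).
Z85 = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#"
)

added = sorted(set(Z85) - set(ASCII85))    # in Z85 but not in Base85
removed = sorted(set(ASCII85) - set(Z85))  # in Base85 but not in Z85


def z85_encode_word(word: int) -> str:
    """Encode one 32-bit word as five Z85 characters via table lookup."""
    if not 0 <= word < 2 ** 32:
        raise ValueError("word must fit in 32 bits")
    chars = []
    for _ in range(5):
        word, digit = divmod(word, 85)
        chars.append(Z85[digit])
    return "".join(reversed(chars))


print("added:  ", " ".join(added))
print("removed:", " ".join(removed))
print(z85_encode_word(0x864FD26F) + z85_encode_word(0xB559F75B))
```

The last line encodes the two words 0x864FD26F and 0xB559F75B, which work out to the string HelloWorld.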

Gotchas

There are a couple of things that could trip someone up with Base85. First of all, Base85 only works on 32-bit words, as noted above. For larger numbers it’s not a base conversion in the usual mathematical sense.
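To make the first point concrete, the sketch below encodes 8 bytes in two ways: word by word, using base64.a85encode from the Python standard library, and as a single 64-bit integer written in radix 85 with the same character offset. The helper radix85_digits is a name invented for this illustration.

```python
import base64
from typing import List


def radix85_digits(n: int) -> List[int]:
    """Digits of a non-negative integer in radix 85, most significant first."""
    digits = []
    while n:
        n, d = divmod(n, 85)
        digits.append(d)
    return digits[::-1] or [0]


data = bytes.fromhex("089255d9089255d9")  # the example word, twice

# Base85 as actually used: each 4-byte word encoded on its own.
per_word = base64.a85encode(data).decode("ascii")

# A true base conversion of the whole 64-bit number.
whole = "".join(chr(d + 33) for d in radix85_digits(int.from_bytes(data, "big")))

print(per_word)
print(whole)
```

The two strings have the same length here but different contents, because a true base conversion would weight the first word by 2³² while the word-by-word encoding effectively weights it by 85⁵.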

Second, the letter z can be used to denote a word consisting of all zeros. Since such words come up disproportionately often, this is a handy shortcut, though it means you can’t just divide characters into groups of 5 when converting back to binary.
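Here is a sketch of a decoder that respects the z shortcut. The function decode_base85 is hypothetical and deliberately minimal: it assumes the input consists only of complete five-character groups and z characters, and it leaves out partial final groups and whitespace. The test input is produced with base64.a85encode from the Python standard library, which emits z for all-zero words.

```python
import base64


def decode_base85(text: str) -> bytes:
    """Decode full 5-character groups, treating 'z' as four zero bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "z":
            # Shortcut: a single character stands for a whole zero word.
            out += bytes(4)
            i += 1
            continue
        group = text[i:i + 5]
        if len(group) != 5:
            raise ValueError("partial groups are not handled in this sketch")
        value = 0
        for ch in group:
            digit = ord(ch) - 33
            if not 0 <= digit < 85:
                raise ValueError("invalid character: %r" % ch)
            value = value * 85 + digit
        if value >= 2 ** 32:
            raise ValueError("group does not fit in 32 bits")
        out += value.to_bytes(4, "big")
        i += 5
    return bytes(out)


data = bytes.fromhex("089255d9") + bytes(4) + bytes.fromhex("089255d9")
encoded = base64.a85encode(data).decode("ascii")
print(encoded, decode_base85(encoded) == data)
```

The encoded string is 11 characters long rather than 15, so the decoder has to walk the input one token at a time instead of slicing it into fixed groups of 5.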

[1] 95⁴ = 81450625 < 2³² = 4294967296, so four characters from an alphabet of 95 elements is not enough to represent 2³² possibilities. So we need at least five characters.

85⁵ = 4437053125 > 2³², so five characters is enough, and in fact it’s enough for them to come from an alphabet of size 85. But 84⁵ = 4182119424 < 2³², so an alphabet of 84 characters isn’t enough to represent 32 bits with five characters.

6 thoughts on “Base85 encoding”

Dithermaster

5 March 2019 at 08:27

It reminds me of a project I did where space was important. I encoded fixed-length part number strings in RAD50 (which ironically cannot encode 50 characters, just 40). It got us uppercase alphanumerics, space, and three additional characters (I made one of them “-” since that was used in these part numbers). https://en.wikipedia.org/wiki/DEC_Radix-50

Andrew

5 March 2019 at 11:17

“And you don’t have to worry about escape sequences since the backspace character is not part of a Z85 representation.”

Should be “backslash”, not “backspace”.

Marco

6 March 2019 at 04:27

Nice piece!
Please, note that the link to Base58 encoding post is wrong.

John

6 March 2019 at 09:25

Thanks. Fixed.

roblogic

9 September 2019 at 11:05

Thanks for this, i am gonna switch to Z85. See if you can decode this:
iQ)q:_FruPuU^I”J,g#Y#bh;;ucpn’*+=X’1)_m%^AIALQ1″98E%+92qfUp’h(h(E,+:)XjReuQfa^^p
Ns.t89″p_pP”]D=&24Y4?pG,f/Ge0an);”9\u5&/YO80F31O+FnumJ-H”h0EcNK#RCDAHiP=3[KP[jiP
E#*K<bXcj9c;$2?3^W&u)$FVRm6$D7A5"XS[J%u/:i!]Tn1]_".#%82[66o*P71]HP+9;I-0dJ/o^Dh\
!

Martin Jambon

6 January 2022 at 16:49

A while ago, I proposed a base-120 encoding using two-letter digits formed from one consonant (b-z) and one vowel (a, e, i, o, u, y). This gives us numbers like cewo (945), cipote (123571) or ritufyna (139596420). There are more details and some code at https://github.com/mjambon/base120.
