Base58 versus Base85 encoding

Base58 encoding and Base85 encoding are both used to represent binary data as printable text. Base58 uses a smaller character set, chosen to avoid visually confusable characters, and so is more conservative. Base85 uses a larger character set and so is more efficient.

There is a gotcha in that “base” means something different in Base58 compared to Base85. More on that below.

Base58

Base58 encoding is primarily used as part of the Bitcoin system. It is part of the Base58Check protocol used for encoding addresses and keys.

Base58 encoding is essentially the same as mathematical base 58 encoding, with a specific character set. The symbols for the “digits” 0 through 57 are chosen to avoid typographically similar letters. We’ll give that character set in the examples below.

There is only one version of Base58 in common use as far as I know, unlike Base85.

Base85

Base85 is a more compact alternative to Base64 encoding: Base85 encodes 4 bytes in 5 characters, while Base64 needs 6 characters for the same 32 bits (ignoring padding). Base85 is used inside the PDF format. It is also used in Git’s binary patch encoding.

Base85 encoding is analogous to binary-coded decimal (BCD). In some early computer systems, integers would not be expressed in binary per se. Instead, each decimal digit would be represented by four bits. So to represent a number like 427, you’d express 4, 2, and 7 in binary: 0100 0010 0111. If you were to express 427 directly in binary you’d get 110101011.

Base85 breaks the data into 32-bit words, then expresses each word in base 85. So you might say it base 85-encodes 32-bit words, by analogy with binary-coded decimal.

There are variations on Base85 encoding that use different alphabets, and so two software packages that say they do Base85 encoding might produce different results.

Base85 is more efficient than Base58 in the sense that it represents data using fewer symbols. It is also more computationally efficient because each 32-bit word is encoded independently.

Examples

We give four examples below: Base58 and Base85 applied to four bytes of data and eight bytes of data. The data length matters for Base85.

Base58, four bytes

Let n = CAFEBABE in hexadecimal, which is 3405691582 in decimal. This is the “magic number” at the beginning of Java class files, a pun on “java” as slang for coffee.

In base 58 this number would be

5:10:55:3:26:22

We can verify this as follows:

    >>> 5*58**5 + 10*58**4 + 55*58**3 + 3*58**2 + 26*58 + 22
    3405691582
    >>> hex(_)
    '0xcafebabe'

The Base58 alphabet is

    123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

and so the Base58 encoding of 0xCAFEBABE would be the 5th, 10th, 55th, … elements of this alphabet (with zero-based indexing), which results in 6Bx4TP.
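
Here is a short Python sketch of the conversion, for illustration only; full Base58Check also adds a version byte and a checksum and encodes leading zero bytes specially, none of which is shown here.

    ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

    def base58_encode(n):
        "Encode a non-negative integer using the Base58 alphabet."
        if n == 0:
            return ALPHABET[0]
        digits = []
        while n > 0:
            n, r = divmod(n, 58)
            digits.append(ALPHABET[r])
        return "".join(reversed(digits))

Then base58_encode(0xCAFEBABE) returns 6Bx4TP, agreeing with the digits above.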

Note that the Base58 alphabet contains the digit 1 but not the lowercase letter l. It contains the lowercase letter o but not the capital letter O or the digit 0. Some of the remaining characters are visually similar, depending on your font. This post shows how one font makes the Base58 characters more distinct.

Base85, four bytes

Now suppose we want to encode n using Base85. This time we would get

65:20:50:84:67

If we use the alphabet

    !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu

then the “digits” above become b5Sud.

Note that the Base85 alphabet contains characters that could be confused, such as 0 (zero), O (capital letter), o (lower case letter). The characters were chosen to be printable ASCII characters, not necessarily visually distinct.
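
Here is an analogous sketch for a single 32-bit word, assuming the alphabet above, i.e. the 85 ASCII characters starting at ! (character code 33).

    def base85_word(w):
        "Encode one 32-bit word as five base 85 digits."
        digits = []
        for _ in range(5):
            w, r = divmod(w, 85)
            digits.append(chr(33 + r))  # offset from '!'
        return "".join(reversed(digits))

Then base85_word(0xCAFEBABE) returns b5Sud.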

Base58, eight bytes

Now suppose n = CAFEBABECAFEBABE in hexadecimal, which is 14627333968358193854 in decimal.

We convert n to base 58 to get

33:55:17:43:49:44:3:47:49:44:26

which becomes axJkrm4prmT in the Base58 alphabet.

Base85, eight bytes

To encode CAFEBABECAFEBABE in Base85 we do not convert the number to base 85. Instead, we convert each 4-byte word to base 85. Here we get two copies of CAFEBABE, and so the encoding is b5Sudb5Sud.

If we were to wrongly convert n to base 85, we’d get

63:13:1:27:77:35:57:62:38:49

which corresponds to the characters `."<nDZ_GR, which is not the correct encoding.
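
To encode longer data, apply base85_word to each 4-byte word. A sketch, assuming the length is a multiple of 4; Ascii85 proper also specifies how to pad a short final block and abbreviates an all-zero word as z, both of which we ignore here.

    from struct import unpack

    def base85_encode(data):
        "Encode bytes four at a time as big-endian 32-bit words."
        words = unpack(">" + "I" * (len(data) // 4), data)
        return "".join(base85_word(w) for w in words)

Then base85_encode(bytes.fromhex("CAFEBABECAFEBABE")) returns b5Sudb5Sud, as claimed.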


Golden ratio base numbers

It is possible to express every positive integer as a sum of powers of the golden ratio φ using each power at most once. This means it is possible to create a binary-like number system using φ as the base with coefficients of 0 and 1 in front of each power of φ.

This system is sometimes called phinary because of the analogy with binary. I’ll use that term here rather than more formal names such as base-φ or golden base number system.

An interesting feature of phinary is that in general you need to include negative powers of φ to represent positive integers. For example,

2 = \varphi + \varphi^{-2}

and so you could write 2 in this system as 10.01.

To state things more formally, every positive integer n satisfies the following equation, where a finite number of the coefficients a_k are equal to 1 and the rest are equal to 0.

n = \sum_{k=-\infty}^\infty a_k\varphi^k

The golden ratio satisfies φ² = φ + 1 and so phinary representations are not unique. But if you add the rule that number representations must not have consecutive 1s, then representations are unique, analogous to the Fibonacci number system.
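
A greedy algorithm produces this representation: repeatedly subtract the largest power of φ that fits. Here is a floating point sketch, adequate for small integers; a careful implementation would use exact arithmetic in ℚ(√5) rather than a tolerance.

    PHI = (1 + 5**0.5) / 2

    def phinary(n, kmin=-32, tol=1e-12):
        "Exponents k whose coefficient is 1 in the base-phi expansion of n."
        x = float(n)
        k = 0
        while PHI**(k + 1) <= x:  # find the largest k with PHI**k <= x
            k += 1
        powers = []
        while x > tol and k >= kmin:
            if PHI**k <= x + tol:
                powers.append(k)
                x -= PHI**k
            k -= 1
        return powers

For example, phinary(2) returns [1, -2], matching the representation 10.01 above. The greedy choice automatically avoids consecutive 1s.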

The original paper describing the phinary system [1] is awkwardly written. It has the flavor of “Here are some examples. You can see how this generalizes.” rather than a more typical mathematical style.

The end of the article says “Jr. High School 246 Brooklyn, N.Y.” and so when I got to that point I thought the style was due to the paper having been written by a public school teacher rather than a professional mathematician. I later learned from [2] that the author was not a math teacher but a student: George Bergman was 12 years old when he discovered and published his number system.

Phinary is not as simple to develop as you might expect. Bergman’s discovery was impressive, and not only because he was 12 years old at the time. You can find more sophisticated developments in [2] and in [3], but both require a few preliminaries and are not simple.

***

[1] George Bergman. A Number System with an Irrational Base. Mathematics Magazine 31 (2): 98–110. 1957.

[2] Cecil Rousseau. The Phi Number System Revisited. Mathematics Magazine 68 (4): 283–284. 1995.

[3] Donald Knuth. The Art of Computer Programming, volume 1.

Binomial number system

I just stumbled across the binomial number system in Exercise 5.38 of Concrete Mathematics. The exercise asks the reader to show that every non-negative integer n can be written as

n = \binom{a}{1} + \binom{b}{2} + \binom{c}{3}

and that the representation is unique if you require 0 ≤ a < b < c. The book calls this the binomial number system. I skimmed a paper that said this has some application in signal processing, but I haven’t looked at it closely [1].

You can find a, b, and c much as you would find the representation in many other number systems: first find the largest possible c, then the largest possible b for what’s left, and then the remainder is a.

In order to find c, we start with the observation that the binomial coefficient C(k, 3) is less than k³/6, and so the integer part of the cube root of 6n is a lower bound on c. We can start the search there and increment. If we wanted to be more efficient, we could do some sort of binary search.

Here’s Python code to find ab, and c.

    from math import comb, factorial

    def lower(n, r):
        "Find the largest k such that comb(k, r) <= n."
        k = int((factorial(r)*n)**(1/r))  # initial guess, a lower bound on k
        while comb(k, r) <= n:
            k += 1
        return k - 1

    def binomial_rep(n):
        "Write n as comb(a, 1) + comb(b, 2) + comb(c, 3) with 0 <= a < b < c."
        c = lower(n, 3)
        cc = comb(c, 3)
        b = lower(n - cc, 2)
        bb = comb(b, 2)
        a = n - cc - bb
        assert c > b > a >= 0
        return (a, b, c)

For example, here’s the binomial number system representation of today’s date.

    >>> binomial_rep(20250605)
    (79, 269, 496)
    >>> comb(496, 3) + comb(269, 2) + comb(79, 1)
    20250605

You could use any number of binomial terms, not just three.
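
Here is a sketch of a hypothetical general version, built on the lower function above, assuming the analogous uniqueness condition that the arguments are strictly increasing.

    def binomial_rep_r(n, r):
        "Greedy representation of n as a sum of r binomial coefficients."
        coeffs = []
        for k in range(r, 0, -1):
            c = lower(n, k)
            coeffs.append(c)
            n -= comb(c, k)
        return tuple(reversed(coeffs))

Calling binomial_rep_r(20250605, 3) reproduces (79, 269, 496).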

[1] I looked back at the paper, and it is using a different kind of binomial number system, expressing numbers as sums of fixed binomial coefficients, not varying the binomial coefficient arguments. This representation has some advantages for error correction.

Representing octonions as matrices, sorta

It’s possible to represent complex numbers as pairs of real numbers or as 2 × 2 matrices with real entries.

z \leftrightarrow (a, b) \leftrightarrow \begin{bmatrix}a & -b \\ b & a \end{bmatrix}

And it’s possible to represent quaternions as pairs of complex numbers or 2 × 2 matrices with complex entries

q \leftrightarrow (z_0, z_1) \leftrightarrow \begin{bmatrix} z_0 & z_1 \\ -z_1^* & z_0^* \end{bmatrix}

where z* is the complex conjugate of z.

And it’s also possible to represent octonions as pairs of quaternions or 2 × 2 matrices with quaternion entries, with a twist.

o \leftrightarrow (q_0, q_1) \leftrightarrow \begin{bmatrix} q_0 & q_1 \\ -q_1^* & q_0^* \end{bmatrix}

where q* is the quaternion conjugate of q.

Matrix multiplication is associative, but octonion multiplication is not, so something has to give. We have to change the definition of matrix multiplication slightly.

\begin{bmatrix} \alpha_0 & \alpha_1 \\ \alpha_2 & \alpha_3 \end{bmatrix}\circ\begin{bmatrix} \beta_0 & \beta_1 \\ \beta_2 & \beta_3 \end{bmatrix}=\begin{bmatrix} \alpha_0\beta_0+\beta_2\alpha_1 & \beta_1\alpha_0+\alpha_1\beta_3\\ \beta_0\alpha_2+\alpha_3\beta_2 & \alpha_2\beta_1+\alpha_3\beta_3 \end{bmatrix}

In half the products, the beta term comes before the alpha term. This wouldn’t matter if the alpha and beta terms commuted, e.g. if they were complex numbers this would be ordinary matrix multiplication. But the alphas and betas are quaternions, and so order matters, and the matrix product defined above is not the standard matrix product.
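
To make the modified product concrete, here is a sketch that multiplies octonions represented as pairs of quaternions. It reads off the top row of the matrix product above: (q0, q1)(r0, r1) = (q0 r0 − r1* q1, r1 q0 + q1 r0*). The function names are ours, not from any library.

    import numpy as np

    def qmult(p, q):
        "Hamilton product of quaternions stored as arrays [a, b, c, d]."
        a1, b1, c1, d1 = p
        a2, b2, c2, d2 = q
        return np.array([
            a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2])

    def qconj(q):
        "Quaternion conjugate."
        return np.array([q[0], -q[1], -q[2], -q[3]])

    def omult(x, y):
        "Multiply octonions represented as pairs of quaternions."
        q0, q1 = x
        r0, r1 = y
        return (qmult(q0, r0) - qmult(qconj(r1), q1),
                qmult(r1, q0) + qmult(q1, qconj(r0)))

Multiplying three random octonions both ways, omult(omult(x, y), z) and omult(x, omult(y, z)), gives different answers, which exhibits the non-associativity mentioned above.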

Going back to the idea of matrices of matrices that I wrote about a few days ago, we could represent the octonions as 2 × 2 matrices whose entries are 2 × 2 matrices of complex numbers, etc.

If you look closely at the matrix representations above, you’ll notice that the matrix representations of quaternions and octonions don’t quite match the pattern of the complex numbers. There should be a minus sign in the top right corner and not in the bottom left corner. You could do it that way, but there’s a sort of clash of conventions going on here.

Octonions sometimes associate

You can multiply pairs of real numbers using the rules of complex numbers. Complex numbers have all the algebraic structure of the real numbers, i.e. they form a field.

There is a general process, the Cayley-Dickson construction, that lets you bootstrap multiplication from 1 real number to 2, from 2 to 4, from 4 to 8, etc. You can repeat the process as many times as you like, defining multiplication on lists of 2ⁿ numbers, but you lose structure as you go.
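
Here is a minimal sketch of the construction in Python, using the doubling rule (a, b)(c, d) = (ac − d*b, da + bc*), where * is conjugation. Several sign conventions appear in the literature; this is one common choice.

    import numpy as np

    def conj(x):
        "Conjugate: negate every component except the first."
        xstar = -np.array(x, dtype=float)
        xstar[0] *= -1
        return xstar

    def CayleyDickson(x, y):
        "Multiply two arrays whose length is a power of 2 by recursive doubling."
        n = len(x)
        if n == 1:
            return x*y
        m = n // 2
        a, b = x[:m], x[m:]
        c, d = y[:m], y[m:]
        return np.concatenate([
            CayleyDickson(a, c) - CayleyDickson(conj(d), b),
            CayleyDickson(d, a) + CayleyDickson(b, conj(c))])

Applied to arrays of length 4 this multiplies quaternions, length 8 octonions, length 16 sedenions, and so on.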

Quaternions

Multiplication for 4-tuples gives the quaternions. The quaternions retain most of the structure of the real and complex numbers. Multiplication is associative. Non-zero elements have a multiplicative inverse, i.e. you can divide. And multiplication plays well with the norm:

|| xy || = || x || · || y ||.

But multiplication is not commutative: in general, xy ≠ yx.

Octonions

Multiplication of 8-tuples produces the octonions. It’s still true that non-zero elements have a multiplicative inverse, and multiplication still plays well with the norm as above. But now, not only is multiplication not commutative, it’s not even associative: in general, (xy)z ≠ x(yz). It’s the “in general” part that this post wants to elaborate on.

The subalgebra generated by any two elements is associative. That means, for example, that (xy)x = x(yx). If you fix x and y, and look at all the octonions you can form by adding, multiplying, conjugating, and inverting these elements, as well as multiplying them by a real number, you get a set of octonions for which multiplication is associative.

In fact, the subalgebra generated by two octonions is isomorphic to either the real numbers, the complex numbers, or the quaternions, depending on the two octonions you start with.

This was brought to my attention by a comment on a post on octonions from a few years ago. Someone pointed out that an equation I had written

x* = − (x + (e1 x)e1 + … + (e7 x)e7) / 6

could be written more simply as

x* = − (x + e1 x e1 + … + e7 x e7) / 6.

because each term only involves two distinct octonions.

Sedenions

The next step, multiplying 16-tuples of real numbers, gives the sedenions [1]. Now we lose even more structure. Multiplication is not commutative, not associative, and it’s possible for two non-zero numbers to have a zero product. That means the norm property

|| xy || = || x || · || y ||

goes out the window since the left side can be zero when the right side is not.

Sedenions, and indeed all Cayley-Dickson algebras, are flexible, which means (xy)x = x(yx). But it’s not true more generally that the algebra generated by two sedenions is associative.
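
Using the CayleyDickson sketch above, we can check flexibility numerically for sedenions:

    x = np.random.rand(16)
    y = np.random.rand(16)
    lhs = CayleyDickson(CayleyDickson(x, y), x)
    rhs = CayleyDickson(x, CayleyDickson(y, x))
    print(np.allclose(lhs, rhs))  # True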

Trigintaduonions

The next rung in the Cayley-Dickson ladder is the family of 32-tuples known as the trigintaduonions [2]. The sedenions are a mess, and they’re a subset of the trigintaduonions, so the trigintaduonions are a mess. But at least they’re flexible.

Summary of properties
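
Collecting the observations above:

    Property              Quaternions  Octonions  Sedenions  Trigintaduonions
    commutative           no           no         no         no
    associative           yes          no         no         no
    flexible              yes          yes        yes        yes
    norm multiplicative   yes          yes        no         no
    zero divisors         none         none       yes        yes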


[1] From the Latin word sedecim for 16.

[2] From the Latin triginta for 30 and duo for 2.

Multiplying by quaternions on the left and right

The map that takes a quaternion x to the quaternion qx is linear, so it can be represented as multiplication by a matrix. The same is true of the map that takes x to xq, but the two matrices are not the same because quaternion multiplication does not commute.

Let q = a + bi + cj + dk and let _qM be the matrix that represents multiplication on the left by q. Then

_qM = \begin{bmatrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \\ \end{bmatrix}

Now let M_q be the matrix that represents multiplication on the right by q. Then

M_q = \begin{bmatrix} a & -b & -c & -d \\ b & a & d & -c \\ c & -d & a & b \\ d & c & -b & a \\ \end{bmatrix}

We can prove both matrix representations are correct by showing that they do the right thing when q = 1, i, j, and k. The rest follows by linearity.

You might speculate that the matrix representation for multiplying on the right by q might be the transpose of the matrix representation for multiplying on the left by q. You can look at the matrices above and see that’s not the case.
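
Here is a quick numerical check of both representations, and of the transpose remark, reusing the qmult sketch from the octonion section above; the function names are ours.

    import numpy as np

    def left_mult_matrix(q):
        "Matrix of the map x -> qx."
        a, b, c, d = q
        return np.array([
            [a, -b, -c, -d],
            [b,  a, -d,  c],
            [c,  d,  a, -b],
            [d, -c,  b,  a]])

    def right_mult_matrix(q):
        "Matrix of the map x -> xq."
        a, b, c, d = q
        return np.array([
            [a, -b, -c, -d],
            [b,  a,  d, -c],
            [c, -d,  a,  b],
            [d,  c, -b,  a]])

    q, x = np.random.rand(4), np.random.rand(4)
    print(np.allclose(left_mult_matrix(q) @ x, qmult(q, x)))         # True
    print(np.allclose(right_mult_matrix(q) @ x, qmult(x, q)))        # True
    print(np.allclose(right_mult_matrix(q), left_mult_matrix(q).T))  # False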

In this post I talk about how to represent rotations with quaternions, and in this post I give an equation for the equivalent rotation matrix for a rotation described by a quaternion. You can prove that the matrix representation is correct by multiplying out the matrices for multiplication on the left by q and on the right by q*. Keep in mind that q in that case is a unit quaternion, so the squares of its components sum to 1.


Matrix representations of number systems

The next post discusses complex numbers, dual numbers, and double numbers. All three systems are constructed by adding some element to the real numbers that has a special algebraic property. The complex numbers are constructed by adding an element i such that i² = −1. The dual numbers add an element ε ≠ 0 with ε² = 0, and the double numbers are constructed by adding j ≠ ±1 with j² = 1.

If adding special elements seems somehow illegitimate, there is an alternative way to define these number systems that may seem more concrete using 2 × 2 matrices. (A reader from 150 years ago would probably be more comfortable with appending special numbers than with matrices, but now we’re accustomed to matrices.)

The following mappings provide isomorphisms between complex, dual, and double numbers and their embeddings in the ring of 2 × 2 matrices.

\begin{align*} a + ib &\leftrightarrow \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \\ a + \varepsilon b &\leftrightarrow \begin{pmatrix} a & b \\ 0 & a \end{pmatrix} \\ a + jb &\leftrightarrow \begin{pmatrix} a & b \\ b & a \end{pmatrix} \\ \end{align*}

Because the mappings are isomorphisms, you can translate a calculation in one of these number systems into a calculation involving real matrices, then translate the result back to the original number system. This is conceptually interesting, but it could also be useful if you’re using software that supports matrices but does not directly support alternative number systems.

You can also apply the correspondences from right to left. If you need to carry out calculations on matrices of the special forms above, you could move over to complex (or dual, or double) numbers, do your algebra, then convert the result back to matrices.
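
As a quick sketch of the correspondence, multiplying two complex numbers and multiplying their representing matrices give the same answer, and the same pattern works for dual and double numbers.

    import numpy as np

    def complex_mat(a, b):  # a + ib
        return np.array([[a, -b], [b, a]])

    def dual_mat(a, b):     # a + (epsilon)b
        return np.array([[a, b], [0, a]])

    def double_mat(a, b):   # a + jb
        return np.array([[a, b], [b, a]])

    # (2 + 3i)(5 - i) = 13 + 13i, and the matrix product agrees
    m = complex_mat(2, 3) @ complex_mat(5, -1)
    print(m[0, 0], m[1, 0])  # 13 13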

Functions of a matrix

The next post derives variations on Euler’s theorem in complex, dual, and double numbers. You could verify those three theorems by applying exp, sin, cos, sinh, and cosh to matrices. In each case you define the function in terms of its power series and stick in matrices. You should be a little concerned about convergence, but it all works out.

You should also be concerned about commutativity. Multiplication of real numbers is commutative, but multiplication of matrices is not, so you can’t just stick matrices into any equation derived for real numbers and expect it to hold. For example, it’s not true in general that exp(A + B) equals exp(A) exp(B). But it is true if the matrices A and B commute, and the special matrices that represent complex (or dual, or double) numbers do commute.
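
For example, applying scipy’s matrix exponential to the representations above reproduces, up to floating point error, the formulas derived in the next post.

    import numpy as np
    from scipy.linalg import expm

    x = 0.7
    print(expm(dual_mat(0, x)))    # [[1, 0.7], [0, 1]], i.e. 1 + (epsilon)x
    print(expm(double_mat(0, x)))  # cosh(x) on the diagonal, sinh(x) off it
    print(np.cosh(x), np.sinh(x))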


Euler’s formula for dual numbers and double numbers

The complex numbers are formed by adding an element i to the real numbers such that i² = − 1. We can create other number systems by adding other elements to the reals.

One example is dual numbers. Here we add a number ε ≠ 0 with the property ε² = 0. Dual numbers have been used in numerous applications, most recently in automatic differentiation.
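
The connection with automatic differentiation is that the coefficient of ε propagates through arithmetic exactly like a derivative. A toy sketch; a real implementation would overload many more operations.

    class Dual:
        "a + b(epsilon) with epsilon**2 = 0; b carries the derivative."
        def __init__(self, a, b=0.0):
            self.a, self.b = a, b
        def __add__(self, other):
            return Dual(self.a + other.a, self.b + other.b)
        def __mul__(self, other):
            return Dual(self.a*other.a, self.a*other.b + self.b*other.a)

    # derivative of f(x) = x*x + x at x = 3 is 2*3 + 1 = 7
    x = Dual(3.0, 1.0)
    y = x*x + x
    print(y.a, y.b)  # 12.0 7.0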

Another example is double numbers [1]. Here we add a number j ≠ ±1 such that j² = 1. (Apologies to electrical engineers and Python programmers. For this post, j is not the imaginary unit from complex numbers.)

(If adding special numbers to the reals makes you uneasy, see the previous post for an alternative approach to defining these numbers.)

We can find analogs of Euler’s formula

\exp(i\theta) = \cos(\theta) + i \sin(\theta)

for dual numbers and double numbers by using the power series for the exponential function

\exp(z) = \sum_{k=0}^\infty \frac{z^k}{k!}

to define exp(z) in these number systems.

For dual numbers, the analog of Euler’s theorem is

\exp(\varepsilon x) = 1 + \varepsilon x

because all the terms in the power series after the first two involve powers of ε that evaluate to 0. Although this equation only holds exactly for dual numbers, it is approximately true if ε is a small real number. This is the motivation for using ε as the symbol for the special number added to the reals: dual numbers make rigorous the informal practice of calculating with a quantity so small that its square can be neglected.

For double numbers, the analog of Euler’s theorem is

\exp(j x) = \cosh(x) + j \sinh(x)

and the proof is entirely analogous to the proof of Euler’s theorem for complex numbers: Write out the power series, then separate the terms involving even exponents from the terms involving odd exponents.


[1] Double numbers have also been called motors, hyperbolic numbers, split-complex numbers, spacetime numbers, …

A magical land where rounding equals truncation

Rounding numbers has a surprising amount of detail. It may seem trivial but, as with most things, there is a lot more to consider than is immediately obvious. I expect there have been hundreds if not thousands of pages devoted to rounding in IEEE journals.

An example of the complexity of rounding is what William Kahan called The Tablemaker’s Dilemma: there is no way in general to know in advance how accurately you’ll need to compute a number in order to round it correctly.

Rounding can be subtle in any number system, but there is an alternative number system in which it is a little simpler than in base 10. It’s base 3, but with a twist. Instead of using 0, 1, and 2 as “digits”, we use −1, 0, and 1. This is known as the balanced ternary system: ternary because of base 3, and balanced because the digits are symmetrical about 0.

We need a symbol for −1. A common and convenient choice is to use T. Think of moving the minus sign from in front of a 1 to on top of it. Now we could denote the number of hours in a day as 10T0 because

1 \times 3^3 + 0 \times 3^2 + (-1)\times 3 + 0 = 24

A more formal way of describing the balanced ternary representation of a number x is as a set of coefficients t_k such that

x = \sum_{k=-\infty}^\infty t_k 3^k

with the restriction that each t_k is in the set {−1, 0, 1}.

Balanced ternary representation has many interesting properties. For example, positive and negative numbers can all be represented without a minus sign. See, for example, Brian Hayes’ excellent article Third Base. The property we’re interested in here is that to round a balanced ternary number to the nearest integer, you simply lop off the fractional part. Rounding is the same as truncation. To see this, note that the largest possible fractional part is a sequence of all 1s, which represents ½:

\frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \cdots = \frac{1}{2}

Similarly, the most negative possible fractional part is a sequence of all Ts, which represents −½. So unless the fractional part is exactly equal to ½, truncating the fractional part rounds to the nearest integer. If the fractional part is exactly ½ then there is no nearest integer but two integers that are equally near.
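
Here is a sketch of a function that converts an integer to balanced ternary notation, using T for −1 as above.

    def balanced_ternary(n):
        "Balanced ternary digits of an integer, most significant first."
        if n == 0:
            return "0"
        digits = ""
        while n != 0:
            r = n % 3
            if r == 2:  # use digit -1 and carry 1 to the next place
                digits = "T" + digits
                n = n//3 + 1
            else:
                digits = str(r) + digits
                n //= 3
        return digits

    print(balanced_ternary(24))  # 10T0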


Non-associative multiplication

There are five ways to parenthesize a product of four things:

  • ((ab)c)d
  • (ab)(cd)
  • a(b(cd))
  • (a(bc))d
  • a((bc)d)

In a context where multiplication is not associative, the five products above are not necessarily the same. Maybe all five are different.

This post will give two examples where the products above are all different: octonions and matrix multiplication.

If you’re thinking “But wait: matrix multiplication is associative!” then read further and see in what sense I’m saying it’s not.

Octonions

Octonion multiplication is not associative. I wrote a blog post a while back asking how close the octonions come to being associative. That is, if we randomly generate unit-length octonions a, b, and c, we can calculate the norm of

(ab)c − a(bc)

and ask about its expected value. Sometimes for a triple of octonions this value is zero, but on average this expression has norm greater than 1. I estimated the average value via simulation, and later Greg Egan worked out the exact value. His write-up had gone down with the Google+ ship, but recently Greg posted a new version of his notes.

In this post I gave Python code for multiplying octonions using a function called CayleyDickson, named after the originators of the construction (the sketch given earlier is one way to write such a function). Let’s rename it m to have something shorter to work with and compute all five ways of associating a product of four octonions.

    import numpy as np

    def m(x, y):
        return CayleyDickson(x, y)

    def take_five(a, b, c, d):
        return [
            m(a, m(b, m(c, d))),
            m(m(a, b), m(c, d)),
            m(m(m(a, b), c), d),
            m(a, m(m(b, c), d)),
            m(m(a, m(b, c)), d),
        ]

I first tried products of basis elements, and I only got two different products out of the five ways of associating multiplication, but I only tried a few examples. However, when I tried using four random octonions, I got five different products.

    np.random.seed(20220201)

    a = np.random.rand(8)
    b = np.random.rand(8)
    c = np.random.rand(8)
    d = np.random.rand(8)

    for t in take_five(a, b, c, d):
        print(t)

This gave a very interesting result: the five products were all different, but their real (first) components took only two distinct values.

    [ 2.5180856  -2.61618184  ...]
    [ 2.5180856   0.32031027  ...]
    [ 2.5180856   1.13177500  ...]
    [ 3.0280984  -0.30169446  ...]
    [ 3.0280984  -2.36523580  ...]

I repeated this experiment a few times, and the first three results always had the same real component, and the last two results had another real component.

I suspect there’s a theorem that says

Re( ((ab)c)d ) = Re( (ab)(cd) ) = Re( a(b(cd)) )

and

Re( (a(bc))d ) = Re( a((bc)d) )

but I haven’t tried to prove it. If you come up with a proof, or a counterexample, please post a link in the comments.

Matrix multiplication

Matrix multiplication is indeed associative, but the efficiency of matrix multiplication is not. That is, any two ways of parenthesizing a matrix product will give the same final matrix, but the cost of the various products are not the same. I first wrote about this here.

This is a very important result in practice. Changing the parentheses in a matrix product can make the difference between a computation being practical or impractical. This comes up, for example, in automatic differentiation and in backpropagation for neural networks.

Suppose A is an m by n matrix and B is an n by p matrix. Then AB is an m by p matrix, and forming the product AB requires mnp scalar multiplications. If C is a p by q matrix, then computing (AB)C takes

mnp + mpq = mp(n + q)

scalar multiplications, but computing A(BC) takes

npq + mnq = nq(m + p)

scalar multiplications, and in general these are not equal.

Let’s rewrite our multiplication function m and our take_five function to compute the cost of multiplying four matrices of random size.

We’ve got an interesting programming problem in that our multiplication function needs to do two different things. First of all, we need to know the size of the resulting matrix. But we also want to keep track of the number of scalar multiplications the product would require. We have a sort of main channel and a side channel. Having our multiplication function return both the dimension and the cost would make composition awkward.

This is kind of a fork in the road. There are two ways to solve this problem, one high-status and one low-status. The high-status approach would be to use a monad. The low-status approach would be to use a global variable. I’m going to take the low road and use a global variable. What’s one little global variable among friends?

    mults = 0

    def M(x, y):
        "x and y are (rows, columns) pairs of conformable matrices."
        global mults
        mults += x[0]*x[1]*y[1]  # an m by n times n by p product costs mnp
        return (x[0], y[1])

    def take_five2(a, b, c, d):

        global mults
        costs = []

        mults = 0; M(a, M(b, M(c, d)));   costs.append(mults)
        mults = 0; M(M(a, b), M(c, d));   costs.append(mults)
        mults = 0; M(M(M(a, b), c), d);   costs.append(mults)
        mults = 0; M(a, M(M(b, c), d));   costs.append(mults)
        mults = 0; M(M(a, M(b, c)), d);   costs.append(mults)

        return costs 

Next, I’ll generate five random integers, and group them in pairs as sizes of matrices conformable for multiplication.

    dims = np.random.randint(1, 11, size=5)
    dim_pairs = zip(dims[:4], dims[1:])
    costs = take_five2(*dim_pairs)

When I ran this dims was set to

[3 9 7 10 6]

and so my matrices were of size 3×9, 9×7, 7×10, and 10×6.

The number of scalar multiplications required by each way of multiplying the four matrices, computed by take_five2 was

[960, 735, 579, 1332, 1080]

So each took a different number of operations. The slowest approach would take more than twice as long as the fastest approach. In applications, matrices can be very long and skinny, or very wide and thin, in which case one approach may take orders of magnitude more operations than another.
