Quaternion product as a matrix product

Pick a quaternion

p = p0 + p1i + p2j + p3k

and consider the function that acts on quaternions by multiplying them on the left by p.

If we think of q as a vector in R4 then this is a linear function of q, and so it can be represented by multiplication by a 4 × 4 matrix Mp.

It turns out

M_p = \begin{bmatrix} p_0 & {-}p_1 & {-}p_2 & {-}p_3 \\ p_1 & \phantom{-}p_0 & {-}p_3 & \phantom{-}p_2 \\ p_2 & \phantom{-}p_3 & \phantom{-}p_0 & {-}p_1 \\ p_3 & {-}p_2 & \phantom{-}p_1 & \phantom{-}p_0 \\ \end{bmatrix}

How might you remember or derive this matrix? Consider the matrix on the left below. It’s easier to see the pattern here than in Mp.

\begin{pmatrix} 1/1 & 1/i & 1/j & 1/k \\ i/1 & i/i & i/j & i/k \\ j/1 & j/i & j/j & j/k \\ k/1 & k/i & k/j & k/k \\ \end{pmatrix} = \begin{bmatrix} 1 & {-}i & {-}j & {-}k \\ i & \phantom{-}1 & {-}k & \phantom{-}j \\ j & \phantom{-}k & \phantom{-}1 & {-}i \\ k & {-}j & \phantom{-}i & \phantom{-}1 \\ \end{bmatrix}

You can derive Mp from this matrix.

Let’s look at the second row, for example. The second row of Mp, when multiplied by q as a column vector, produces the i component of the product.

How do you get an i term in the product? By multiplying the i component of p by the real component of q, or by multiplying the real component of p times the i component of p, or by multiplying the i/ j component of p by the j component of q, or by multiplying the i/k component of p by the k component of q.

The other rows follow the same pattern. To get the x component of the product, you add up the products of the x/y term of p and the y term of q. Here x and y range over

{1, i, j, k}.

To get Mp from the matrix on the right, replace 1 with the real component of p, replace i with the i component of p, etc.

As a final note, notice that the off-diagonal elements of Mp are anti-symmetric:

mij = –mji

unless i = j.

Quaternion products with fewer real products

A couple days ago I wrote about Gauss’ trick to multiply two complex numbers using only 3 real multiplications. This post will do something similar with quaternions.

Just as the most direct approach to computing complex products requires 4 real multiplications, the most direct approach to quaternion products requires 16 real multiplications. (See the function mult here.)

We can represent a quaternion as a pair of complex numbers by observing

a + bi + cj + dk = (a + bi) + (c + di)j

Gauss’ trick can multiply two complex numbers using only three real multiplications. It would seem that we could use something analogous to multiply two quaternions using only 3 complex multiplications, then apply Gauss’ trick to do each of these complex multiplications with 3 real multiplications. The net result would be computing a quaternion product in only 9 real multiplications.

Except that doesn’t work. A direct application of Gauss’ trick doesn’t work because i and j don’t commute. Perhaps there’s some variation on Gauss’ trick that would work, but I haven’t found one.

So let’s back up and take a different approach.

You can represent the quaternions as 2 × 2 complex matrices via

a + bi + cj + dk \mapsto \begin{bmatrix}a - di & -c - bi \\ c - bi & \phantom{-}a + di\end{bmatrix}

So we could multiply two quaternions by multiplying two 2 × 2 complex matrices. That would require 8 complex multiplications [1], which could be carried out using 3 real multiplications each. This would let us “reduce” the number of multiplications from 16 to 24.

But by symmetry we could eliminate half the multiplications. This puts us at 12 real multiplications, which is less than the 16 required by the most direct approach.

To see this, let α = adi and β = c + bi. Then our matrix representation has the form

\begin{pmatrix} \alpha & -\beta \\ \bar{\beta} & \bar{\alpha} \end{pmatrix}

Then our quaternion product becomes the matrix product

\begin{pmatrix} \alpha & -\beta \\ \bar{\beta} & \bar{\alpha} \end{pmatrix} \begin{pmatrix} \gamma & -\delta \\ \bar{\delta} & \bar{\gamma} \end{pmatrix} = \begin{pmatrix} \alpha\gamma - \beta\bar{\delta} & -\alpha\delta - \beta\bar{\gamma}\\ \bar{\beta}\gamma + \bar{\alpha}\bar{\delta} & -\bar{\beta}\delta + \bar{\alpha}\bar{\gamma} \end{pmatrix}

We only need to compute the complex products in top row; we can recover the quaternion representation of the product from there. The products in the second row are negatives or conjugates of products in the first row.

Maybe there’s a Gauss-like trick to compute the top row of the matrix using only three complex products, but it can at least be done using four complex products.

Related posts

[1] You could use Strassen’s approach to multiply two 2 × 2 matrices in 7 multiplications. That would cut the number of real multiplications down to 21, but exploiting symmetry cuts the number of multiplications further.

Faster quaternion product rotations

You can use quaternions to describe rotations and quaternion products to carry out these rotations. These products have the form


where q represents a rotation, q* is its conjugate, and p is the the vector being rotated. (That’s leaving out some details that we’ll get to shortly.)

The primary advantage of using quaternions to represent rotations is that it avoids gimbal lock, a kind of coordinate singularity.

Saving multiplications

If you multiply quaternions directly from the definitions, the product takes 16 real multiplications. (See the function mult in the Python code below.) So a naive implication of the product above, with two quaternion multiplications, would require 32 real multiplications.

You can do something analogous to Gauss’s trick for complex multiplication to do each quaternion product with fewer multiplications [1, 2], but there’s an even better way. It’s possible to compute the rotation in 15 multiplications [3].


Before we can say what the faster algorithm is, we have to fill in some details we left out at the top of the post. We’re rotating a vector in 3 dimensions using quaternions that have 4 dimensions. We do that by embedding our vector in the quaternions, carrying out the product above, and then pulling the rotated vector out.

Specifically, let

v = (v1, v2, v3)

be the vector we want to rotate. We turn v into a quaternion by defining

p = (o, v1, v2, v3).

We encode our rotation in the unit quaternion

q = (q0, q1, q2, q3)

where q0 = cos(θ/2) and (q1, q2, q3) is the axis we want to rotate around and θ is the amount we rotate.

The rotated vector is the vector part of


i.e. the vector formed by chopping off the first component of the quaternion product.



t = 2 (q1, q2, q3) × (v1, v2, v3) = (t1, t2, t3)

where × is cross product. Then the rotated vector is

v‘ = (v1, v2, v3) + q0(t1, t2, t3) + (q1, q2, q3) × (t1, t2, t3).

The algorithm involves two cross products, with 6 real multiplications each, and one scalar-vector product requiring 3 real multiplications, for a total of 15 multiplications. (I’m not counting the multiplication by 2 as a multiplications because it could be done by an addition or by a bit-shift.)


Here’s some Python code to carry out the naive product and the faster algorithm.

    import numpy as np

    def mult(x, y):
        return np.array([
            x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3],
            x[0]*y[1] + x[1]*y[0] + x[2]*y[3] - x[3]*y[2],
            x[0]*y[2] - x[1]*y[3] + x[2]*y[0] + x[3]*y[1],
            x[0]*y[3] + x[1]*y[2] - x[2]*y[1] + x[3]*y[0]

    def cross(x, y):
        return np.array( (x[1]*y[2] - x[2]*y[1], x[2]*y[0] - x[0]*y[2], x[0]*y[1] - x[1]*y[0]) )

    def slow_rot(q, v):
        qbar = -q
        qbar[0] = q[0]
        return mult(mult(q, v), qbar)[1:]

    def fast_rot(q, v):
        t = 2*cross(q[1:], v[1:])
        return v[1:] + q[0]*t + cross(q[1:], t)

And here’s some test code.

    def random_quaternion():
        x = np.random.normal(size=4)
        return x 

    for n in range(10):
        v = random_quaternion()
        v[0] = 0
        q = random_quaternion()
        q /= np.linalg.norm(q)
        vprime_slow = slow_rot(q, v)
        vprime_fast = fast_rot(q, v)
        print(vprime_fast - vprime_slow)

This prints 10 vectors that are essentially zero, indicating that vprime_slow and vprime_fast produced the same result.

It’s possible to compute the rotation in less time than two general quaternion products because we had three things we could exploit.

  1. The quaternions q and q* are very similar.
  2. The first component of p is zero.
  3. We don’t need the first component of qpq*, only the last three components.

Related posts

[1] I know you can multiply quaternions using 12 real multiplications, and I suspect you can get it down to 9. See this post.

[2] Note that I said “fewer multiplications” rather than “faster.” Gauss’ method eliminates a multiplication at the expense of 3 additions. Whether the re-arranged calculation is faster depends on the ratio of time it takes for a multiply and an add, which depends on hardware.

However, the rotation algorithm given here reduces multiplications and additions, and so it should be faster on most hardware.

[3] I found this algorithm here. The author cites two links, both of which are dead. I imagine the algorithm is well known in computer graphics.

Quaternion square roots

If y is a quaternion, how many solutions does x² = y have? That is, does every quaternion have a square root, and if so, how many square roots does it have?

A quaternion can have more than two roots. There is a example right in the definition of quaternions:

i² = j² = k² = -1

and so -1 has at least three square roots. It follows that every negative real number has at least three square roots.

Negative reals

In fact, every square root of -1 corresponds to a square root of any other negative number. To see this, let p be a positive real number and let r = √p > 0. If x is a square root of -1, then rx is a square root of –p. Conversely, if x is a square root of –p, then x/r is a square root of -1.

So all negative real numbers have the same number of square roots in the quaternions, and that number is at least 3. In fact, it turns out that all negative real numbers have infinitely many quaternion roots. A calculation shows that

(a + bi + cj + dk)² = -1

if and only if b² + c² + d² = 1. So there are as many square roots of -1 as there are points on a sphere.

Not negative reals

If a quaternion is not a negative real number, then the story is much more familiar: the equation x² = y has one solution if y = 0 and two solutions otherwise.

To summarize, x² = y has

  • one solution if y = 0,
  • infinitely many solutions if y is negative and real, and
  • two solutions otherwise.

It’s easy to show that 0 has only one root: |x²| = |x|², and so if x² = 0, then |x| = 0 and so x = 0. Here the norm of a quaternion is defined by

|a + bi + cj + dk| = √( + + + ).

If p > 0, the solutions to x² = –p are bi + cj + dk where b² + c² + d² = p.

What’s left is to show that if y is not zero and not a negative real, then x² = y has two solutions.

Polar form and calculating roots

Quaternions have a polar form analogous to the polar form for complex numbers. That is, we can write a quaternion y in the form

y = r(cos θ + u sin θ)

where r is real and positive, θ is real, and u is a unit quaternion with zero real part. We can find r, θ, and u as follows. Write y as

a + bi + cj + dk = a + q

so that a is the real part and q is the pure quaternion part. Then we can find r, θ, and u by

r = |y|
cos θ = a/|y|
u = q / |q|

Then the two roots are

x = ±√r (cos θ/2 + u sin θ/2).

Python code

First we’ll implement the algorithm above, then show that it produces the same results as the Python package quaternionic.

First, our code to compute x satisfying x² = y.

    import numpy as np


    def norm(q):
        return sum(t*t for t in q)**0.5

    y = np.random.random(size=4)
    r = norm(y)
    theta = np.arccos(y[0]/r)

    u = y + 0 # make a copy
    u[0] = 0
    u /= norm(u)

    x = np.sin(theta/2)*u
    x[0] = np.cos(theta/2)
    x *= r**0.5

And now we check our results using quaternionic.

    import quaternionic

    qx = quaternionic.array(x)
    qy = quaternionic.array(y)

    print(f"y   = {y}")
    print(f"x*x = {qx*qx}")       # should equal y
    print(f"x   = {x}")           # our root
    print(f"x   = {np.sqrt(qy)}") # root via package

This produces

    y   = [0.61615367 0.07612092 0.09606777 0.11150865]
    x*x = [0.61615367 0.07612092 0.09606777 0.11150865]
    x   = [0.79189641 0.04806243 0.06065678 0.07040609]
    x   = [0.79189641 0.04806243 0.06065678 0.07040609]

More quaternion posts

For an exploration of the roots of general polynomial equations over polynomials, see Equations in Quaternions by Ivan Niven. The American Mathematical Monthly, Dec., 1941, Vol. 48, pp. 654-661.

Advantages of redundant coordinates

Since you can describe a point in the plane with two numbers, why would you choose to use three numbers? Why would you ever want to use a coordinate system with more coordinates than necessary?

Barycentric coordinates

One way to indicate the location of a point inside a triangle is to give the distance to each of the vertices. These three distances are called barycentric coordinates. Why would you use three numbers when two would do?

Barycentric coordinates make some things much simpler. For example, the coordinates of the three vertices are (1, 0, 0), (0, 1, 0), and (0, 0, 1) for any triangle. The points inside are written as convex combinations of the vertices. The coordinates of the center of mass, the barycenter, are (1/3, 1/3, 1/3). The vertices are treated symmetrically, even if the triangle is far from symmetric.

Barycentric coordinates are useful in applications, such as computer graphics and finite element analysis, because they are relative coordinates. When a triangle moves or is rescaled, you only need to keep track of where the vertices went; the coordinates of the points inside relative to the vertices haven’t changed.

This can be generalized to more dimensions. For example, you could describe a point in a tetrahedron with four coordinates, more in higher dimensions you could describe a point in an n-simplex by the convex combination coefficients of the n vertices.

Barycentric coordinates are related to Dirichlet probability distributions. When you have n probabilities that sum to 1, you’ve got n-1 degrees of freedom. But it often simplifies things to work with n variables. As with the discussion of triangles above, the extra variable makes expressions more symmetric.

Quaternions and rotations

A point in three dimensional space can be described with three numbers, but it’s often useful to think of the usual three coordinates as the vector part of a quaternion, a set of four numbers.

Suppose you have a point

a = (x, y, z)

and you want to rotate it by an angle θ around an axis given by a unit vector

b = (u, v, w).

You can compute the rotation by associating the point a with the quaternion

p = (0, x, y, z)

and the axis b with the quaternion

q = (cos(θ/2), sin(θ/2) u, sin(θ/2) v, sin(θ/2) w)

The image of a is then given by the quaternion

q p q-1.

This quaternion will have zero real part, and so the Euclidean coordinates are given by the vector part, the last three components.

Of course the product above is a quaternion product, which is not commutative. That’s why the q and the q-1 don’t cancel out.

Using quaternions for rotations has several advantages over using rotation matrices. First, the quaternion representation is more compact, describing a rotation with four real numbers rather than nine. Second, the quaternion calculation can be better behaved numerically. But most importantly, the quaternion approach avoids gimbal lock, a sort of singularity in representing rotations.

Projective planes

In applications of algebra, such as elliptic curve cryptography, you often need to add “points at infinity” to make things work out. To formalize this, you add an extra coordinate. So while an elliptic curve is usually presented as an equation such as

y² = x³ + ax + b,

it’s more formally an equation in three variables

y²z = x³ + axz² + bz³.

Points in the projective plane have coordinates (x, y, z) where points are considered equivalent if they differ by a non-zero multiple, i.e. (x, y, z) is considered the same point as (λx, λy, λz) for any non-zero λ.

You can often ignore the z, choosing λ so that the z coordinate is 1. But when you need to work with the point at infinity in a uniform way, you bring out the full coordinates. Now the “point at infinity” is not some mysterious entity, but simply the point (0, 1, 0).

Common themes

Projective coordinates, like barycentric coordinates, introduce symmetry. With the addition of an extra coordinate, the three coordinates all behave similarly, with no reason to distinguish any coordinate as special. And as with quanternion rotations, projective coordinates make singularities go away, which is consequence of symmetry.

Related posts

Quaternion reference in the Vulgate

To contemporary ears “quaternion” refers to a number system discovered in the 19h century, but there were a couple precedents. Both refer to something related to a group of four things, but there is no relation to mathematical quaternions other than that they have four dimensions.

I’ve written before about Milton’s use of the word in Paradise Lost. Milton is alluding to the four elements of antiquity: air, earth, fire, and water.

I recently found out the word quaternion appears in the Latin Vulgate as well. Acts 12:4 records that Herod had Peter guarded by four squads of four soldiers each, “quattuor quaternionibus militum.”

Update: Thanks to Phil for pointing out in the comments that quaternion appears in the King James as well, “four quaternions of soldiers.”

More quaternion posts

Redoing division algebra graphs with angular distance

The blog post that kicked off the recent series of posts looked at how far apart xy and yx are for quaternions. There I used the Euclidean distance, i.e. || xy – yx ||. This time I’ll look at the angle between xy and yx, and I’ll make some analogous graphs for octonions.

For vectors x and y in three dimensions, the dot product satisfies

x \cdot y = ||x|| \, ||y|| \cos \theta

where θ is the angle between the two vectors. In higher dimensions, we can turn this theorem around and use it as the definition of the angle between two vectors based on their dot product.

(The plots below are jagged because they’re based on random sampling.)

Here’s the Euclidean distance between xy and yx for quaternions from the earlier post:


And here’s the corresponding angular distance:

The range has changed from [0, 2] to [o, π], and the distribution now shifts left instead of right.

Here’s a similar graph, looking at the angular distance between xy and yx for octonions, something I haven’t plotted before in Euclidean distance.

This graph is more symmetric, which we might expect: since octonions have less algebraic structure than quaternions, we might expect the relationship between xy and yz to behave more erratically, and for the plot to look more like a normal distribution.

Finally, let’s revisit the distance between (xy)z and x(yz) for octonions. Here is the distribution of the Euclidean distance from a previous post:

histogram of octonion associator norm values

And here is the corresponding histogram based on angular distance.

These plots are based on uniform random samples of quaternions and octonions of length 1, i.e. points from the unit spheres in 4 and 8 dimensions respectively. Quaternions and octonions have the property that the product of unit length vectors is another unit length vector, and the angle between to unit vectors is the inverse cosine of their dot product.

I thought that sedenions also had this norm property, that the product of unit length vectors has unit length. Apparently not, as I discovered by trying to take the inverse cosine of a number larger than 1. So what is the distribution of lengths that come from multiplying two sedenions of length 1? Apparently the mean is near 1, and here’s a histogram.

Refactored code for division algebras

An earlier post included code for multiplying quaternions, octonions, and sedenions. The code was a little clunky, so I refactor it here.

    def conj(x):
        xstar = -x
        xstar[0] *= -1
        return xstar 

    def CayleyDickson(x, y):
        n = len(x)

        if n == 1:
            return x*y

        m = n // 2
        a, b = x[:m], x[m:]
        c, d = y[:m], y[m:]
        z = np.zeros(n)
        z[:m] = CayleyDickson(a, c) - CayleyDickson(conj(d), b)
        z[m:] = CayleyDickson(d, a) + CayleyDickson(b, conj(c))
        return z

The CayleyDickson function implements the Cayley-Dickson construction and can be used to multiply real, complex, quaternion, and octonion numbers. In fact, it can be used to implement multiplication in any real vector space of dimension 2n. The numeric types listed above correspond to n = 0, 1, 2, and 3. These are the only normed division algebras over the reals.

When n = 4 we get the sedenions, which are not a division algebra because they contain zero divisors, and the code can be used for any larger value of n as well. As noted before, the algebraic properties degrade as n increases, though I don’t think they get any worse after n = 4.

If you wanted to make the code more robust, you could add code to verify that the arguments x and y have the same length, and that their common length is a power of 2. (You could just check that the length is either 1 or even; if it’s not a power of 2 the recursion will eventually produce an odd argument.)

Related posts

How close is octonion multiplication to being associative?

Quaternion multiplication is associative but not commutative. An earlier post looked at the average size of the commutator xy – yx as a measure of how far quaternion multiplication is from being commutative.

This post looks at an analogous question for octonions. Octonion multiplication is neither commutative nor associative. So in this post we look at the associator of three octonions (xy)z – x(yz) as a measure of how far octonion multiplication is from being associative.

(A post from yesterday looked at how close octonion multiplication comes to being associative in an algebraic sense, looking at weak forms of associativity. This post looks at how close multiplication comes to being associative in an analytical sense, looking at norm distances rather than algebraic identities.)

We will use a simulation to look at the average norm of the octonion associator over the octionions of unit length, analogous to the earlier post that looked at the commutator of the quaternions. We developed code for octonion multiplication in the previous post and will reuse that code here. We also developed code for generating random unit-length octonions in the same post.

Here’s code to find the average norm of the associator and plot a histogram of its values.

    import matplotlib.pyplot as plt

    # omult is octonion multiplication. 
    # See previous post.
    def associator(x, y, z):
        return omult(omult(x, y), z) - 
               omult(x, omult(y, z))

    N = 100000
    s = 0
    h = np.zeros(N)
    for n in range(N):
        x = random_unit_octonion()
        y = random_unit_octonion()
        z = random_unit_octonion()    
        t = norm(associator(x, y, z))
        s += t
        h[n] = t

    plt.hist(h, bins=50)

This gives an average of 1.095 and the histogram below.

histogram of octonion associator norm values

Update: Greg Egan calculated the exact mean value for the norm of the associator to be 147456/(42875 π) ≈ 1.0947336. Here are the details.

How far is xy from yx on average for quaternions?

Given two quaternions x and y, the product xy might equal the product yx, but in general the two results are different.

How different are xy and yx on average? That is, if you selected quaternions x and y at random, how big would you expect the difference xy – yx to be? Since this difference would increase proportionately if you increased the length of x or y, we can just consider quaternions of norm 1. In other words, we’re looking at the size of xy – yx relative to the size of xy.

Here’s simulation code to explore our question.

    import numpy as np
    def random_unit_quaternion():
        x = np.random.normal(size=4)
        return x / np.linalg.norm(x)
    def mult(x, y):
        return np.array([
            x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3],
            x[0]*y[1] + x[1]*y[0] + x[2]*y[3] - x[3]*y[2],
            x[0]*y[2] - x[1]*y[3] + x[2]*y[0] + x[3]*y[1],
            x[0]*y[3] + x[1]*y[2] - x[2]*y[1] + x[3]*y[0]
    N = 10000
    s = 0
    for _ in range(N):
        x = random_unit_quaternion()
        y = random_unit_quaternion()
        s += np.linalg.norm(mult(x, y) - mult(y, x))

In this code x and y have unit length, and so xy and yx also have unit length. Geometrically, x, y, xy, and yx are points on the unit sphere in four dimensions.

When I ran the simulation above, I got a result of 1.13, meaning that on average xy and yx are further from each other than they are from the origin.

To see more than the average, here’s a histogram of ||xyyx|| with N above increased to 100,000.


I imagine you could work out the distribution exactly, though it was quicker and easier to write a simulation. We know the distribution lives on the interval [0, 2] because xy and yx are points on the unit sphere. Looks like the distribution is skewed toward its maximum value, and so xy and yz are more likely to be nearly antipodal than nearly equal.

Update: Greg Egan worked out the exact mean and distribution.