Quaternion product as a matrix product

Pick a quaternion

p = p0 + p1i + p2j + p3k

and consider the function that acts on quaternions by multiplying them on the left by p.

If we think of q as a vector in R4 then this is a linear function of q, and so it can be represented by multiplication by a 4 × 4 matrix Mp.

It turns out

M_p = \begin{bmatrix} p_0 & {-}p_1 & {-}p_2 & {-}p_3 \\ p_1 & \phantom{-}p_0 & {-}p_3 & \phantom{-}p_2 \\ p_2 & \phantom{-}p_3 & \phantom{-}p_0 & {-}p_1 \\ p_3 & {-}p_2 & \phantom{-}p_1 & \phantom{-}p_0 \\ \end{bmatrix}

How might you remember or derive this matrix? Consider the matrix on the left below. It’s easier to see the pattern here than in Mp.

\begin{pmatrix} 1/1 & 1/i & 1/j & 1/k \\ i/1 & i/i & i/j & i/k \\ j/1 & j/i & j/j & j/k \\ k/1 & k/i & k/j & k/k \\ \end{pmatrix} = \begin{bmatrix} 1 & {-}i & {-}j & {-}k \\ i & \phantom{-}1 & {-}k & \phantom{-}j \\ j & \phantom{-}k & \phantom{-}1 & {-}i \\ k & {-}j & \phantom{-}i & \phantom{-}1 \\ \end{bmatrix}

You can derive Mp from this matrix.

Let’s look at the second row, for example. The second row of Mp, when multiplied by q as a column vector, produces the i component of the product.

How do you get an i term in the product? By multiplying the i component of p by the real component of q, or by multiplying the real component of p times the i component of p, or by multiplying the i/ j component of p by the j component of q, or by multiplying the i/k component of p by the k component of q.

The other rows follow the same pattern. To get the x component of the product, you add up the products of the x/y term of p and the y term of q. Here x and y range over

{1, i, j, k}.

To get Mp from the matrix on the right, replace 1 with the real component of p, replace i with the i component of p, etc.

As a final note, notice that the off-diagonal elements of Mp are anti-symmetric:

mij = –mji

unless i = j.

Inner product from norm

If a vector space has an inner product, it has a norm: you can define the norm of a vector to be the square root of the inner product of the vector with itself.

||v|| \equiv \langle v, v \rangle^{1/2}

You can use the defining properties of an inner product to show that

\langle v, w \rangle = \frac{1}{2}\left( || v + w ||^2 - ||v||^2 - ||w||^2 \right )

This is a form of the so-called polarization identity. It implies that you can calculate inner products if you can compute norms.

So does this mean you can define an inner product on any space that has a norm?

No, it doesn’t work that way. The polarization identity says that if you have a norm that came from an inner product then you can recover that inner product from norms.

What would go wrong if tried to use the equation above to define an inner product on a space that doesn’t have one?

Take the plane R² with the max norm, i.e.

|| (x, y) || \equiv \max(|x|, |y|)

and define a function that takes two vectors and returns the right-side of the polarization identity.

f(v, w) = \frac{1}{2}\left( || v + w ||^2 - ||v||^2 - ||w||^2 \right )

This is a well-defined function, but it’s not an inner product. An inner product is bilinear, i.e. if you multiply one of the arguments by a constant, you multiply the inner product by the same constant.

To see that f is not an inner product, let v = (1, 0) and w = (0, 1). Then f(v, w) = -1/2, but f(2v, w) is also -1/2. Multiplying the first argument by 2 did not multiply the result by 2.

When we say that R² with the max norm doesn’t have an inner product, it’s not simply that we forgot to define one. We cannot define an inner product that is consistent with the norm structure.

Email subscription switchover

I’ve used Feedburner to allow people to subscribe to this blog via email. That service is going away, and so I just moved everyone over to MailerLite. I turned off Feedburner email, so nobody should get duplicate email.

Feedburner’s RSS service is still going, for now, but most RSS subscribers use my RSS feed without going through Feedburner.

If you’d like to subscribe to my monthly newsletter or to blog post notifications by email, you can do so here.


Quaternion products with fewer real products

A couple days ago I wrote about Gauss’ trick to multiply two complex numbers using only 3 real multiplications. This post will do something similar with quaternions.

Just as the most direct approach to computing complex products requires 4 real multiplications, the most direct approach to quaternion products requires 16 real multiplications. (See the function mult here.)

We can represent a quaternion as a pair of complex numbers by observing

a + bi + cj + dk = (a + bi) + (c + di)j

Gauss’ trick can multiply two complex numbers using only three real multiplications. It would seem that we could use something analogous to multiply two quaternions using only 3 complex multiplications, then apply Gauss’ trick to do each of these complex multiplications with 3 real multiplications. The net result would be computing a quaternion product in only 9 real multiplications.

Except that doesn’t work. A direct application of Gauss’ trick doesn’t work because i and j don’t commute. Perhaps there’s some variation on Gauss’ trick that would work, but I haven’t found one.

So let’s back up and take a different approach.

You can represent the quaternions as 2 × 2 complex matrices via

a + bi + cj + dk \mapsto \begin{bmatrix}a - di & -c - bi \\ c - bi & \phantom{-}a + di\end{bmatrix}

So we could multiply two quaternions by multiplying two 2 × 2 complex matrices. That would require 8 complex multiplications [1], which could be carried out using 3 real multiplications each. This would let us “reduce” the number of multiplications from 16 to 24.

But by symmetry we could eliminate half the multiplications. This puts us at 12 real multiplications, which is less than the 16 required by the most direct approach.

To see this, let α = adi and β = c + bi. Then our matrix representation has the form

\begin{pmatrix} \alpha & -\beta \\ \bar{\beta} & \bar{\alpha} \end{pmatrix}

Then our quaternion product becomes the matrix product

\begin{pmatrix} \alpha & -\beta \\ \bar{\beta} & \bar{\alpha} \end{pmatrix} \begin{pmatrix} \gamma & -\delta \\ \bar{\delta} & \bar{\gamma} \end{pmatrix} = \begin{pmatrix} \alpha\gamma - \beta\bar{\delta} & -\alpha\delta - \beta\bar{\gamma}\\ \bar{\beta}\gamma + \bar{\alpha}\bar{\delta} & -\bar{\beta}\delta + \bar{\alpha}\bar{\gamma} \end{pmatrix}

We only need to compute the complex products in top row; we can recover the quaternion representation of the product from there. The products in the second row are negatives or conjugates of products in the first row.

Maybe there’s a Gauss-like trick to compute the top row of the matrix using only three complex products, but it can at least be done using four complex products.

Related posts

[1] You could use Strassen’s approach to multiply two 2 × 2 matrices in 7 multiplications. That would cut the number of real multiplications down to 21, but exploiting symmetry cuts the number of multiplications further.

Faster quaternion product rotations

You can use quaternions to describe rotations and quaternion products to carry out these rotations. These products have the form


where q represents a rotation, q* is its conjugate, and p is the the vector being rotated. (That’s leaving out some details that we’ll get to shortly.)

The primary advantage of using quaternions to represent rotations is that it avoids gimbal lock, a kind of coordinate singularity.

Saving multiplications

If you multiply quaternions directly from the definitions, the product takes 16 real multiplications. (See the function mult in the Python code below.) So a naive implication of the product above, with two quaternion multiplications, would require 32 real multiplications.

You can do something analogous to Gauss’s trick for complex multiplication to do each quaternion product with fewer multiplications [1, 2], but there’s an even better way. It’s possible to compute the rotation in 15 multiplications [3].


Before we can say what the faster algorithm is, we have to fill in some details we left out at the top of the post. We’re rotating a vector in 3 dimensions using quaternions that have 4 dimensions. We do that by embedding our vector in the quaternions, carrying out the product above, and then pulling the rotated vector out.

Specifically, let

v = (v1, v2, v3)

be the vector we want to rotate. We turn v into a quaternion by defining

p = (o, v1, v2, v3).

We encode our rotation in the unit quaternion

q = (q0, q1, q2, q3)

where q0 = cos(θ/2) and (q1, q2, q3) is the axis we want to rotate around and θ is the amount we rotate.

The rotated vector is the vector part of


i.e. the vector formed by chopping off the first component of the quaternion product.



t = 2 (q1, q2, q3) × (v1, v2, v3) = (t1, t2, t3)

where × is cross product. Then the rotated vector is

v‘ = (v1, v2, v3) + q0(t1, t2, t3) + (q1, q2, q3) × (t1, t2, t3).

The algorithm involves two cross products, with 6 real multiplications each, and one scalar-vector product requiring 3 real multiplications, for a total of 15 multiplications. (I’m not counting the multiplication by 2 as a multiplications because it could be done by an addition or by a bit-shift.)


Here’s some Python code to carry out the naive product and the faster algorithm.

    import numpy as np

    def mult(x, y):
        return np.array([
            x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3],
            x[0]*y[1] + x[1]*y[0] + x[2]*y[3] - x[3]*y[2],
            x[0]*y[2] - x[1]*y[3] + x[2]*y[0] + x[3]*y[1],
            x[0]*y[3] + x[1]*y[2] - x[2]*y[1] + x[3]*y[0]

    def cross(x, y):
        return np.array( (x[1]*y[2] - x[2]*y[1], x[2]*y[0] - x[0]*y[2], x[0]*y[1] - x[1]*y[0]) )

    def slow_rot(q, v):
        qbar = -q
        qbar[0] = q[0]
        return mult(mult(q, v), qbar)[1:]

    def fast_rot(q, v):
        t = 2*cross(q[1:], v[1:])
        return v[1:] + q[0]*t + cross(q[1:], t)

And here’s some test code.

    def random_quaternion():
        x = np.random.normal(size=4)
        return x 

    for n in range(10):
        v = random_quaternion()
        v[0] = 0
        q = random_quaternion()
        q /= np.linalg.norm(q)
        vprime_slow = slow_rot(q, v)
        vprime_fast = fast_rot(q, v)
        print(vprime_fast - vprime_slow)

This prints 10 vectors that are essentially zero, indicating that vprime_slow and vprime_fast produced the same result.

It’s possible to compute the rotation in less time than two general quaternion products because we had three things we could exploit.

  1. The quaternions q and q* are very similar.
  2. The first component of p is zero.
  3. We don’t need the first component of qpq*, only the last three components.

Related posts

[1] I know you can multiply quaternions using 12 real multiplications, and I suspect you can get it down to 9. See this post.

[2] Note that I said “fewer multiplications” rather than “faster.” Gauss’ method eliminates a multiplication at the expense of 3 additions. Whether the re-arranged calculation is faster depends on the ratio of time it takes for a multiply and an add, which depends on hardware.

However, the rotation algorithm given here reduces multiplications and additions, and so it should be faster on most hardware.

[3] I found this algorithm here. The author cites two links, both of which are dead. I imagine the algorithm is well known in computer graphics.

Three kinds of tensor

I’ve written before about how the word “tensor” is applied to several different things, and that the connection between them isn’t immediately obvious. I thought about writing more about that some day, but I recently became aware of a paper that does this more thoroughly than I ever could.

The paper looks at three notions of a tensor

  1. a multi-indexed object that satisfies certain transformation rules
  2. a multilinear map
  3. an element of a tensor product of vector spaces

and explores (for 210 pages!) how they are related. The length comes primarily from the number of examples. The author is pulling on a thread that runs throughout a lot of mathematics.

Here’s the paper:

Lek-Heng Lim. Tensors in computations. Acta Numerica (2021), pp. 1–210. doi:10.1017/S0962492921000076. Available here.

Laplacian in various coordinate systems

The recent post on the wave equation on a disk showed that the Laplace operator has a different form in polar coordinates than it does in Cartesian coordinates. In general, the Laplacian is not simply the sum of the second derivatives with respect to each variable.

Mathematica has a function, unsurprisingly called Laplacian, that will compute the Laplacian of a given function in a given coordinate system. If you give it a specific function, it will compute the Laplacian of that function. But you can also give it a general function to find a general formula.

For example,

    Simplify[Laplacian[f[r, θ], {r, θ}, "Polar"]]


\frac{f^{(0,2)}(r,\theta )}{r^2}+\frac{f^{(1,0)}(r,\theta )}{r}+f^{(2,0)}(r,\theta )

This is not immediately recognizable as the Laplacian from this post

\Delta u = \frac{\partial^2 u}{\partial r^2} + \frac{1}{r} \frac{\partial u}{\partial r} + \frac{1}{r^2} \frac{\partial^2 u}{\partial \theta^2}

because Mathematica is using multi-index notation, which is a little cumbersome for simple cases, but much easier to use than classical notation when things get more complicated. The superscript (0,2), for example, means do not differentiate with respect to the first variable and differentiate twice with respect to the second variable. In other words, take the second derivative with respect to θ.

Here’s a more complicated example with oblate spheroidal coordinates. Such coordinates come in handy when you need to account for the fact that our planet is not exactly spherical but is more like an oblate spheroid.

When we ask Mathematica

    Simplify[Laplacian[f[ξ, η, φ], {ξ, η, φ}, "OblateSpheroidal"]]

the result isn’t pretty.

(Csc[\[Eta]]^2*Sech[\[Xi]]^2*Derivative[0, 0, 2][f][\[Xi], \[Eta], \[Phi]] + (2*(Cot[\[Eta]]*Derivative[0, 1, 0][f][\[Xi], \[Eta], \[Phi]] + Derivative[0, 2, 0][f][\[Xi], \[Eta], \[Phi]] + Tanh[\[Xi]]*Derivative[1, 0, 0][f][\[Xi], \[Eta], \[Phi]] + Derivative[2, 0, 0][f][\[Xi], \[Eta], \[Phi]]))/ (Cos[2*\[Eta]] + Cosh[2*\[Xi]]))/\[FormalA]^2

I tried using TeXForm and editing it into something readable, but after spending too much time on this I gave up and took a screenshot. But as ugly as the output is, it would be uglier (and error prone) to do by hand.

Mathematica supports the following 12 coordinate systems in addition to Cartesian coordinates:

  • Cylindrical
  • Bipolar cylindrical
  • Elliptic cylindrical
  • Parabolic cylindrical
  • Circular parabolic
  • Confocal paraboloidal
  • Spherical
  • Bispherical
  • Oblate spheroidal
  • Prolate spheroidal
  • Conical
  • Toroidal

These are all orthogonal, meaning that surfaces where one variable is held constant meet at right angles. Most curvilinear coordinate systems used in practice are orthogonal because this simplifies a lot of things.

Laplace’s equation is separable in Stäckel coordinate systems. These are all these coordinate systems except for toroidal coordinates. And in fact Stäckel coordinates are the only coordinate systems in which Laplace’s equation is separable.

It’s often the case that Laplace’s equation is separable in orthogonal coordinate systems, but not always. I don’t have a good explanation for why toroidal coordinates are an exception.

If you’d like a reference for this sort of thing, Wolfram Neutsch’s tome Coordinates is encyclopedic. However, it’s expensive new and hard to find used.

Gauss algorithm for complex multiplication

The most straight-forward way of multiplying two complex numbers requires four multiplications of real numbers:

    def mult(a, b, c, d):
        re = a*c - b*d
        im = b*c + a*d
        return (re, im)

Gauss [1] discovered a way to do this with only three multiplications:

    def gauss(a, b, c, d):
        t1 = a*c
        t2 = b*d
        re = t1 - t2
        im = (a + b)*(c + d) - t1 - t2
        return (re, im)

You can count the *, +, and - symbols above and see that the first algorithm uses 4 multiplications and 2 additions/subtractions. The second algorithm uses 3 multiplications and 5 additions/subtractions.

If you’re carrying out calculations by hand, the latter is faster since multiplication takes much longer than addition or subtraction. In a computer, the latter may not be much faster or even any faster. It depends on the hardware, and whether the real and imaginary parts are integers or floats.

Gauss’s algorithm would be faster than the direct algorithm if the components were very large integers or very high-precision floats. The time required to add n-digit integers is O(n), but the time required to multiply n-digit numbers is at least O(n log n). So for large enough n, it’s worth doing some extra addition to save a multiplication.

I imagine someone has created an algorithm for quaternion multiplication analogous to the algorithm above for complex multiplication, but you could try it yourself as an exercise. I wonder how much you could reduce the number of component multiplications relative to the most direct approach. (Update: here’s my stab at it.)

Related posts

[1] I’ve seen this described as “commonly attributed to Gauss.” Maybe there’s some debate over whether Gauss did this or whether he was the first.

Vibrating circular membranes

'Snare drum' by YannickWhee is licensed under CC BY 2.0

This post will tie together many things I’ve blogged about before. The previous post justified separation of variables. This post will illustrate separation of variables.

Also, this post will show why you might care about Bessel functions and their zeros. I’ve written about Bessel functions before, and said that Bessel functions are to polar coordinates what sines and cosines are to rectangular coordinates. This post will make this analogy more concrete.

Separation of variables

The separation of variables technique is typically presented in three contexts in introductory courses on differential equations:

  1. One-dimensional heat equation
  2. One-dimensional wave equation
  3. Two-dimensional (rectangular) Laplace equation

My initial motivation for writing this post was to illustrate separation of variables outside the most common examples. Separation of variables requires PDEs to have a special form, but not as special as the examples above might imply. A secondary motivation was to show Bessel functions in action.

Radially symmetric wave equation

Suppose you have a thin membrane, like a drum head, and you want to model its vibrations. By “thin” I mean that the membrane is sufficiently thin that we can adequately model it as a two-dimensional surface bobbing up and down in three dimensions. It’s not so thick that we need to model the material in more detail.

Let u(x, y, t) be the height of the membrane at location (x, y) and time t. The wave equation is a PDE modeling the motion of the membrane by

\frac{\partial^2 u}{\partial t^2} = c^2 \Delta u. width=

where Δ is the Laplacian operator. In rectangular coordinates the Laplacian is given by

\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}
We’re interested in a circular membrane, and so things will be much easier if we work in polar coordinates. In polar coordinates the Laplacian is given by

\Delta u = \frac{\partial^2 u}{\partial r^2} + \frac{1}{r} \frac{\partial u}{\partial r} + \frac{1}{r^2} \frac{\partial^2 u}{\partial \theta^2}

We will assume that our boundary conditions and initial conditions are radially symmetric, and so our solution will be radially symmetric, i.e. derivatives with respect to θ are zero. And in our case the wave equation simplifies to

\frac{\partial^2 u}{\partial t^2} = c^2 \left( \frac{\partial^2 u}{\partial r^2} + \frac{1}{r} \frac{\partial u}{\partial r} \right)

Boundary and initial conditions

Let a be the radius of our membrane. We will assume our membrane is clamped down on its boundary, like a drum head, which means we have the boundary condition

u(a, t) = 0

for all t ≥ 0. We assume the initial displacement and initial velocity of the membrane are given by

\begin{align*} u(r, 0) &= f(r) \\ \frac{\partial u}{\partial r}(r, 0) &= g(r) \end{align*}

for all r between 0 and a.

Separating variables

Now we get down to separation of variables. Because we’re assuming radial symmetry, we’re down to a function of two variables: r for the distance from the center and t for time. We assume

u(r, t) = R(r) T(t)

and stick it into the wave equation. A little calculation shows that

\frac{T''}{c^2 T} = \frac{1}{R} \left(R'' + \frac{1}{r} R'\right)

The left side is a function of t alone, and the right side is a function of r alone. The only way this can be true for all t and all r is for both sides to be constant. Call this constant -λ². Why? Because looking ahead a little we find that this will make things easier shortly.

Separation of variables allowed us to reduce our PDE to the following pair of ODEs.

r R'' + R' +\lambda^2 r R &= 0 \\ T'' + c^2 \lambda^2 T &= 0

The solutions to the equation for R are linear combinations of the Bessel functions J0r) and Y0r) [1].

And the solutions to the equation for T are linear combinations of the trig functions cos(cλt) and sin(cλt).

The boundary condition u(a, t) = 0 implies the boundary condition R(a) = 0. This implies that λa must be a zero of our Bessel function, and that all the Y0 terms drop out. This means that our solution is

u(r,t) = \sum_{n=1}^\infty J_0(\lambda_n r) \left( A_n \cos(c \lambda_n t) + B_n \sin(c \lambda_n t)\right)


\lambda_n = \frac{\alpha_n}{a}.

Here αn are the zeros of of the Bessel function J0.

The coefficients An and Bn are determined by the initial conditions. Specifically, you can show that

\begin{align*} A_n &= \frac{2}{\phantom{ca}a^2 J_1^2(\alpha_n)} \int_0^a f(r)\, J_0(\lambda_n r) \, r\, dr \\ B_n &= \frac{2}{ca \alpha_n J_1^2(\alpha_n)} \int_0^a g(r)\, J_0(\lambda_n r) \, r\, dr \\ \end{align*}

The function J1 in the expression for the coefficients is another Bessel function.

The functions J0 and J1 are so important in applications that even the otherwise minimalist Unix calculator bc includes these functions. (As much as I appreciate Bessel functions, this still seems strange to me.) And you can find functions for zeros of Bessel functions in many libraries, such as scipy.special.jn_zeros in Python.

Related posts

[1] This is why introductory courses are unlikely to include an example in polar coordinates. Separation of variables itself is no harder in polar coordinates than in rectangular coordinates, and it shows the versatility of the method to apply it in a different setting.

But the resulting ODEs have Bessel functions for solutions, and it’s understandable that an introductory course might not want to go down this rabbit trail, especially since PDEs are usually introduced at the end of a differential equation class when professors are rushed and students are tired.

[2] Snare drum image above “Snare drum” by YannickWhee is licensed under CC BY 2.0

Justifying separation of variables

The separation of variables technique for solving partial differential equations looks like a magic trick the first time you see it. The lecturer, or author if you’re more self-taught, makes an audacious assumption, like pulling a rabbit out of a hat, and it works.

For example, you might first see the heat equation

ut = c² uxx.

The professor asks you to assume the solution has the form

u(x, t) = X(x) T(t).

i.e. the solution can be separated into the product of a function of x alone and a function of t alone.

Following that you might see Laplace’s equation on a rectangle

uxx + uyy = 0

with the analogous assumption that

u(x, y) = X(x) Y(y),

i.e. the product of a function of x alone and a function of y alone.

There are several possible responses to this assumption.

  1. Whatever you say, doc.
  2. How can you assume that?
  3. How do you know you’re not missing any possibilities?
  4. What made someone think to try this?

As with many things, separation of variables causes the most consternation for the moderately sophisticated students. The least sophisticated students are untroubled, and the most sophisticated student can supply their own justification (at least after the fact).

One response to question (2) is “Bear with me. I’ll show that this works.”

Another response would be “OK, how about assuming the solution is a sum of such functions. That’s a much larger space to look in. And besides, we are going to take sums of such solutions in a few minutes.” One could argue from functional analysis or approximation theory that the sums of separable functions are dense in reasonable space of functions [1].

This is a solid explanation, but it’s kind of anachronistic: most students see separation of variables long before they see functional analysis or approximation theory. But it would be a satisfying response for someone who is seeing all this for the second time. Maybe they were exposed to separation of variables as an undergraduate and now they’re taking a graduate course in PDEs. In an undergraduate class a professor could do a little foreshadowing, giving the students a taste of approximation theory.

Existence of solutions is easier to prove than uniqueness in this case because you can concretely construct a solution. This goes back to the “it works” justification. This argument deserves more respect than a sophomoric student might give it. Mathematics research is not nearly as deductive and mathematics education. You often have to make inspired guesses and then show that they work.

Addressing question (3) requires saying something about uniqueness. A professor could simply assert that there are uniqueness theorems that allow you to go from “I’ve found something that works” to “and so it must be the only thing that works.” Or one could sketch a uniqueness theorem. For example, you might apply a maximum principle to show that the difference between any two solutions is zero.

Question (4) is in some sense the most interesting question. It’s not a mathematical question per se but a question about how people do mathematics. I don’t know what was going through the mind of the first person to try separation of variables, or even who this person was. But a plausible line of thinking is that ordinary differential equations are easier than partial differential equations. How might you reduce a PDE to an ODE? Well, if the solution could be factored into functions of one variable, …

The next post will illustrate using separation of variables by solving the wave equation on a disk.

Related posts

[1] Also, there’s the mind-blowing Kolmogorov-Arnol’d theorem. This theorem says any continuous function of several variables can be written as a sum of continuous separable functions. It doesn’t say you can make the functions in your sum smooth, but it suggests that sums of separable functions are more expressive than you might have imagined.