Continued fractions as matrix products

A continued fraction of the form

\cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cfrac{a_3}{b_3 + \ddots}}}

with n terms equals the composition

f_1 \circ f_2 \circ f_3 \circ \cdots \circ f_n

applied to 0, where

f_i(z) = \frac{a_i}{b_i + z}

As discussed in the previous post, a Möbius transformation can be associated with a matrix. And the composition of Möbius transformations is associated with the product of corresponding matrices. So the continued fraction at the top of the post is associated with the following product of matrices.

\begin{pmatrix} 0 & a_1 \\ 1 & b_1\end{pmatrix} \begin{pmatrix} 0 & a_2 \\ 1 & b_2\end{pmatrix} \begin{pmatrix} 0 & a_3 \\ 1 & b_3\end{pmatrix} \cdots \begin{pmatrix} 0 & a_n \\ 1 & b_n\end{pmatrix}
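To make the correspondence concrete, here is a short Python sketch (my own, not from the post) that evaluates a continued fraction both directly and through the matrix product above. The function names are made up for illustration.

```python
import numpy as np

def cf_by_matrices(a, b):
    """Evaluate a_1/(b_1 + a_2/(b_2 + ...)) via the 2x2 matrix product."""
    M = np.eye(2)
    for ai, bi in zip(a, b):
        M = M @ np.array([[0.0, ai], [1.0, bi]])
    # M represents f_1 ∘ f_2 ∘ ... ∘ f_n; apply the Möbius transformation to z = 0.
    return M[0, 1] / M[1, 1]

def cf_direct(a, b):
    """Evaluate the same continued fraction from the bottom up."""
    r = 0.0
    for ai, bi in reversed(list(zip(a, b))):
        r = ai / (bi + r)
    return r

# With all a_i = b_i = 1 the continued fraction converges to 1/φ = 0.618...
a = [1.0] * 30
b = [1.0] * 30
```

The two evaluations agree; a classical bonus of the matrix form is that the entries of the partial products are the numerators and denominators of the successive convergents.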

The previous post makes precise the terms “associated with” above: Möbius transformations on the complex plane ℂ correspond to linear transformations on the complex projective line P(ℂ). This allows us to include ∞ in the domain and range without resorting to hand waving.

Matrix products are easier to understand than continued fractions, and so moving to the matrix product representation makes it easier to prove theorems.


Fractional linear and linear

A function of the form

g(z) = \frac{az + b}{cz + d}

where ad − bc ≠ 0 is sometimes called a fractional linear transformation or a bilinear transformation. I usually use the name Möbius transformation.

In what sense are Möbius transformations linear transformations? They’re nonlinear functions unless b = c = 0. And yet they’re analogous to linear transformations. For starters, the condition ad − bc ≠ 0 appears to be saying that a determinant is non-zero, i.e. that a matrix is non-singular.

The transformation g is closely associated with the matrix

\begin{pmatrix} a & b \\ c & d \end{pmatrix}

but there’s more going on than a set of analogies. The reason is that Möbius transformations are linear transformations, but not on the complex numbers ℂ.

When you’re working with Möbius transformations, you soon want to introduce ∞. Things get complicated if you don’t. Once you add ∞ theorems become much easier to state, and yet there’s a nagging feeling that you may be doing something wrong by informally introducing ∞. This feeling is justified because tossing around ∞ without being careful can lead to wrong conclusions.

So how can we rigorously deal with ∞? We could move from numbers (real or complex) to pairs of numbers, as with fractions. We replace the complex number z with the equivalence class of all pairs of complex numbers whose ratio is z. The advantage of this approach is that you get to add one special number, the equivalence class of all pairs whose second number is 0, i.e. fractions with zero in the denominator. This new number system is called P(ℂ), where “P” stands for “projective.”

Möbius transformations are projective linear transformations. They’re linear on P(ℂ), though not on ℂ.

When we multiply the matrix above by the column vector (z, 1)ᵀ we get

\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} z \\ 1 \end{pmatrix} = \begin{pmatrix} az + b \\ cz + d \end{pmatrix}

and since our vectors are essentially fractions, the right hand side corresponds to g(z) if the second component of the vector, cz + d, is not zero.

If cz + d = 0, that’s OK. Everything is fine while we’re working in P(ℂ), but we get an element of P(ℂ) that does not correspond to an element of ℂ, i.e. we get ∞.

We’ve added ∞ to the domain and range of our Möbius transformations without any handwaving. We’re just doing linear algebra on finite complex numbers.

There’s a little bit of fine print. In P(ℂ) we can’t allow both components of a pair to be 0, and non-zero multiples of the same vector are equivalent, so we’re not quite doing linear algebra. Strictly speaking a Möbius transformation is a projective linear transformation, not a linear transformation.
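Here is a small Python sketch (my illustration, not from the post) of the idea: represent points of P(ℂ) as pairs, act by matrix multiplication, and only convert back to ℂ at the end. The pair (1, 0) plays the role of ∞.

```python
def apply(M, p):
    """Multiply a 2x2 matrix M by the projective pair p = (z, w)."""
    (a, b), (c, d) = M
    z, w = p
    return (a*z + b*w, c*z + d*w)

def as_complex(p):
    """Interpret a pair as an element of C together with infinity."""
    z, w = p
    return float("inf") if w == 0 else z / w

# The Möbius transformation g(z) = 1/z corresponds to the matrix ((0, 1), (1, 0)).
M = ((0, 1), (1, 0))
```

For example, g sends 0 (the pair (0, 1)) to ∞ (the pair (1, 0)) and vice versa, with no special-case logic in the matrix multiplication itself.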

It takes a while to warm up to the idea of moving from complex numbers to equivalence classes of pairs of complex numbers. The latter seems unnecessarily complicated. And it often is unnecessary. In practice, you can work in P(ℂ) by thinking in terms of ℂ until you have to think about ∞. Then you go back to thinking in terms of P(ℂ). You can think of P(ℂ) as ℂ with a safety net for working rigorously with ∞.

Textbooks usually introduce higher dimensional projective spaces before speaking later, if ever, of one-dimensional projective space. (Standard notation would write P¹(ℂ) rather than P(ℂ) everywhere above.) But one-dimensional projective space is easier to understand by analogy to fractions, i.e. fractions whose denominator is allowed to be zero, provided the numerator is not also zero.

I first saw projective coordinates as an unmotivated definition. “Good morning everyone. We define Pⁿ(ℝ) to be the set of equivalence classes of ℝⁿ⁺¹ where ….” There had to be some payoff for this added complexity, but we were expected to delay the gratification of knowing what that payoff was. It would have been helpful if someone had said “The extra coordinate is there to let us handle points at infinity consistently. These points are not special at all if you present them this way.” It’s possible someone did say that, but I wasn’t ready to hear it at the time.


Geometric mean on unit circle

Warm up

The geometric mean of two numbers is the square root of their product. For example, the geometric mean of 9 and 25 is 15.

More generally, the geometric mean of a set of n numbers is the nth root of their product.

Alternatively, the geometric mean of a set of n numbers is the exponential of their average logarithm.

\left(\prod_{i=1}^n x_i\right)^{1/n} = \exp\left(\frac{1}{n} \sum_{i=1}^n \log x_i\right)

The advantage of the alternative definition is that it extends to integrals. The geometric mean of a function over a set is the exponential of the average value of its logarithm. And the average of a function over a set is its integral over that set divided by the measure of the set.
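A quick numerical check of the identity above, in Python (my own illustration):

```python
import math

xs = [9, 25]

gm_product = math.prod(xs) ** (1 / len(xs))                 # nth root of the product
gm_logs = math.exp(sum(math.log(x) for x in xs) / len(xs))  # exp of the average log
```

Both forms give the geometric mean 15 from the warm-up example.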

Mahler measure

The Mahler measure of a polynomial is the geometric mean over the unit circle of the absolute value of the polynomial.

M(p) = \exp\left( \int_0^1 \log \left|p(e^{2\pi i \theta})\right| \, d\theta\right)

The Mahler measure equals the product of the absolute values of the leading coefficient and roots outside the unit circle. That is, if

p(z) = a \prod_{i=1}^n(z - a_i)


then

M(p) = |a| \prod_{i=1}^n\max(1, |a_i|)


Let p(z) = 7(z − 2)(z − 3)(z + 1/2). Based on the leading coefficient and the roots, we would expect M(p) to be 42. The following Mathematica code shows this is indeed true by returning 42.

    z = Exp[2 Pi I theta]
    Exp[Integrate[Log[Abs[7 (z - 2) (z - 3) (z + 1/2)]], {theta, 0, 1}]]
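For readers without Mathematica, here is an equivalent numerical check in Python (my own sketch). It approximates the integral of log |p| over the unit circle by an evenly spaced sum, which converges rapidly because the integrand is smooth and periodic.

```python
import cmath, math

def p(z):
    return 7 * (z - 2) * (z - 3) * (z + 0.5)

# average of log|p| over the unit circle
N = 4096
avg_log = sum(math.log(abs(p(cmath.exp(2j * math.pi * k / N))))
              for k in range(N)) / N

mahler = math.exp(avg_log)  # ≈ 42
```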

Multiplication and heights

Mahler measure is multiplicative: for any two polynomials p and q, the measure of their product is the product of their measures.

M(pq) = M(p)\,M(q)

A few days ago I wrote about height functions for rational numbers. Mahler measure is a height function for polynomials, and there are theorems bounding Mahler measure by other height functions, such as the sum or maximum of the absolute values of the coefficients.


Finding the imaginary part of an analytic function from the real part

A function f of a complex variable z = x + iy can be separated into real and imaginary parts:

f(x + iy) = u(x, y) + iv(x,y)

where x and y are real numbers, and u and v are real-valued functions of two real variables.

Suppose you are given u(x, y) and you want to find v(x, y). The function v is called a harmonic conjugate of u.

Finding v

If u and v are the real and imaginary parts of an analytic function, then u and v are related via the Cauchy-Riemann equations. These are first order differential equations that one could solve to find u given v or v given u. The approach I’ll present here comes from [1] and relies on algebra rather than differential equations.

The main result from [1] is

f(z) = 2u\left(\frac{z}{2}, \frac{z}{2i} \right) - \overline{f(0)}

So given an expression for u (or v) we evaluate this expression at z/2 and z/2i to get an expression for f, and from there we can find an expression for v (or u).

This method is simpler in practice than in theory. In practice we’re just plugging (complex) numbers into equations. In theory we’d need to be a little more careful in describing what we’re doing, because u and v are not functions of a complex variable. Strictly speaking the right hand side above applies to the extensions of u and v to the complex plane.

Example 1

Shaw gives three exercises for the reader in [1]. The first is

u(x, y) = y^3 - 3x^2 y

We find that

\begin{align*} f(z) &= 2u\left(\frac{z}{2}, \frac{z}{2i} \right) - \overline{f(0)}\\ &= iz^3 - \beta i \end{align*}

We know that the constant term is purely imaginary because u(0, 0) = 0.


Then

\begin{align*} f(x + iy) &= i (x + iy)^3 - \beta i \\ &= y^3 - 3x^2y + i(x^3 - 3xy^2 - \beta) \end{align*}

and so

v(x, y) = x^3 - 3xy^2 - \beta

is a harmonic conjugate for u for any real number β.

The image above is a plot of the function u on the left and its harmonic conjugate v on the right.
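As a sanity check (mine, not in the original post), we can verify numerically that u(x, y) = y³ − 3x²y and its conjugate v(x, y) = x³ − 3xy² satisfy the Cauchy-Riemann equations u_x = v_y and u_y = −v_x:

```python
def u(x, y): return y**3 - 3*x**2*y
def v(x, y): return x**3 - 3*x*y**2   # taking beta = 0

h = 1e-6  # step for central finite differences
def dx(f, x, y): return (f(x + h, y) - f(x - h, y)) / (2 * h)
def dy(f, x, y): return (f(x, y + h) - f(x, y - h)) / (2 * h)

for x, y in [(0.3, -1.2), (2.0, 0.7), (-1.5, 0.4)]:
    assert abs(dx(u, x, y) - dy(v, x, y)) < 1e-4   # u_x = v_y
    assert abs(dy(u, x, y) + dx(v, x, y)) < 1e-4   # u_y = -v_x
```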

Example 2

Shaw’s second example is

u(x, y) = \exp(x) \sin(y)


We begin with

\begin{align*} f(z) &= 2u\left(\frac{z}{2}, \frac{z}{2i} \right) - \overline{f(0)}\\ &= i - i \exp(z) + \beta i \end{align*}

and so

\begin{align*} f(x + iy) &= i - i \exp(x + iy) + \beta i \\ &= i - i\exp(x)(\cos y + i \sin y) + \beta i \end{align*}

From there we find

v(x, y) = 1 - \exp(x) \cos(y) + \beta

where the constant 1 could be absorbed into the arbitrary constant β.

Example 3

Shaw’s last exercise is

u(x, y) = \cosh(x) \cos(y)


\begin{align*} f(z) &= 2u\left(\frac{z}{2}, \frac{z}{2i} \right) - \overline{f(0)}\\ &= 2 \cosh^2(z/2) - 1 + \beta i \end{align*}

This leads to

\begin{align*} f(x + iy) &= 2 \cosh^2((x + iy)/2) - 1 + \beta i \\ &= 2 \cosh^2(x/2) \cos^2(y/2) - 2\sinh^2(x/2)\sin^2(y/2) - 1 \\ &\phantom{=} \,+ 4i \sinh(x/2) \cosh(x/2) \sin(y/2)\cos(y/2) + \beta i \end{align*}

from which we read off

v(x,y) = 4 \sinh(x/2) \cosh(x/2) \sin(y/2)\cos(y/2) + \beta


[1] William T. Shaw. Recovering Holomorphic Functions from Their Real or Imaginary Parts without the Cauchy-Riemann Equations. SIAM Review, Dec., 2004, Vol. 46, No. 4, pp. 717–728.

Complex differential equations

Differential equations in the complex plane are different from differential equations on the real line.

Suppose you have an nth order linear differential equation as follows.

y^{(n)} + a_{n-1}y^{(n-1)} + a_{n-2}y^{(n-2)} + \cdots + a_0y = 0

The real case

If the a’s are continuous, real-valued functions on an open interval of the real line, then there are n linearly independent solutions over that interval. The coefficients do not need to be differentiable for the solutions to be differentiable.

The complex case

If the a’s are continuous, complex-valued functions on a connected open set in the complex plane, then there may not be n linearly independent solutions over that set.

If the a’s are all analytic, then there are indeed n independent solutions. But if any one of the a’s is merely continuous and not analytic, there will be fewer than n independent solutions. There may be as many as n − 1 solutions, or there may not be any solutions at all.

Source: A. K. Bose. Linear Differential Equations on the Complex Plane. The American Mathematical Monthly, Vol. 89, No. 4 (Apr., 1982), pp 244–246.

Similar triangles and complex numbers

Suppose the vertices of two triangles are given by complex numbers a, b, c and x, y, z. The two triangles are similar if

\left| \begin{matrix} 1 & 1 & 1 \\ a & b & c \\ x & y & z \\ \end{matrix} \right| = 0

This can be found in G. H. Hardy’s classic A Course of Pure Mathematics. It’s on page 93 in the 10th edition.


The theorem above generalizes a result from an earlier post giving a test for whether points determine an equilateral triangle. A triangle ABC is equilateral if it is similar to BCA. For triangles to be similar, corresponding vertex angles must be equal. If you can rotate the order of the angles in one triangle and the two triangles remain similar, the angles must all be equal.


Hardy’s proof of the equation above is direct and terse. I’ll quote it here in its entirety.

The condition required is that

\overline{AB}\,/\,\overline{AC} = \overline{XY}\,/\,\overline{XZ}

(large letters denoting the points whose arguments are the corresponding small letters), or

(b-a)/(c-a) = (y-x)/(z-x)

which is the same as the condition given.


To expand on this a little bit, Hardy is saying that we need to show that if we take two corresponding vertices, A and X, the ratio of the lengths of corresponding sides is the same in both triangles. If memory serves, this is called the “SAS postulate.”

The determinant condition implies the last equation in Hardy’s proof. This is not obvious, but it’s easy to work out with pencil and paper. The reader may be left thinking “OK, I can see it’s true, but why would anyone think of that?” A more motivated derivation would require more background than the reader of the book is presumed to have.

You might object that Hardy’s last equation is a ratio of complex numbers, not a ratio of lengths. But you can take the absolute value of both sides. And the absolute value of a fraction is the absolute value of the numerator divided by the absolute value of the denominator. And now you’re taking a ratio of side lengths. In symbols,

\frac{b-a}{c-a} = \frac{y-x}{z-x} \implies \frac{|b-a|}{|c-a|} = \frac{|y-x|}{|z-x|}
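A numerical illustration (mine, not Hardy's): apply a similarity transformation w ↦ pw + q, which rotates, scales, and translates, to a triangle, and check that the determinant vanishes.

```python
def det3(r1, r2, r3):
    """Determinant of a 3x3 matrix given by rows."""
    a, b, c = r1
    d, e, f = r2
    g, h, i = r3
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

a, b, c = 1 + 1j, 3 + 2j, 2 - 1j        # original triangle
p, q = 0.5 + 2j, -4 + 1j                # an arbitrary similarity w -> p*w + q
x, y, z = p*a + q, p*b + q, p*c + q     # vertices of a similar triangle

d = det3((1, 1, 1), (a, b, c), (x, y, z))
assert abs(d) < 1e-10                   # similar triangles, so determinant is 0
```

The third row equals p times the second row plus q times the first, so the rows are linearly dependent, which is why the determinant vanishes for similar triangles.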



Joukowsky transformation

Here’s something surprising: You can apply a symmetric function to a symmetric shape and get something out that is not symmetric.

Let f(z) be the average of z and its reciprocal:

f(z) = (z + 1/z)/2.

This function is symmetric in that it sends z and 1/z to the same value.

Now apply f to circles in the complex plane, varying the center and the radius. Here are some examples of the images of such circles.

These shapes look a lot like the cross section of an airplane wing, i.e. an airfoil. You can vary the radius of the circle to change the thickness of the airfoil, and you can vary the imaginary part of the center to control the camber, how much the center of the airfoil bends [1].

The reason the symmetric function f applied to a symmetric shape can give a non-symmetric shape is that the word “symmetric” is being used in different senses. Still, it is surprising to see a complicated shape pop out of a simple function applied to a simple shape. Also, although the circle is perfectly smooth, the image of the circle has a sharp cusp. This is because the circles include or approach the singularity of f at 0.

The function f above is called the Joukowsky transformation, named after Nicolai Joukowsky (Никола́й Жуко́вский) who published its application to airfoils in 1910. The Joukowsky transformation gives a conformal map between a disk and an airfoil. This map lets engineers map problems from the difficult geometry of an airfoil to the simple geometry of a disk, and then map back to the airfoil. You could design an airfoil to have the shape of the transformation of a circle, or you could design a transformation so that the image of a circle approximates your airfoil.
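Two properties of f are easy to check numerically in Python (my own sketch): the symmetry f(z) = f(1/z), and the fact that the unit circle itself collapses onto the segment [−1, 1], since f(e^{iθ}) = cos θ.

```python
import cmath, math

def f(z):
    """Joukowsky transformation."""
    return (z + 1/z) / 2

# symmetry: z and 1/z map to the same point
z = 0.7 + 0.3j
assert abs(f(z) - f(1/z)) < 1e-12

# the unit circle maps onto the real segment [-1, 1]
for k in range(8):
    w = f(cmath.exp(2j * math.pi * k / 8))
    assert abs(w.imag) < 1e-12
    assert abs(w.real) <= 1 + 1e-12
```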

The function f doesn’t map every circle to an airfoil-like shape, though of course aerospace engineers are most interested in the case where you do. Also, in the examples above the trailing edge of the airfoil comes to a cusp. This is not always the case; you can adjust the circle so that you have a more rounded trailing edge. The following plot illustrates both these possibilities.


[1] The thickness of the airfoil doesn’t depend on the radius R so much as the ratio R/a where the circle has center

c = a − R exp(−iβ).

The ratio R/a is related to the thickness, and the parameter β is related to the camber.

Bounds on power series coefficients

Let f be an analytic function on the unit disk with f(0) = 0 and derivative f ′(0) = 1. If f is one-to-one (injective) then this puts a strict limit on the size of the series coefficients.

Let aₙ be the nth coefficient in the power series for f centered at 0. If f is one-to-one then |aₙ| ≤ n for all positive n. Or to put it another way, if |aₙ| > n for any n > 0 then f must take on some value twice.

The statement above was originally known as the Bieberbach conjecture, but is now known as De Branges’ theorem. Ludwig Bieberbach came up with the conjecture in 1916 and Louis de Branges proved it in 1984. Many people between Bieberbach and De Branges made progress toward the final theorem: proving the theorem for some coefficients, proving it under additional hypotheses, proving approximate versions of the theorem, etc.

The function f(z) = z/(1 − z)² shows that the upper bound in De Branges’ theorem is tight. This function is one-to-one on the unit disk, and its nth coefficient is equal to n.
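Here is a quick Python check (mine) that this extremal function, the Koebe function, meets the bound with equality. It squares the geometric series 1/(1 − z) = 1 + z + z² + ⋯ and shifts by one power of z.

```python
N = 12
geom = [1] * (N + 1)  # coefficients of 1/(1 - z)

# Cauchy product: the coefficient of z^k in 1/(1 - z)^2 is k + 1
sq = [sum(geom[i] * geom[k - i] for i in range(k + 1)) for k in range(N + 1)]

koebe = [0] + sq[:N]  # multiply by z to get z/(1 - z)^2

assert all(koebe[n] == n for n in range(1, N + 1))
```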

Jacobi functions with complex parameter

Jacobi functions are complex-valued functions of a complex variable z and a parameter m. Often this parameter is real, and 0 ≤ m < 1. Mathematical software libraries, like Python’s SciPy, often have this restriction. However, m could be any complex number.

The previous couple of posts spoke of the fundamental rectangle for Jacobi functions. But in general, Jacobi functions (and all other elliptic functions) have a fundamental parallelogram.

When m is real, the two periods of the Jacobi sn function are 4K(m) and 2K(1-m) i and so the function repeats horizontally and vertically. (Here K is the complete elliptic integral of the first kind.) When m is complex, the two periods are again 4K(m) and 2K(1-m) i, but now 4K(m) and 2K(1-m) are complex numbers and their ratio is not necessarily real. This means sn repeats over a parallelogram which is not necessarily a rectangle.

The rest of the post will illustrate this with plots.

First, here is a plot of sn(K(1/2)z, 1/2). The height of the graph represents the absolute value of the function and the color represents its phase. (The argument z was multiplied by K(1/2) to make the periods have integer values, making it a little easier to see the periods.)

Notice that the plot features line up with the coordinate axes, the real axis running from 0 to 8 in the image and the imaginary axis running from 0 to 4.

Here’s the analogous plot for sn(z, 2 + 2i).

Now the features run on a diagonal. The pits are where the function is zero, and the white ellipses are poles that have had their tops cut off to fit in a finite plot.

It will be easier to see what’s going on if we switch to flat plots. The color represents phase as before, but now magnitude is denoted by contour lines.

Here’s a plot of sn(K(1/2)z, 1/2)

and here’s a plot of sn(z, 2 + 2i).

According to theory the two periods should be

4 K(2 + 2i) = 4.59117 + 2.89266 i


and

2i K(-1 - 2i) = -1.44633 + 2.29559 i.

We can show that this is the case by plotting the real and imaginary parts of sn(z, 2 + 2i) as we move in these two directions.

The Mathematica code

    ParametricPlot[
        {Re[JacobiSN[t EllipticK[2 + 2 I], (2 + 2 I)]],
         Im[JacobiSN[t EllipticK[2 + 2 I], (2 + 2 I)]]}, 
        {t, 0, 4}]

produces this plot

and the code

    ParametricPlot[
        {Re[JacobiSN[t I EllipticK[-1 - 2 I], (2 + 2 I)]], 
         Im[JacobiSN[t I EllipticK[-1 - 2 I], (2 + 2 I)]]}, 
        {t, 0, 4}]

produces this plot.
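The same check can be done in Python with the mpmath library, which accepts complex parameters (my own sketch; mpmath's ellipk takes the parameter m, matching the convention above). Rather than reading the periods off a plot, we verify them directly:

```python
from mpmath import ellipfun, ellipk, mpc

m = mpc(2, 2)

def sn(z):
    return ellipfun('sn', z, m=m)

P1 = 4 * ellipk(m)          # horizontal period 4K(m)
P2 = 2j * ellipk(1 - m)     # second period 2i K(1 - m)

z0 = mpc(0.3, 0.2)
assert abs(sn(z0 + P1) - sn(z0)) < 1e-10
assert abs(sn(z0 + P2) - sn(z0)) < 1e-10
```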


Conformal map from rectangles to half plane

As discussed in the previous post, the Jacobi elliptic function sn(z, m) is doubly periodic in the complex plane, with period 4K(m) in the horizontal direction and period 2K(1-m) in the vertical direction. Here K is the complete elliptic integral of the first kind.

The function sn(z, m) maps the rectangle

(-K(m), K(m)) × (0, K(1-m))

conformally onto the upper half plane, i.e. the points in the complex plane with positive imaginary part.
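We can spot-check this claim numerically with mpmath in Python (my own sketch): sample the open rectangle for m = 1/2 and confirm that every sample maps into the upper half plane.

```python
from mpmath import ellipfun, ellipk

m = 0.5
K = ellipk(m)       # half the width of the rectangle
Kp = ellipk(1 - m)  # the height of the rectangle

# points strictly inside (-K, K) x (0, K') should map to Im > 0
for i in range(1, 10):
    for j in range(1, 10):
        z = (-K + 2 * K * i / 10) + 1j * (Kp * j / 10)
        w = ellipfun('sn', z, m=m)
        assert w.imag > 0
```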

For example, sn(z, 0.5) takes the rectangle (−K(1/2), K(1/2)) × (0, K(1/2)) to the upper half plane.


The function sn has a singularity at the top middle of the rectangle, and so as horizontal lines approach the top of the rectangle, their images turn into bigger and bigger circles, filling the half plane.

The plot above was created with the following Mathematica code:

    K = EllipticK[1/2]
    Show[
        ParametricPlot[
            Table[{Re[JacobiSN[x + I y, 0.5]], Im[JacobiSN[x + I y, 0.5]]}, 
                {x, -K, K, K/10}], {y, 0, K}], 
        ParametricPlot[
            Table[{Re[JacobiSN[x + I y, 0.5]], Im[JacobiSN[x + I y, 0.5]]}, 
                {y, 0, K, K/10}], {x, -K, K}], 
        PlotRange -> {{-8, 8}, {0, 8}}]

You can vary the parameter m to match the shape of the rectangle you want to map to the half plane. You cannot choose both the length and width of your rectangle with only one parameter, but you can choose the aspect ratio. By choosing m = 1/2 we got a rectangle twice as wide as it is tall.

In our example we have a rectangle 3.70815 wide and 1.85407 tall. If we wanted a different size but the same aspect ratio, we could scale the argument to sn. For example,

sn(Kz, 0.5)

would map the rectangle (-1, 1) × (0, 1) onto the open half plane where K is given in the Mathematica code above.

In general we can solve for the parameter m that gives us the desired aspect ratio, as explained in the previous post, then find k such that sn(kz, m) maps the rectangle of the size we want into the half plane.

The inverse function, mapping the half plane to the rectangle, can be found in terms of elliptic integrals. Mathematica provides a convenient function InverseJacobiSN for this.

We could map any rectangle conformally onto any other rectangle by mapping the first rectangle to the upper half plane, then mapping the upper half plane to the second rectangle. (We can’t simply scale the rectangle directly, because such a map will not be conformal, i.e. will not preserve angles, unless we scale horizontally and vertically by the same amount. We can find a conformal map that changes the aspect ratio of a rectangle, but it will not simply scale the rectangle linearly.)

Although we can conformally map between any two rectangles using the process described above, we may not want to do things this way; the result may not be what we expect. Conformal maps between regions are not quite unique. They are unique if you specify where one point in the domain goes and specify the argument of its derivative. You may need to play with those extra degrees of freedom to get a conformal map that behaves as you expect.
