Computing logarithms of complex numbers

The previous post showed how to compute logarithms using tables. It gives an example of calculating a logarithm to 15 figures of precision using tables that only allow 4 figures of precision for inputs.

Not only can you bootstrap tables to calculate logarithms of real numbers not given in the tables, you can also bootstrap a table of logarithms and a table of arctangents to calculate logarithms of complex numbers.

One of the examples in Abramowitz and Stegun (Example 7, page 90) is to compute log(2 + 3i). How could you do that with tables? Or with a programming language that doesn’t support complex numbers?

What does this even mean?

Now we have to be a little careful about what we mean by the logarithm of a complex number.

In the context of real numbers, the logarithm of a real number x is the real number y such that e^y = x. This equation has a unique solution if x is positive and no solution otherwise.

In the context of complex numbers, a logarithm of the complex number z is any complex number w such that e^w = z. This equation has no solution if z = 0, and it has infinitely many solutions otherwise: for any solution w, w + 2nπi is also a solution for all integers n.

Solution

If you write the complex number z in polar form

z = r e^{iθ}

then

log(z) = log(r) + iθ.

The proof is immediate:

e^{log(r) + iθ} = e^{log(r)} e^{iθ} = r e^{iθ}.

So computing the logarithm of a complex number boils down to computing its magnitude r and its argument θ.

The equation defining a logarithm has a unique solution if we make a branch cut along the negative real axis and restrict θ to be in the range −π < θ ≤ π. This is called the principal branch of log, sometimes written Log. As far as I know, every programming language that supports complex logarithms uses the principal branch implicitly. For example, in Python (NumPy), log(x) computes the principal branch of the log function.
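For instance, a quick check in NumPy (note the complex argument; np.log of a negative real float returns nan):

    import numpy as np

    # log of a complex argument returns the principal branch:
    # the imaginary part lies in (−π, π]
    print(np.log(-1 + 0j))   # 3.141592653589793j, i.e. πi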

Example

Going back to the example mentioned above,

log(2 + 3i) = log( √(2² + 3²) ) + arctan(3/2) i = ½ log(13) + arctan(3/2) i.

This could easily be computed by looking up the logarithm of 13 and the arc tangent of 3/2.

The exercise in A&S actually asks the reader to calculate log(±2 ± 3i). The reason for the variety of signs is to require the reader to pick the value of θ that lies in the range −π < θ ≤ π. For example,

log(−2 + 3i) = ½ log(13) + (π − arctan(3/2)) i.
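As a quick numerical check of both computations (a sketch in Python, not the tables of course):

    import numpy as np

    theta = np.arctan(3/2)
    print(np.log( 2 + 3j), 0.5*np.log(13) + theta*1j)            # these agree
    print(np.log(-2 + 3j), 0.5*np.log(13) + (np.pi - theta)*1j)  # and so do these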

Numerical application of mean value theorem

Suppose you’d like to evaluate the function

u(z) = \frac{e^z - 1 - z}{z^2}

for small values of z, say z = 10⁻⁸. This example comes from [1].

The Python code

    from numpy import exp
    def f(z): return (exp(z) - 1 - z)/z**2
    print(f(1e-8))

prints -0.607747099184471.

Now suppose you suspect numerical difficulties and compute your result to 50 decimal places using bc -l.

    scale = 50
    z = 10^-8
    (e(z) - 1 - z)/z^2

Now you get u(z) = .50000000166666667….

This suggests the original calculation was completely wrong. What’s going on?

For small z,

e^z ≈ 1 + z

and so we lose precision when directly evaluating the numerator in the definition of u. In our example, we lost all precision.

The mean value theorem from complex analysis says that the value of an analytic function at a point equals the continuous average of its values over a circle centered at that point. If we approximate this average by the average of 16 values on a circle of radius 1 around our point, we get full accuracy. The Python code

    from numpy import exp, pi, arange

    def g(z):
        # average f over 16 points on the unit circle centered at z
        N = 16
        ws = z + exp(2j*pi*arange(N)/N)
        return sum(f(ws))/N

and then g(1e-8) returns

    0.5000000016666668 + 8.673617379884035e-19j

which departs from the result calculated with bc in the 16th decimal place.

At a high level, we’re avoiding numerical difficulties by averaging over points far from the difficult region.


[1] Lloyd N. Trefethen and J. A. C. Weideman. The Exponentially Convergent Trapezoidal Rule. SIAM Review, Vol. 56, No. 3 (2014), pp. 385–458.

Numerical differentiation with a complex step

The most obvious way to approximate the derivative of a function numerically is to use the definition of derivative and stick in a small value of the step size h.

f′ (x) ≈ ( f(x + h) − f(x) ) / h.

How small should h be? Since the exact value of the derivative is the limit as h goes to zero, the smaller h is the better. Except that’s not true in computer arithmetic. When h is too small, f(x + h) is so close to f(x) that the subtraction loses precision.

One way to see this is to think of the extreme case of h so small that f(x + h) equals f(x) to machine precision. Then the derivative is approximated by 0, regardless of what the actual derivative is.

As h gets smaller, the approximation error decreases, but the numerical error increases, and so there is some optimal value of h.
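Here’s a minimal sketch of that trade-off, differentiating exp at x = 1, where the exact derivative is e:

    import numpy as np

    # error of the one-sided difference: it shrinks with h at first,
    # then grows again as rounding error takes over
    for h in [1e-2, 1e-5, 1e-8, 1e-11, 1e-14]:
        approx = (np.exp(1 + h) - np.exp(1))/h
        print(h, abs(approx - np.exp(1)))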

You can do significantly better by using a symmetric approximation for the derivative:

f′ (x) ≈ ( f(x + h) − f(x − h) ) / 2h.

For a given value of h, this formula has about the same numerical error but less approximation error. You still have a trade-off between approximation error and numerical error, but it’s a better trade-off.

If the function f that you want to differentiate is analytic, i.e. differentiable as a function of a complex variable, you can take the step h to be a complex number. When you do, the numerical difficulty completely goes away, and you can take h much smaller.

Suppose h is a small real number and you take

f′ (x) ≈ ( f(x + ih) − f(x − ih) ) / 2ih.

Since f is real-valued on the real axis, the Schwarz reflection principle says that f(x − ih) is the complex conjugate of f(x + ih). So the subtraction in the numerator cancels only the real parts, which carry no information about the derivative; the imaginary parts add rather than cancel, and no precision is lost. We have

f(x + ih) − f(x − ih) = 2i Im f(x + ih)

and so

f′ (x) ≈ Im f(x + ih) / h.

Example

Let’s take the derivative of the gamma function Γ(x) at 1 using each of the three methods above. The exact value is −γ where γ is the Euler-Mascheroni constant. The following Python code shows the accuracy of each approach.

    from scipy.special import gamma

    def diff1(f, x, h):
        return (f(x + h) - f(x))/h

    def diff2(f, x, h):
        return (f(x + h) - f(x - h))/(2*h)

    γ = 0.5772156649015328606065

    x     = 1
    h     = 1e-7
    exact = -γ

    approx1 = diff1(gamma, x, h)
    approx2 = diff2(gamma, x, h)
    approx3 = diff2(gamma, x, h*1j)

    print(approx1 - exact)
    print(approx2 - exact)
    print(approx3 - exact)

This prints

    9.95565755390615e-08
    1.9161483510998778e-10
    (9.103828801926284e-15-0j)

In this example the symmetric finite difference approximation is about 500 times more accurate than the asymmetric approximation, but the complex step approximation is 10,000,000 times more accurate.

It’s a little awkward that the complex step approximation returns a complex number, albeit one with zero imaginary part. We can eliminate this problem by implementing the approximation

f′ (x) ≈ Im f(x + ih) / h

which will return a real number. When we implement this in Python as

    def diff3(f, x, h):
        return (f(x + h*1j) / h).imag

we see that it produces the same result as diff2 with a complex step, but without the spurious imaginary part.


Hypergeometric function of a large negative argument

It’s occasionally necessary to evaluate a hypergeometric function at a large negative argument. I was working on a project today that involved evaluating F(a, b; c; z) where z is a large negative number.

The hypergeometric function F(a, b; c; z) is defined by a power series in z whose coefficients are functions of a, b, and c. However, this power series has radius of convergence 1. This means you can’t use the series to evaluate F(a, b; c; z) for z < −1.

It’s important to keep in mind the difference between a function and its power series representation. The former may exist where the latter does not. A simple example is the function f(z) = 1/(1 − z). The power series for this function has radius of convergence 1, but the function is defined everywhere except at z = 1.

Although the series defining F(a, b; c; z) is confined to the unit disk, the function itself is not. It can be extended analytically beyond the unit disk, usually with a branch cut along the real axis for z ≥ 1.

It’s good to know that our function can be evaluated for large negative z, but how do we evaluate it?

Linear transformation formulas

Hypergeometric functions satisfy a huge number of identities, the simplest of which are known as the linear transformation formulas even though they are not linear transformations of z. They involve bilinear transformations of z, a.k.a. fractional linear transformations, a.k.a. Möbius transformations. [1]

One such transformation is the following, found in A&S 15.3.4 [2].

F(a, b; c; z) = (1-z)^{-a} F\left(a, c-b; c; \frac{z}{z-1} \right)

If z < 0, then 0 < z/(z − 1) < 1, which is inside the radius of convergence. However, as z goes off to −∞, z/(z − 1) approaches 1, and the convergence of the power series will be slow.

A more complicated, but more efficient, formula is A&S 15.3.7, a linear transformation formula that relates F at z to two other hypergeometric functions evaluated at 1/z. Now when z is large, 1/z is small, and these series converge quickly.

\begin{align*} F(a, b; c; z) &= \frac{\Gamma(c) \Gamma(b-a)}{\Gamma(b) \Gamma(c-a)} \,(-z)^{-a\phantom{b}} F\left(a, 1-c+a; 1-b+a; \frac{1}{z}\right) \\ &+ \frac{\Gamma(c) \Gamma(a-b)}{\Gamma(a) \Gamma(c-b)} \,(-z)^{-b\phantom{a}} F\left(\,b, 1-c+b; 1-a+b; \,\frac{1}{z}\right) \\ \end{align*}
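Here’s a sketch of how 15.3.7 could be used in code, assuming SciPy’s hyp2f1 and gamma, with illustrative parameter values; note that the formula as written requires a − b to not be an integer.

    from scipy.special import gamma, hyp2f1

    def hyp2f1_neg(a, b, c, z):
        # A&S 15.3.7: two series in 1/z; assumes a - b is not an integer
        u = gamma(c)*gamma(b - a)/(gamma(b)*gamma(c - a))
        v = gamma(c)*gamma(a - b)/(gamma(a)*gamma(c - b))
        return ( u*(-z)**(-a)*hyp2f1(a, 1 - c + a, 1 - b + a, 1/z)
               + v*(-z)**(-b)*hyp2f1(b, 1 - c + b, 1 - a + b, 1/z) )

    z = -1e6
    print(hyp2f1_neg(0.5, 0.25, 1.25, z))
    print(hyp2f1(0.5, 0.25, 1.25, z))   # SciPy's own evaluation, for comparison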


[1] It turns out these transformations are linear, but not as functions of a complex argument. They’re linear as transformations on a projective space. More on that here.

[2] A&S refers to the venerable Handbook of Mathematical Functions by Abramowitz and Stegun.

When zeros at natural numbers implies zero everywhere

Suppose a function f(z) equals 0 at z = 0, 1, 2, 3, …. Under what circumstances might you be able to conclude that f is zero everywhere?

Clearly you need some hypothesis on f. For example, the function sin(πz) is zero at every integer but certainly not constantly zero.

Carlson’s theorem says that if f is analytic and bounded for z with non-negative real part, and equals zero at non-negative integers, then f is constantly zero.

Carlson’s theorem doesn’t apply to sin(πz) because this function is not bounded in the right half-plane. It is bounded on the real axis, but that’s not enough. The identity

sin(z) = ( exp(iz) – exp(-iz) ) / 2i

shows that the sine function grows exponentially in the vertical direction.

Liouville’s theorem says that if a function is analytic and bounded everywhere then it must be constant. Carlson’s theorem does not require that the function f be bounded everywhere, only in the right half-plane.

In fact, the boundedness requirement can be weakened to requiring f(z) to be O( exp(k|z|) ) for some k < π. This, in combination with having zeros at 0, 1, 2, 3, …, is enough to conclude that f is zero.


Conformal map between disk and equilateral triangle

The Dixon elliptic functions sm and cm are in some ways analogous to sine and cosine. However, whereas sine and cosine satisfy

\sin^2(z) + \cos^2(z) = 1

the Dixon functions satisfy

\text{sm}^3(z) + \text{cm}^3(z) = 1

The exponent 3 foreshadows the fact that these functions have a sort of three-fold symmetry. In particular, the function sm maps an equilateral triangle in the complex plane to the unit circle; it gives a conformal map from the interior of this triangle to the interior of the unit disk.

In this post we will work with sm⁻¹ rather than sm, mapping the unit circle to an equilateral triangle. An advantage of working with the inverse function is that we can start with the unit circle and see what triangle it maps to; if we started with the triangle it might seem arbitrary. Also, the function sm is not commonly part of mathematical software libraries—it’s not in Mathematica or SciPy—but you can compute its inverse via

\text{sm}^{-1}(z) = {}_2F_1(\tfrac{1}{3}, \tfrac{2}{3}; \tfrac{4}{3}; z^3) \, z

using the hypergeometric function ₂F₁, which is a common part of mathematical libraries.

The following image shows concentric circles in the z plane and their image under sm⁻¹ in the w plane, w = sm⁻¹(z).

Conformal map of unit disk to equilateral triangle using the inverse of the Dixon elliptic function sm

If we were to use this in applications, we’d need to know the vertices of the image triangle so we could do a change of variables to transform this triangle into a particular triangle we’re interested in.

The centroid of the image is at the origin, and the right-most vertex is at approximately 1.7666. To be exact, the vertex is at

v = ⅓ B(⅓, ⅓)

where B is the beta function. (Notice all the 3’s in the formula for v.) The other two vertices are at exp(2πi/3) v and exp(4πi/3) v.
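As a numerical check on the vertex, here’s a sketch using mpmath (an assumption on my part; any implementation of ₂F₁ that accepts arguments on the unit circle would do):

    from mpmath import hyp2f1, beta

    def sm_inv(z):
        # inverse of the Dixon elliptic function sm, via the formula above
        return z * hyp2f1(1/3, 2/3, 4/3, z**3)

    print(sm_inv(1))          # right-most vertex, about 1.76664
    print(beta(1/3, 1/3)/3)   # closed form (1/3) B(1/3, 1/3)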

One way this conformal map could arise in practice is solving Laplace’s equation on a triangle. You can solve Laplace’s equation on a disk in closed form, and transform that solution into a solution on the triangle.


Rectangles to Rectangles

There is a conformal map between any two simply connected open proper subsets of the complex plane. This means, for example, there is a one-to-one analytic map from the interior of a square onto the interior of a circle. Or from the interior of a triangle onto the interior of a pentagon. Or from the Mickey Mouse logo to the Batman logo (see here).

So we can map (the interior of) a rectangle conformally onto a very different shape. Can we map a rectangle onto a rectangle? Yes, clearly we can do this with a linear polynomial, f(z) = az + b. Are there any other possibilities? Surprisingly, the answer is no: if an analytic function takes any rectangle to another rectangle, that analytic function must be a linear polynomial.

Since a linear polynomial is the composition of a scaling, a rotation, and a translation, this says that if a conformal map takes a rectangle to a rectangle, it must take it to a similar rectangle.

These statements are proved in [1]. Furthermore, the authors prove that “An analytic function mapping some closed convex n-gon R onto another closed convex n-gon S is a linear polynomial.”


[1] Joseph Bak and Pisheng Ding. Shape Distortion by Analytic Functions. The American Mathematical Monthly. Feb. 2009, Vol. 116, No. 2.

Bounding complex roots by a positive root

Suppose you have an nth degree polynomial with complex coefficients

p(z) = aₙzⁿ + aₙ₋₁zⁿ⁻¹ + … + a₀

and you want to find some circle that is guaranteed to contain all the zeros of p.

Cauchy found such a circle in 1829. The zeros of p lie inside the circle |z| ≤ r where r is the unique positive root of

f(z) = |aₙ|zⁿ − |aₙ₋₁|zⁿ⁻¹ − … − |a₀|

This value of r is known as the Cauchy radius of the polynomial p.

This may not seem like much of an improvement: you started with wanting to find the roots of an nth degree polynomial and you end with finding the roots of an nth degree polynomial. But Cauchy’s theorem reduces the problem of finding all roots of a complex polynomial to finding one root of a real polynomial. Furthermore, the positive root we’re after is guaranteed to be unique.

If a₀ = 0 then p(z) has a factor of z and so we can reduce the problem to bounding the zeros of p(z)/z. Otherwise, f(0) < 0. Eventually f(z) must be positive because the zⁿ term will overtake the rest of the terms for large enough z. So we only need to find some value of z where f(z) > 0 and then we could use the bisection method to find r.

Since our goal is to bound the zeros of p, we don’t need to find r exactly: an upper bound on r will do, though the smaller the upper bound the better. The bisection method gives us a sequence of upper bounds, so we could work in rational arithmetic and have rigorously provable upper bounds.

As for how to find a real value of z where f is positive, we could try z = 2ᵏ for successive values of k until we find one that works.
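In code, that search for a bracketing value might look like the following sketch.

    def find_bracket(f):
        # double z until f(z) > 0; [0, z] then brackets the Cauchy radius
        z = 1.0
        while f(z) <= 0:
            z *= 2.0
        return z

    # for the example below, f(z) = 12z^5 - 2z^2 - 23
    print(find_bracket(lambda z: 12*z**5 - 2*z*z - 23))   # prints 2.0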

For example, let’s bound the roots of

p(z) = 12z⁵ + 2z² + 23i.

Cauchy’s theorem says we need to find the unique positive root of

f(z) = 12z⁵ − 2z² − 23.

Now f(0) = −23 and f(2) = 353. So we know right away that the roots of p have absolute value less than 2.

Next we evaluate f(1), which turns out to be −13, and so the Cauchy radius is larger than 1. This doesn’t necessarily mean that p has a root with absolute value greater than 1, only that the Cauchy radius is greater than 1. An upper bound on the Cauchy radius is an upper bound on the absolute values of the roots of p; a lower bound on the Cauchy radius is not necessarily a lower bound on the largest root.

Carrying out two steps of the bisection method by hand was easy, but let’s automate the process of carrying it out further.

>>> from scipy.optimize import bisect
>>> bisect(lambda x: 12*x**5 - 2*x*x - 23, 1, 2)
1.1646451258329762

So Python tells us r = 1.1646451258329762.

Here’s a plot of the roots and the Cauchy radius.

In this example the roots of p are located very near a circle with the Cauchy radius. The roots range in absolute value between 1.1145600699993699 and 1.1634197192917954. The roots nearly lie on a circle because the quadratic term in our polynomial is small, and so we are approximately finding the fifth roots of −23i/12.
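Those magnitudes are easy to verify numerically, e.g. with NumPy’s roots function:

    import numpy as np

    # roots of p(z) = 12z^5 + 2z^2 + 23i, coefficients from highest degree down
    roots = np.roots([12, 0, 0, 2, 0, 23j])
    print(np.sort(np.abs(roots)))   # all between about 1.1146 and 1.1634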

Let’s do another example with randomly generated coefficients to get a better idea of how Cauchy’s theorem works in general. The coefficients of our polynomial, from 0th to 5th, are

0.126892 + 0.689356i,  −0.142366 + 0.260969i,  −0.918873 + 0.489906i,  0.0599824 − 0.679312i,  −0.222055 + 0.273651i,  0.154408 + 0.733325i

The roots have absolute value between 0.7844606228243709 and 1.2336256274024142, and the Cauchy radius is 1.5088421845957782. Here’s a plot.


Schwarz lemma, Schwarz-Pick theorem, and Poincaré metric

Let D be the open unit disk in the complex plane. The Schwarz lemma says that if f is an analytic function from D to D with f(0) = 0, then

|f(z)| \leq |z|

for all z in D. The lemma also says more, but this post will focus on just this portion of the theorem.

The Schwarz-Pick theorem generalizes the Schwarz lemma by not requiring the origin to be fixed. That is, it says that if f is an analytic function from D to D then

\left| \frac{f(z) - f(w)}{1 - f(z)\,\overline{f(w)}} \right| \leq \left| \frac{z - w}{1 - z\,\overline{w}}\right|

The Schwarz-Pick theorem also concludes more, but again we’re focusing on part of the theorem here. Note that if f(0) = 0 then the Schwarz-Pick theorem reduces to the Schwarz lemma.

The Schwarz lemma is a sort of contraction theorem. Assuming f(0) = 0, the lemma says

|f(z) - f(0)| \leq |z - 0|

This says applying f to a point cannot move the point further from 0. That’s interesting, but it would be more interesting if we could say f is a contraction in general, not just with respect to 0. That is indeed what the Schwarz-Pick theorem does, though with respect to a new metric.

For any two points z and w in the open unit disk D, define the Poincaré distance between z and w by

d(z,w) = \tanh^{-1}\left( \left| \frac{z - w}{1 - z\overline{w}}\right| \right)

It’s not obvious that this is a metric, but it really is. As is often the case, most of the properties of a metric are simple to confirm, but proving the triangle inequality is the hard part.

If we apply the monotone function tanh⁻¹ to both sides of the Schwarz-Pick inequality, then we have that any analytic function f from D to D is a contraction on D with respect to the Poincaré metric.

Here we’re using “contraction” in the loose sense. It would be more explicit to say that f is a non-expansive map. Applying f to a pair of points may not bring the points closer together, but it cannot move them any further apart (with respect to the Poincaré metric).
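Here’s a quick numerical illustration, taking f(z) = z², one analytic map of the disk to itself:

    import numpy as np

    def poincare_dist(z, w):
        # Poincaré distance on the open unit disk
        return np.arctanh(abs((z - w)/(1 - z*np.conj(w))))

    f = lambda z: z*z   # analytic self-map of the disk
    z, w = 0.5 + 0.2j, -0.3 + 0.4j
    print(poincare_dist(z, w))        # about 0.97
    print(poincare_dist(f(z), f(w)))  # about 0.54, smaller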

By using the Poincaré metric, we turn the unit disk into a hyperbolic space. That is, D with the metric d is a model of the hyperbolic plane.


When a function cannot be extended

The relation between a function and its power series is subtle. In a calculus class you’ll see equations of the form “series = function” which may need some footnotes. Maybe the series only represents the function over part of its domain: the function extends further than the power series representation.

Starting with the power series, we can ask whether the function it represents extends further than the series representation.

This video does a nice job of explaining why a particular function cannot be extended beyond the disk on which the series converges.

Toward the end, the video explains how its main example is a member of a broader class of functions that have no analytic continuation. The technical term, which the video does not use, is lacunary series [1]. When the gaps in a power series grow faster than linearly, the series cannot be extended beyond its radius of convergence.

Lacunary series make interesting images since the behavior of the function becomes complicated toward the edge of the domain. The video gives some nice examples. The image above comes from this post and the following image comes from this post.
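For instance, here’s a minimal sketch of the canonical example, the gap series ∑ z^(2ⁿ), evaluated along the real axis toward its natural boundary:

    def lacunary(z, n_terms=25):
        # partial sum of the lacunary series: sum of z**(2**n)
        return sum(z**(2**n) for n in range(n_terms))

    # the partial sums grow without bound as z approaches the unit circle
    for r in [0.9, 0.99, 0.999]:
        print(r, lacunary(r))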

Differential equations

The video mentions Hadamard’s gap theorem. I believe his gap theorem was a spin-off of his work on Laplace’s equation. See this post on Hadamard’s counterexample to the Dirichlet principle for the Laplacian.

The motivation for a LOT of classical math was differential equations. I didn’t realize this as a student. Years later I’d run into something and think “So that is why this person was interested in that problem,” such as why Hadamard would care about whether power series could be extended.

Hadamard wanted to solve a differential equation on a disk with boundary conditions specified on the rim. It’s going to be a problem if the series representation of the solution doesn’t extend to the rim.


[1] Lacuna is the Latin word for a hole or a pit. The word came to be used metaphorically for a gap, such as a gap in a manuscript. Later mathematicians used this term for power series with increasing gaps between non-zero terms.