Complex exponentials

Here’s something that comes up occasionally, a case where I have to tell someone “It doesn’t work that way.” I’ll write it up here so next time I can just send them a link instead of retyping my explanation.

Rules for exponents

The rules for manipulating expressions with real numbers carry over to complex numbers so often that it can be surprising when a rule doesn’t carry over. For example, the rule

(b^x)^y = b^(xy)

holds when b is a positive real number and x and y are real numbers, but doesn’t necessarily hold when x or y are complex. In particular, if x is complex,

(e^x)^y = e^(xy)

does not hold in general, though it does hold when y is an integer. If it did hold, and this is where people get into trouble, you could make computations like

e^(2πi/3) = (e^(2πi))^(1/3) = 1^(1/3) = 1

even though e^(2πi/3) actually equals (-1 + i√3)/2.
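You can watch the bogus rule fail numerically. Here’s a quick check in Python; the only point of the code is to compare the two sides of the calculation above.

```python
import cmath

# The broken "rule" applied with x = 2*pi*i and y = 1/3.
lhs = cmath.exp(2j * cmath.pi / 3)          # e^(2 pi i / 3) = (-1 + i sqrt(3))/2
rhs = cmath.exp(2j * cmath.pi) ** (1 / 3)   # (e^(2 pi i))^(1/3) = 1^(1/3) = 1

print(lhs)  # approximately -0.5 + 0.866i
print(rhs)  # approximately 1
```

The two sides differ by √3, about as far from equal as two points on the unit circle can be.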

Complex exponential function

I usually use the notation exp(x) rather than e^x for several reasons. For one thing, it’s easier to read. When the argument is complicated, say if it contains exponents of its own, the superscripts can get tiny and hard to read. For example, here are two ways of writing the probability density function for the Gumbel distribution.

Math publishers often require the exp() notation in their style guides.

Aside from legibility, there is a substantive reason to prefer the exp() notation. The function f(z) = exp(z) has a rigorous definition in terms of a power series. It’s a single-valued function, valid for any complex number you might stick in. By contrast, it’s subtle how b^x can even be defined rigorously, even if b = e. If you care to delve into the details, see the footnote [1]. This may seem pedantic, but it’s a hair worth splitting because it can prevent mistakes like the incorrect calculation above.

Now it is true that

exp(x + y) = exp(x) exp(y)

for all x and y, even complex x and y, and so it is true that

exp(nx) = exp(x)^n

when n is an integer. For positive integers you can see this by repeatedly adding x to itself. You can extend this to negative integers using

exp(-x) = 1/exp(x).
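Here’s a quick numerical check of where the rule does and doesn’t hold, using Python’s cmath module and test values I picked arbitrarily.

```python
import cmath

x = 0.7 + 1.2j
n = 5

# The identity exp(nx) = exp(x)^n holds for integer n, even for complex x.
assert abs(cmath.exp(n * x) - cmath.exp(x) ** n) < 1e-10

# But for a fractional exponent, ** silently picks a branch and the
# identity can fail badly:
y = 1 / 3
gap = abs(cmath.exp(y * 2j * cmath.pi) - cmath.exp(2j * cmath.pi) ** y)
print(gap)  # not small: about 1.73
```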

[1] For positive b, you can bootstrap your way up by defining b^x for integer exponents, then rational exponents, then real exponents by taking limits. But when b is negative this doesn’t work. The way you end up defining b^x in general is as exp(x log b), taking advantage of the rigorous definition of exp() alluded to above.

But what is log b? This gets subtle. The logarithm of b is a solution to the equation exp(x) = b, and this has infinitely many solutions for complex x. So now you need to either specify a particular branch to make the log function single valued, or you need to change your idea of a function to include multiply-valued functions.

If you get really technical, there’s a difference between exp(x) and e^x, because the latter is exp(x log e), and log e has multiple values! So in that sense exp(x) and e^x are different functions, the former single-valued and the latter multiply-valued.
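You can make this concrete with a little Python: every value of log e has the form 1 + 2πik for an integer k, and different choices of k give genuinely different candidate values of “e^x”.

```python
import cmath

x = 0.5

# Principal branch: log e = 1, giving the usual exp(x).
principal = cmath.exp(x * 1)

# Another branch: log e = 1 + 2*pi*i, giving a different value of "e^x".
other = cmath.exp(x * (1 + 2j * cmath.pi))

print(principal)  # exp(0.5), about 1.6487
print(other)      # exp(0.5) * exp(pi i), about -1.6487
```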

Gamma function partial sums

Last week I wrote about Jentzsch’s theorem. It says that if the power series of a function has a finite radius of convergence, the zeros of the partial sums of the series will cluster around and fill in the boundary of the disk of convergence.

This post will look at the power series for the gamma function centered at 1 and will use a different plotting style.

Here’s what the contours look like with 12 terms:

The radius of convergence is 1 because the gamma function has a singularity at 0. (The radius of convergence is the distance from the center of the power series to the nearest singularity.) Contour lines correspond to constant phase. The gamma function is real for real arguments, and so you can see the real axis running through the middle of the plot because real numbers have zero phase. The contour lines meet at zeros, which you can see are near a circle of radius 1 centered at z = 1.

Here’s what the contour plot looks like with 30 terms:

And here’s what it looks like with just 5 terms:

Here’s the Mathematica code that was used to create the images.

P[x_] = Normal[Series[1.0 Gamma[x], {x, 1, 12}]]
ContourPlot[
    Arg[P[x + I y]],
    {x, -1, 3},
    {y, -2, 2},
    ColorFunction -> "PearlColors",
    Contours -> 20
]


The 1.0 in front of the call to Gamma is important. It tells Mathematica to numerically evaluate the coefficients of the power series. Otherwise Mathematica will find the coefficients symbolically and the plots will take forever.

Similarly, the call to Normal tells Mathematica not to carry around a symbolic error term O((x - 1)^13).
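If you’d like to see the clustering numerically rather than graphically, here’s a rough Python equivalent using mpmath for the Taylor coefficients and NumPy for the polynomial roots. The library choices are mine; this isn’t a translation of the Mathematica workflow above.

```python
import numpy as np
from mpmath import taylor, gamma

# Taylor coefficients of gamma centered at 1: a0 + a1 w + ... + a12 w^12,
# where w = z - 1.
coeffs = [float(c) for c in taylor(gamma, 1, 12)]

# numpy.roots wants the highest-degree coefficient first.
w_roots = np.roots(coeffs[::-1])

# Jentzsch's theorem predicts the zeros cluster near |w| = 1,
# the radius of convergence.
print(np.sort(np.abs(w_roots)))
```

The printed radii bunch up around 1, matching the circle of zeros visible in the contour plots.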

Visualizing complex functions

It’s easy to visualize a function from two real variables to one real variable: use the function value as the height of a surface over its input value. But what if you have one more dimension in the output? A complex function of a complex variable is equivalent to a function from two real variables to two real variables. You can use one of the output variables as height, but what do you do about the other one?

Annotating phase on 3D plots

You can start by expressing the function output f(z) in polar form

f(z) = r e^(iθ)

Then you could plot the magnitude r as height and write the phase angle θ on the graph. Here’s a famous example of that approach from the cover of Abramowitz and Stegun.

You can find more on that book and the function it plots here.

This approach makes it easy to visualize the magnitude, but the phase is textual rather than visual. A more visual approach is to use color to represent the phase, such as hue in HSV scale.
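Here’s a minimal sketch of the phase-to-hue idea in Python, using the standard colorsys module; the function name is mine, but the mapping is the usual one.

```python
import cmath
import colorsys

def phase_color(z):
    """Map the phase of a complex number to a fully saturated hue.
    Positive reals get hue 0 (red), and the hue cycles around the
    color wheel as the phase goes around the circle."""
    hue = (cmath.phase(z) / (2 * cmath.pi)) % 1.0
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

print(phase_color(1))    # red: (1.0, 0.0, 0.0)
print(phase_color(-1))   # cyan: (0.0, 1.0, 1.0)
```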

Hue-only phase plots

You can combine color with height, but sometimes you’ll just use color alone. That is the approach taken in the book Visual Complex Functions. This is more useful than it may seem at first because the magnitude and phase are tightly coupled for an analytic function. The phase alone uniquely determines the function, up to a constant multiple. The image I posted the other day, the one I thought looked like Paul Klee meets Perry the Platypus, is an example of a phase-only plot.

If I’d added more contour lines the plot would be more informative, but it would look less like a Paul Klee painting and less like a cartoon platypus, so I stuck with the defaults. Mathematica has dozens of color schemes for phase plots. I stumbled on “StarryNightColors” and liked it. I imagine I wouldn’t have seen the connection to Perry the Platypus in a different color scheme.

Using hue and brightness

You can visualize magnitude as well as phase if you add another dimension to color. That’s what Daniel Velleman did in a paper that I read recently [1]. He uses hue to represent angle and brightness to represent magnitude. I liked his approach partly on aesthetic grounds. Phase plots using hue only tend to look psychedelic and garish. The varying brightness makes the plots more attractive. I’ll give some examples then include Velleman’s code.

First, here’s a plot of Jacobi’s sn function [2], an elliptic function analogous to sine. I picked it because it has a zero and a pole. Zeros show up as black and poles as white. (The function is elliptic, so it’s periodic horizontally and vertically. Functions like sine are periodic horizontally but not vertically.)

You can see the poles on the left and right and the zero in the middle. Note that the hues swirl in opposite directions around zeros and poles: ROYGBIV counterclockwise around a zero and clockwise around a pole.

Next, here’s a plot of the 5th Chebyshev polynomial. I chose this because I’ve been thinking about Chebyshev polynomials lately, and because polynomials make plots that fade to white in every direction. (That is, |p(z)| → ∞ as |z| → ∞ for all polynomials.)

Finally, here’s a plot of the gamma function. I included this example because you can see the poles as little white dots on the left, and the plot has a nice dark-to-light overall pattern.

Mathematica code

Here’s the Mathematica code from Velleman’s paper. Note that in the HSV scale, he uses brightness to change both the saturation (S) and value (V). Unfortunately the function f being plotted is a global variable rather than being passed in as an argument to the plotting function.

brightness[x_] := If[x <= 1,
    Sqrt[x]/2,
    1 - 8/((x + 3)^2)]

ComplexColor[z_] := If[z == 0,
    Hue[0, 1, 0],
    Hue[Arg[z]/(2 Pi),
        (1 - (b = brightness[Abs[z]])^4)^0.25,
        b]]

ComplexPlot[xmin_, ymin_, xmax_, ymax_, xsteps_, ysteps_] := Block[
    {deltax = N[(xmax - xmin)/xsteps],
     deltay = N[(ymax - ymin)/ysteps]},
    Graphics[
        Raster[
            Table[
                f[x + I y],
                {y, ymin + deltay/2, ymax, deltay},
                {x, xmin + deltax/2, xmax, deltax}],
            {{xmin, ymin}, {xmax, ymax}},
            ColorFunction -> ComplexColor
        ]
    ]
]


Footnotes

[1] Daniel J. Velleman. The Fundamental Theorem of Algebra: A Visual Approach. Mathematical Intelligencer, Volume 37, Number 4, 2015.

[2] I used f[z] = 10 JacobiSN[I z, 1.5]. I multiplied the argument by i because I wanted to rotate the picture 90 degrees. And I multiplied the output by 10 to get a less saturated image.

Joukowsky transformation

The Joukowsky transformation, or Joukowsky map, is a simple function that comes up in aerospace and electrical engineering calculations:

f(z) = (z + 1/z)/2.

(Here z is a complex variable.) Despite its simplicity, it’s interesting to look at in detail.

Mapping lines and circles

Let z = r exp(iθ) and let w = u + iv be its image. Writing the Joukowsky transformation in terms of its real and imaginary parts makes it easier to understand how it transforms lines and circles.

We can see how circles are mapped by holding r constant and letting θ vary. The unit circle gets crushed to the interval [-1, 1] on the real axis, traversed twice. Circles of radius ρ ≠ 1 get mapped to ellipses

(u/a)² + (v/b)² = 1

where

a = (ρ + 1/ρ)/2 and b = (ρ - 1/ρ)/2.

Next we hold θ constant and let r vary. If sin θ = 0 and cos θ > 0 then the image is [1, ∞). If sin θ = 0 and cos θ < 0 then the image is (-∞, -1]. In both cases the image is traced out twice. The image of the line with cos θ = 0 is the vertical axis. Otherwise lines through the origin are mapped to hyperbolas with equation

u²/cos²θ - v²/sin²θ = 1.
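Here’s a quick numerical sanity check of the circle-to-ellipse claim in Python, with ρ = 2 chosen arbitrarily.

```python
import cmath

def joukowsky(z):
    return (z + 1/z) / 2

rho = 2.0
a = (rho + 1/rho) / 2   # semi-axis along the real direction
b = (rho - 1/rho) / 2   # semi-axis along the imaginary direction

# Points on the circle |z| = rho should land on the ellipse
# (u/a)^2 + (v/b)^2 = 1.
for k in range(12):
    w = joukowsky(rho * cmath.exp(2j * cmath.pi * k / 12))
    u, v = w.real, w.imag
    print(round((u/a)**2 + (v/b)**2, 10))  # 1.0 every time
```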

Inverse functions

If (z + 1/z)/2 = w then z² - 2wz + 1 = 0. The discriminant of this equation is 4(w² - 1) and so the Joukowsky transformation is 2-to-1 unless w = ± 1, in which case z = ± 1. The product of the two roots of a quadratic equation equals the constant term divided by the leading coefficient. In this case, the product is 1, so the two preimages of w are reciprocals of each other. This says the Joukowsky transformation is 1-to-1 in any region that doesn’t contain both z and 1/z for any z. This is the case for the interior or exterior of the unit circle, or for the upper or lower half planes. In any of those four regions, one can invert the Joukowsky transformation by solving a quadratic equation and choosing the correct root.
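Here’s a sketch of that inversion in Python, for the region outside the unit circle; the function names are mine.

```python
import cmath

def joukowsky(z):
    return (z + 1/z) / 2

def inverse_outside(w):
    """One inverse of the Joukowsky map: the root of z^2 - 2wz + 1 = 0
    lying outside the unit circle. The two roots multiply to 1, so the
    other root is the reciprocal, inside the circle."""
    r = cmath.sqrt(w*w - 1)
    z1, z2 = w + r, w - r
    return z1 if abs(z1) >= 1 else z2

z = 1.5 + 0.8j                       # a point outside the unit circle
w = joukowsky(z)
print(abs(inverse_outside(w) - z))   # effectively zero: we recover z
```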

Read more: Applied complex analysis

Doubly and triply periodic functions

A function f is periodic if there exists a constant period ω such that f(x) = f(x + ω) for all x. For example, sine and cosine are periodic with period 2π.

There’s only one way a function on the real line can be periodic. But if you think of functions of a complex variable, it makes sense to look at functions that are periodic in two different directions. Sine and cosine are periodic as you move horizontally across the complex plane, but not if you move in any other direction. But you could imagine a function that’s periodic vertically as well as horizontally.

A doubly periodic function satisfies f(x) = f(x + ω1) and f(x) = f(x + ω2) for all x and for two different fixed complex periods, ω1 and ω2, with different angular components, i.e. the two periods are not real multiples of each other. For example, the two periods could be 1 and i.

How many doubly periodic functions are there? The answer depends on how much regularity you require. If you ask that the functions be differentiable everywhere as functions of a complex variable (i.e. entire), the only doubly periodic functions are constant functions [1]. But if you relax your requirements to allow functions to have singularities, there’s a wide variety of functions that are doubly periodic. These are the elliptic functions. They’re periodic in two independent directions, and meromorphic (i.e. analytic except at isolated poles). [2]

What about triply periodic functions? If you require them to be meromorphic, then the only triply periodic functions are constant functions. To put it another way, if a meromorphic function is periodic in three directions, it’s periodic in every direction for every period, i.e. constant. If a function has three independent periods, its set of periods has a limit point, and a meromorphic function that takes the same value along a sequence with a limit point must be constant.

Read more: Applied complex analysis

* * *

[1] Another way to put this is to say that elliptic functions must have at least one pole inside the parallelogram determined by the lines from the origin to ω1 and ω2. A doubly periodic function’s values everywhere are repeats of its values on this parallelogram. If the function were continuous over this parallelogram (i.e. with no poles) then it would be bounded over the parallelogram and hence bounded everywhere. But Liouville’s theorem says a bounded entire function must be constant.

[2] We don’t consider arbitrary singularities, only isolated poles. There are doubly periodic functions with essential singularities, but these are outside the definition of elliptic functions.

Why are differentiable complex functions infinitely differentiable?

Complex analysis is filled with theorems that seem too good to be true. One is that if a complex function is once differentiable, it’s infinitely differentiable. How can that be? Someone asked this on math.stackexchange and this was my answer.

The existence of a complex derivative means that locally a function can only rotate and expand. That is, in the limit, disks are mapped to disks. This rigidity is what makes a complex differentiable function infinitely differentiable, and even more, analytic.

For a complex derivative to exist, the limit of (f(z+h) - f(z))/h has to exist and have the same value no matter how h goes to zero. In particular, h could approach 0 along any radial path, and that’s why an analytic function must map disks to disks in the limit.

By contrast, an infinitely differentiable function of two real variables could map a disk to an ellipse, stretching more in one direction than another. An analytic function can’t do that.

A smooth function of two variables could also flip a disk over, such as f(x, y) = (x, -y). An analytic function can’t do that either. That’s why complex conjugation is not an analytic function.
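You can see the directional dependence numerically. For f(z) = conjugate(z), the difference quotient depends on the direction h approaches zero from, so no complex derivative exists. A small Python illustration:

```python
import cmath

def quotient(f, z, h):
    """Difference quotient (f(z+h) - f(z))/h."""
    return (f(z + h) - f(z)) / h

conj = lambda w: w.conjugate()
z = 0.3 + 0.7j

# The quotient depends on the direction of h, not just its size:
along_real = quotient(conj, z, 1e-3)    # h along the real axis
along_imag = quotient(conj, z, 1e-3j)   # h along the imaginary axis

print(along_real)  # approximately 1: conjugation fixes horizontal motion
print(along_imag)  # approximately -1: conjugation flips vertical motion
```

Since the two directional limits disagree, conjugation has no complex derivative anywhere.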

You might think that if complex differentiability is so restrictive, there must not be many complex differentiable functions. And yet nearly every function you’ve run across in school — trig functions, logs, polynomials, classical probability distributions, etc. — is differentiable when extended to a function of a complex variable. According to this quote, “95 percent, if not 100 percent, of the functions studied by physics, engineering, and even mathematics students” are hypergeometric, a very special case of complex differentiable functions.

Contact me if you’d like help with complex analysis.

Addition formulas for Bessel functions

When trying to understand a complex formula, it helps to first ask what is being related before asking how they are related.

This post will look at addition theorems for Bessel functions. They relate the values of Bessel functions at two points to the values at a third point.

Let a, b, and c be lengths of the sides of a triangle and let θ be the angle between the sides of length a and b. The triangle could be degenerate, i.e. θ could be π. The addition theorems here give the value of Bessel functions at c in terms of values of Bessel functions evaluated at a and b.

The first addition theorem says that

J0(c) = ∑ Jm(a) Jm(b) exp(imθ)

where the sum is over all integer values of m.

This says that the value of the Bessel function J0 at c is determined by a sort of inner product of values of Bessel functions of all integer orders at a and b. It’s not exactly an inner product because it is not positive definite. (Sometimes you’ll see the formula above written with a factor λ multiplying a, b, and c. This is because you can scale every side of a triangle by the same amount and not change the angles.)

Define the vectors x, y, and z to be the values of all Bessel functions evaluated at a, b, and c respectively. That is, for integer k, xk = Jk(a) and similar for y and z. Also, define the vector w by wk = exp(ikθ). Then the first addition theorem says

z0 = ∑ xm ym wm.

This is a little unsatisfying because it relates the value of one particular Bessel function at c to the values of all Bessel functions at a and b. We’d like to relate all Bessel function values at c to the values at a and b. That is, we’d like to relate the whole vector z to the vectors x and y.

Define the vector v by vk = exp(ikψ) where ψ is the angle between the sides of length b and c. Then the formula we’re looking for is

zn = v-n ∑ xn+m ym wm

where again the sum is over all integer values of m.

Why j for imaginary unit?

Electrical engineers use j for the square root of -1 while nearly everyone else uses i. The usual explanation is that EE’s do this because they use i for current. But here’s one advantage to using j that has nothing to do with electrical engineering.

The symbols i, j, and k are used for unit vectors in the directions of the x, y, and z axes respectively. That means that “i” has two different meanings in the real plane, depending on whether you think of it as the vector space spanned by i and j or as complex numbers. But if you use j to represent the imaginary unit, its meaning does not change. Either way it points along the y axis.

Said another way, bold face i and italic i point in different directions. But bold face j and italic j both point in the same direction.

Here’s what moving from vectors to complex numbers looks like in math notation:

x i + y j ↔ x + iy

And here’s what it looks like in electrical engineering notation:

x i + y j ↔ x + jy

I don’t expect math notation to change, nor would I want it to. I’m happy with i. But using j might make moving between vectors and complex numbers a little easier.

Related: Applied complex analysis

Fractal-like phase plots

Define f(z) = iz*exp(-z/sin(iz)) + 1 and g(z) = f(f(z)) + 2 for a complex argument z. Here’s what the phase plots of g look like.

The first image lets the real and imaginary parts of z range from -10 to 10.

This close-up plots real parts between -1 and 0, and imaginary parts between -3 and -2.

The plots were produced with this Python code:

from mpmath import cplot, sin, exp

def f(z): return 1j*z*exp(-z/sin(1j*z)) + 1

def g(z): return f(f(z)) + 2

cplot(g, [-1,0], [-3,-2], points=100000)

The function g came from Visual Complex Functions.

Read more: Applied complex analysis

Wallpaper and phase portraits

Suppose you want to create a background image that tiles well. You’d like it to be periodic horizontally and vertically so that there are no obvious jumps when the image repeats.

Functions like sine and cosine are periodic along the real line. But if you want to make a two-dimensional image by extending the sine function to the complex plane, the result is not periodic along the imaginary axis; it grows exponentially instead.

There are functions that are periodic horizontally and vertically. If you restrict your attention to functions that are analytic except at poles, these doubly-periodic functions are elliptic functions, a class of functions with remarkable properties. See this post if you’re interested in the details. Here we’re just after pretty wallpaper. I’ll give Python code for creating the wallpaper.

Here I’ll take a particular elliptic function sn(x). This is one of the Jacobi elliptic functions, somewhat analogous to the sine function, and use its phase portrait. Phase portraits use hue to encode the phase of a complex number, the θ value when a complex number is written in polar coordinates. The brightness of the color indicates the magnitude, the r value in polar coordinates.

Here’s the plot of sn(z, 0.2). (The sn function takes a parameter m that I arbitrarily chose as 0.2.) The plot shows two periods, horizontally and vertically. I included two periods so you could more easily see how it repeats. If you wanted to use this image as wallpaper, you could use 1/4 of the image, one period in each direction, to get by with a smaller image.

Here’s the Python code that was used to create the image.

from mpmath import cplot, ellipfun, ellipk
sn = ellipfun('sn')
m = 0.2
x = 4*ellipk(m) # horizontal period
y = 2*ellipk(1-m) # vertical period
cplot(lambda z: sn(z, m) - 0.2, [0, 2*x], [0, 2*y], points = 100000)

I subtracted 0.2 from sn just to shift the color a little. Adding a positive number shifts the color toward red. Subtracting a positive number shifts the color toward blue. You could also multiply by some constant to increase or decrease the brightness.

You could also play around with other elliptic functions, described in the mpmath documentation here. And you can find more on cplot here. For example, you could supply your own function for how phase is mapped to color. The saturated colors used by default are good for mathematical visualization, but more subtle colors could be better for aesthetics.

If you make some interesting images, leave a comment with a link to your image and a description of how you made it.

Read more: Applied complex analysis

Product of polygon diagonals

Suppose you have a regular pentagon inscribed in a unit circle, and connect one vertex to each of the other four vertices. Then the product of the lengths of these four lines is 5.

More generally, suppose you have a regular n-gon inscribed in a unit circle. Connect one vertex to each of the others and multiply the lengths of each of these line segments. Then the product is n.

I ran across this theorem recently thumbing through Mathematical Diamonds and had a flashback. This was a homework problem in a complex variables class I took in college. The fact that it was a complex variables class gives a big hint at the solution: Put one of the vertices at 1 and then the rest are nth roots of 1. Connect all the roots to 1 and use algebra to show that the product of the lengths is n. This will be much easier than a geometric proof.

Let ω = exp(2πi/n). Then the roots of 1 are powers of ω. The products of the diagonals equals

|1 – ω| |1 – ω^2| |1 – ω^3| … |1 – ω^(n-1)|

You can change the absolute value signs to parentheses because the terms come in conjugate pairs. That is,

|1 – ω^k| |1 – ω^(n-k)| = (1 – ω^k) (1 – ω^(n-k)).

So the task is to prove

(1 – ω)(1 – ω^2)(1 – ω^3) … (1 – ω^(n-1)) = n.

Since

(z^n – 1) = (z – 1)(z^(n-1) + z^(n-2) + … + 1)

it follows that

(z^(n-1) + z^(n-2) + … + 1) = (z – ω)(z – ω^2) … (z – ω^(n-1)).

The result follows from evaluating the expression above at z = 1.
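The theorem is also easy to check numerically before (or after) proving it. Here’s a quick Python verification:

```python
import cmath

def diagonal_product(n):
    """Product of distances from the vertex at 1 to the other n-1
    vertices of a regular n-gon inscribed in the unit circle."""
    omega = cmath.exp(2j * cmath.pi / n)
    prod = 1.0
    for k in range(1, n):
        prod *= abs(1 - omega**k)
    return prod

for n in [5, 7, 12]:
    print(n, diagonal_product(n))  # each product comes out to n
```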

Related: Applied complex analysis

Anti-calculus proposition of Erdős

The “anti-calculus proposition” is a little result by Paul Erdős that contrasts functions of a real variable and functions of a complex variable.

A standard calculus result says the derivative of a function is zero where the function takes on its maximum. The anti-calculus proposition says that for analytic functions, the derivative cannot be zero at the maximum.

To be more precise, a differentiable real-valued function on a closed interval takes on its maximum where the derivative is zero or at one of the ends. It’s possible that the maximum occurs at one of the ends of the interval and the derivative is zero there.

The anti-calculus proposition says that the analogous situation cannot occur for functions of a complex variable. Suppose a function f is analytic on a closed disk and suppose that f is not constant. Then |f| must take on its maximum somewhere on the boundary by the maximum modulus theorem. Erdős’ anti-calculus proposition adds that at the point on the boundary where |f| takes on its maximum, the derivative cannot be zero.


Doubly periodic functions

Functions like sine and cosine are periodic. For example, sin(x + 2πn) = sin(x) for all x and any integer n, and so the period of sine is 2π. But what happens if you look at sine or cosine as functions of a complex variable? They’re still periodic if you shift left or right, but not if you shift up or down. If you move up or down, i.e. in a pure imaginary direction, sines and cosines become unbounded.

Doubly periodic functions are periodic in two directions. Formally, a function f(z) of complex variable is doubly periodic if there exist two constants ω1 and ω2 such that

f(z) = f(z + ω1) = f(z + ω2)

for all z. The ratio ω1 / ω2 cannot be real; otherwise the two periods point in the same direction. For the rest of this post, I’ll assume ω1 = 1 and ω2 = i. Such functions are periodic in the horizontal (real-axis) and vertical (imaginary-axis) directions. They repeat everywhere their behavior on the unit square.

What do doubly periodic functions look like? It depends on what restrictions we place on the functions. When we’re working with complex functions, we’re typically interested in functions that are analytic, i.e. differentiable as complex functions.

Only constant functions can be doubly periodic and analytic everywhere. Why? Our functions take on over and over the values they take on over the unit square. If a function is analytic over the (closed) unit square then it’s bounded over that square, and since it’s doubly periodic, it’s bounded everywhere. By Liouville’s theorem, the only bounded analytic functions are constants.

This says that to find interesting doubly periodic functions, we’ll have to relax our requirements. Instead of requiring functions to be analytic everywhere, we will require them to be analytic except at isolated singularities. That is, the functions are allowed to blow up at a finite number of points. There’s a rich set of such functions, known as elliptic functions.

There are two well-known families of elliptic functions. One is the Weierstrass ℘ function (TeX symbol wp, Unicode U+2118) and its derivatives. The other is the Jacobi functions sn, cn, and dn. These functions have names resembling familiar trig functions because the Jacobi functions are in some ways analogous to trig functions.

It turns out that all elliptic functions can be written as combinations either of the Weierstrass function and its derivative or combinations of Jacobi functions. Roughly speaking, Weierstrass functions are easier to work with theoretically and Jacobi functions are easier to work with numerically.
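As a numerical sanity check of double periodicity, here’s a short mpmath session verifying that sn repeats with periods 4K and 2iK′; the parameter m = 0.2 and the test point are arbitrary choices of mine.

```python
from mpmath import ellipfun, ellipk, mpc

sn = ellipfun('sn')
m = 0.2
K = ellipk(m)        # quarter period
Kp = ellipk(1 - m)   # complementary quarter period

z = mpc(0.3, 0.4)

# sn has periods 4K (horizontal) and 2iK' (vertical).
print(abs(sn(z + 4*K, m) - sn(z, m)))     # effectively zero
print(abs(sn(z + 2j*Kp, m) - sn(z, m)))   # effectively zero
```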

Related: Applied complex analysis

A surprising theorem in complex variables

Here’s a strange theorem I ran across last week. I’ll state the theorem then give some footnotes.

Suppose f(z) and g(z) are two functions meromorphic in the plane. Suppose also that there are five distinct numbers a1, …, a5 such that the solution sets {z : f(z) = ai} and {z : g(z) = ai} are equal. Then either f(z) and g(z) are equal everywhere or they are both constant.

Notes

A complex function of a complex variable is meromorphic if it is differentiable except at isolated singularities. The theorem above applies to functions that are (complex) differentiable in the entire plane except at isolated poles.

The theorem is due to Rolf Nevanlinna. There’s a whole branch of complex analysis based on Nevanlinna’s work, but I’d not heard of it until last week. I have no idea why the theorem is true. It doesn’t seem that it should be true; the hypothesis seems far too weak for such a strong conclusion. But that’s par for the course in complex variables.

Update: I edited this post in response to the first comment below to make the theorem statement clearer.

The gamma function

The gamma function Γ(x) is the most important function not on a calculator. It comes up constantly in math. In some areas, such as probability and statistics, you will see the gamma function more often than other functions that are on a typical calculator, such as trig functions.

The gamma function extends the factorial function to real numbers. Since factorial is only defined on non-negative integers, there are many ways you could define factorial that would agree on the integers and disagree elsewhere. But everyone agrees that the gamma function is “the” way to extend factorial. Actually, the gamma function Γ(x) does not extend factorial, but Γ(x+1) does. Shifting the definition over by one makes some equations simpler, and that’s the definition that has caught on.
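The shift by one is easy to see with Python’s standard library, which implements the gamma function:

```python
import math

# Gamma(n+1) agrees with n! on the non-negative integers.
for n in range(10):
    assert abs(math.gamma(n + 1) - math.factorial(n)) <= 1e-9 * math.factorial(n)

# The shift matters: Gamma(5) is 4! = 24, not 5!.
print(round(math.gamma(5)))  # 24
```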

In a sense, Γ(x+1) is the unique way to generalize factorial. Harald Bohr and Johannes Mollerup proved that it is the only log-convex function that agrees with factorial on the non-negative integers. That’s somewhat satisfying, except why should we look for log-convex functions? Log-convexity is a very useful property to have, and a natural one for a function generalizing the factorial.

Here’s a plot of the logarithms of the first few factorial values.

The points nearly fall on a straight line, but if you look closely, the points bend upward slightly. If you have trouble seeing this, imagine a line connecting the first and last dot. This line would lie above the dots in between. This suggests that a function whose graph passes through all the dots should be convex.

Here’s a graph showing what the gamma function looks like over the real line.

The gamma function is finite except at zero and the negative integers, where it has poles; near each pole it blows up to +∞ on one side and -∞ on the other. The gamma function can also be uniquely extended to an analytic function on the complex plane. The only singularities are the poles on the real axis.
