It’s not just Taylor series

There is still active discussion on X about the approximation

exp(−x²) ≈ (1 + cos(sin(x) + x))/2

and some are saying this can just be explained by Taylor series: the series for the two sides differ for the first time at the x6 term, so that’s why you get a good approximation. As I wrote yesterday, that’s only part of it.

If it were just about Taylor series you could use

exp(−x²) ≈ 1 − x² + x4/2

which also has error O(x6). But this approximation is only good for fairly small x, say in [−0.5, 0.5], whereas the approximation at the top of the post is good over [−4, 4]. When x = 4, the error in the cosine approximation is 0.002579 but the error in the Taylor approximation is 113, five orders of magnitude larger.

If the accuracy of the cosine approximation were due to Taylor series, then we’d expect the function

exp(−x²) − (1 + cos(sin(x) + x))/2

to be small not just over the interval [−4, 4] but over a disk of radius 4 in the complex plane. But it’s not. When x = 4i the function is on the order of 1013.

Both the cosine approximation and the Taylor approximation are good for small disks, say of radius 0.5. They’re both bad for much larger disks, and in fact the cosine approximation is worse.

 

Real and imaginary parts

The previous post announced some notes I wrote up based on an article by Henry Baker implementing functions of a complex variable in terms of functions of a real variable. That is, it finds functions u(x, y) and v(x, y) such that

f(x + iy) = u(x, y) + i v(x, y)

where xyu, and v are all real-valued. Not only that, but if f is an elementary function, so are u and v. Here “elementary” has a technical meaning, but essentially it means functions that you could evaluate on a scientific calculator. A couple of the functions might be unfamiliar, such as sgn and atan2, but there are no functions like the gamma function that are defined in terms of integrals.

One application of Baker’s equations would be to bootstrap a math library that doesn’t support complex numbers into one that does. But the equations could be useful in pure math when you’d like to have a convenient expression for the real or imaginary part of a function.

The real and imaginary parts of a complex analytic function are harmonic functions. So the functions on the right hand side of Baker’s equations satisfy Laplace’s equation:

uxx + uyy = 0

and

vxx + vyy = 0.

Furthermore, the functions u and v form harmonic conjugate pairs, meaning each is the Hilbert transform of the other.

Building complex functions out of real parts

A couple months ago I wrote about how to compute the sine and cosine of a complex number using only real functions of real variables using the equations

\begin{align*} \sin(x + iy) &= \sin x \cosh y + i \cos x \sinh y \\ \cos(x + iy) &= \cos x \cosh y - i \sin x \sinh y \\ \end{align*}

You can do something analogous for all the elementary functions, though some of the equations are quite a bit more complicated than the ones above. See the equations here.

The equations come from a paper by Henry G. Baker, cited in the linked page. I wrote up Baker’s equations in LaTeX, then used ChatGPT to generate Python code from the LaTeX to numerically verify the equations and my typesetting of them. This caught a few typos on my part.

The test code evaluated the equations at points from each quadrant. All matched NumPy, implying that Baker and NumPy use the same branch cuts on inverse functions.

***

This post is part of a thread that has gone on for a few days. Maybe it’s the last post in the thread; we’ll see.

It all started with a post on Markov’s equation

x² + y² + z² = 3 xyz

and an approximation to the equation that has a closed-form solution. That led to the identity

cosh( arccosh(a) + arccosh(b) ) = ab + √(a² − 1) √(b² − 1).

The approximation to Markov’s equation only needed the identity to be valid for real a and b greater than 1. But when I looked closer at the identity I found several complications with branch cuts. The identity doesn’t hold everywhere using the principle branch of the square root function. But if you define √(z² − 1) to have a branch cut along [−1, 1] then the equation holds everywhere in the complex plane. And that led to my writing up some notes on how to define all the elementary inverse functions in terms of log.

Someone reading these posts suggested I look at a paper that mentioned “couth” and “uncouth” function pairs, which led to this post and its warm up.

I find all this interesting because it’s an advanced perspective on a questions that are latent in an intro calculus class. What exactly do functions like arccos mean and why where they defined as they were? These are fairly deep and interesting questions that are swept under the rug, and swept there for good reason. A calculus class has to cover an enormous amount of material and there’s no time to dwell on fine points. Some of my favorite posts look back leisurely on things that go by in a blur when you’re a student.

Couth and uncouth function pairs

“You can’t always get what you want. But sometimes you get what you need.” — The Rolling Stones

Circular functions and hyperbolic functions aren’t invertible, but we invert them anyway. These functions map many points in the domain to each point in the range, and we invert them by mapping a point in the range back to some point in the domain. Often this works as expected, but sometimes it doesn’t.

In the previous post I said

You can relate each trig function “foo” with its hyperbolic counterpart “fooh” by applying one of the functions to iz then multiplying by a constant c that depends on foo: ci for sin and tan, c = 1 for cos and sec, and c = −i for csc and cot.

In symbols,

c foo(z) = fooh(iz).

Let’s suppose foo and fooh are invertible, ignoring any complications, and solve foo(z) = w for z. We get

i foo−1(w) = fooh−1(cw)

or using “arc” nomenclature for inverse functions

i arcfoo(w) = arcfooh(cw).

When the naive calculation above holds, except possibly at a finite number of points, we say the pair (foo, fooh) is couth. Otherwise we say the pair is uncouth. These term were coined by Robert Corless and his coauthors in their paper [1].

Whether the pair (foo, fooh) is couth depends not only on foo and fooh, but also on the details of how arcfoo and arcfooh are defined.

In Python’s NumPy library, the pairs (sin, sinh) and (tan, tanh) are couth, but the pair (cos, cosh) is uncouth.

Numpy doesn’t define the reciprocal functions sec, sech, csc, csch, cot, and coth. I used to find that annoying, but I’m beginning to think that was wise. These functions cause problems. For example, there may be two reasonable ways to define these functions, one of which forms a couth pair and one of which forms an uncouth pair.

For example, how should you define cot and coth? There would be no disagreement over the definition

cot = lambda x: 1/tan(x)

but there are at least two definitions of inverse coth that you’ll find in practice:

arccot = lambda z: 0.5*pi - arctan(z)
arccot = lambda z: arctan(1/z).

Both definitions have their advantages, but the former is uncouth while the latter is couth. You can verify that both definitions are the same at z = 1 but not at z = −1.

With the following definitions, the pairs (cos, cosh) and (sec, sech) are uncouth and the rest are couth.

from numpy import *

csc     = lambda x: 1/sin(x)
sec     = lambda x: 1/cos(x)
cot     = lambda x: 1/tan(x)
csch    = lambda x: 1/sinh(x)
sech    = lambda x: 1/cosh(x)
coth    = lambda x: 1/tanh(x)

arccot  = lambda z: arctan(1/z)
arcsec  = lambda z: arccos(1/z)
arccsc  = lambda z: arcsin(1/z)
arccoth = lambda z: arctanh(1/z)
arcsech = lambda z: arccosh(1/z)
arccsch = lambda z: arcsinh(1/z)

[1] “According to Abramowitz and Stegun” or arccoth needn’t be uncouth. Robert M. Corless et al. ACM SIGSAM Bulletin, Volume 34, Issue 2, pp 58 – 65 https://doi.org/10.1145/362001.362023

Square root of x² − 1

How should we define √(z² − 1)? Well, you could square z, subtract 1, and take the square root. What else would you do?!

The question turns out to be more subtle than it looks.

When x is a non-negative real number, √x is defined to be the non-negative real number whose square is x. When x is a complex number √x is defined to be a function that extends √x from the real line to the complex plane by analytic continuation. But we can’t extend √x as an analytic function to the entire complex plane ℂ. We have to choose to make a “cut” somewhere, and the conventional choice is to make a cut along the negative real axis.

Using the principal branch

The “principal branch” of the square root function is defined to be the unique function that analytically extends √x from the positive reals to ℂ \ (−∞, 0].

Assume for now that by √x we mean the principal branch of the square root function. Now what does √(z² − 1) mean? It could mean just what we said at the top of the post: we square z, subtract 1, and apply the (principal branch of the) square root function. If we do that, we must exclude those values of z such that (z² − 1) is negative. This means we have to cut out the imaginary axis and the interval [−1, 1].

This is what Mathematica will do when asked to evaluate Sqrt[z^2 - 1]. The command

ComplexPlot[Sqrt[z^2 - 1], {z, -2 - 2 I, 2 + 2 I}]

makes the branch cuts clear by abrupt changes in color.

A different approach

Now let’s take a different approach. Consider the function √(z² − 1) as a whole. Do not think of it procedurally as above, first squaring z etc. Instead, think of a it as a black box that takes in z and returns a complex number whose square is z² − 1.

This function has an obvious definition for z > 1. And we can extend this function, via analytic continuation, to more of the complex plane. We can do this directly, not by extending the square root function. But as before, we cannot extend the function analytically to all of ℂ. We have to cut something out. A common choice is to cut out [−1, 1]. This eliminates the need for a branch cut along the imaginary axis.

The function

f(z) = \exp\left( \tfrac{1}{2}\left( \log(z - 1) + \log(z + 1) \right) \right)

extends √(z² − 1) the way we want [1].

The Mathematica code

ComplexPlot[Exp[(1/2) (Log[z - 1 ] + Log[z + 1])], {z, -2 - 2 I, 2 + 2 I}]

shows that our function is now continuous across the imaginary axis, though there’s still a discontinuity as you cross [−1, 1].

We used this analytic extension of √(z² − 1) in the previous post to eliminate branch cuts in an identity.

Related posts

[1] The principal branch of the logarithm has a cut along the negative real axis. Why does our square root function, defined using log, not have a branch cut along the negative axis?

First of all, the log function, and Mathematica’s implementation of it Log[], isn’t undefined on (−∞, 1), it just isn’t continuous there. The function still has a value. By convention, the value is taken to be the limit of log(z) approaching z from above, i.e. from the 2nd quadrant.

Second, the value of (log(z – 1) + log(z + 1))/2 differs by a factor of 2πi when approaching a value z < −1 from above versus from below. This factor goes away when taking the exponential. So our function f(z) is analytic across (−∞, 1) even though the log functions in its definition are not.

Closer look at an identity

The previous post derived the identity

\cosh\Big( \operatorname{arccosh}(x) + \operatorname{arccosh}(y)\Big) = xy + \sqrt{x^2 -1} \sqrt{y^2 -1}

and said in a footnote that the identity holds at least for x > 1 and y > 1. That’s true, but let’s see why the footnote is necessary.

Let’s have Mathematica plot

\left| \cosh\Big( \operatorname{arccosh}(x) + \operatorname{arccosh}(y)\Big) - xy - \sqrt{x^2 -1} \sqrt{y^2 -1} \right|

The plot will be 0 where the identity above holds.

The plot is indeed flat for x > 1 and y > 1, and more, but not everywhere.

If we combine the two square roots

\left| \cosh\Big( \operatorname{arccosh}(x) + \operatorname{arccosh}(y)\Big) - xy - \sqrt{(x^2 -1) (y^2 -1)} \right|

and plot again we still get a valid identity for x > 1 and y > 1, but the plot changes.

This is because √ab does not necessarily equal √(ab) when the arguments may be negative.

The square root function and the arccosh function are not naturally single-valued functions. They require branch cuts to force them to be single-valued, and the two functions require different branch cuts. I go into this in some detail here.

There is a way to reformulate our identity so that it holds everywhere. If we replace

\sqrt{z^2 - 1}

with

f(z) = \exp\left( \tfrac{1}{2}\left( \log(z - 1) + \log(z + 1) \right) \right)

which is equivalent for z > 1, the corresponding identity holds everywhere.

We can verify this with the following Mathematica code.

f[z_] := Exp[(1/2) (Log[z - 1 ] + Log[z + 1])]
FullSimplify[Cosh[ArcCosh[x] + ArcCosh[y]] - x y - f[x] f[y]]

This returns 0.

By contrast, the code

FullSimplify[
 Cosh[ArcCosh[x] + ArcCosh[y]] - x y - Sqrt[x^2 - 1] Sqrt[y^2 - 1]]

simply returns its input with no simplification, unless we add restrictions on x and y. The code

FullSimplify[
 Cosh[ArcCosh[x] + ArcCosh[y]] - x y - Sqrt[x^2 - 1] Sqrt[y^2 - 1], 
 Assumptions -> {x > -1 && y > -1}]

does return 0.

Related posts

Computing sine and cosine of complex arguments with only real functions

Suppose you have a calculator or math library that only handles real arguments but you need to evaluate sin(3 + 4i). What do you do?

If you’re using Python, for example, and you don’t have NumPy installed, you can use the built-in math library, but it will not accept complex inputs.

>>> import math
>>> math.sin(3 + 4j)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: must be real number, not complex

You can use the following identities to calculate sine and cosine for complex arguments using only real functions.

\begin{align*} \sin(x + iy) &= \sin x \cosh y + i \cos x \sinh y \\ \cos(x + iy) &= \cos x \cosh y - i \sin x \sinh y \\ \end{align*}

The proof is very simple: just use the addition formulas for sine and cosine, and the following identities.

\begin{align*} \sin iz &= i \sinh z \\ \cos iz &= \cosh z \end{align*}

The following code implements sine and cosine for complex arguments using only the built-in Python functions that accept real arguments. It then tests these against the NumPy versions that accept complex arguments.

from math import *
import numpy as np

def complex_sin(z):
    x, y = z.real, z.imag
    return sin(x)*cosh(y) + 1j*cos(x)*sinh(y)

def complex_cos(z):
    x, y = z.real, z.imag
    return cos(x)*cosh(y) - 1j*sin(x)*sinh(y)

z = 3 + 4j
mysin = complex_sin(z)
mycos = complex_cos(z)
npsin = np.sin(z)
npcos = np.cos(z)
assert(abs(mysin - npsin) < 1e-14)
assert(abs(mycos - npcos) < 1e-14)

It is possible to implement all the elementary functions of a complex variable in terms of real-valued functions of a real variable, though some of the expressions are quite complicated. More here.

Related posts

A lesser-known characterization of the gamma function

The gamma function Γ(z) extends the factorial function from integers to complex numbers. (Technically, Γ(z + 1) extends factorial.) There are other ways to extend the factorial function, so what makes the gamma function the right choice?

The most common answer is the Bohr-Mollerup theorem. This theorem says that if f: (0, ∞) → (0, ∞) satisfies

  1. f(x + 1) = x f(x)
  2. f(1) = 1
  3. log f is convex

then f(x) = Γ(x). The theorem applies on the positive real axis, and there is a unique holomorphic continuation of this function to the complex plane.

But the Bohr-Mollerup theorem is not the only theorem characterizing the gamma function. Another theorem was by Helmut Wielandt. His theorem says that if f is holomorphic in the right half-plane and

  1. f(z + 1) = z f(z)
  2. f(1) = 1
  3. f(z) is bounded for {z: 1 ≤ Re z ≤ 2}

then f(x) = Γ(x). In short, Wielandt replaces the log-convexity for positive reals with the requirement that f is bounded in a strip of the complex plane.

You might wonder what is the bound alluded to in Wielandt’s theorem. You can show from the integral definition of Γ(z) that

|Γ(z)| ≤ |Γ(Re z)|

for z in the right half-plane. So the bound on the complex strip {z: 1 ≤ Re z ≤ 2} equals the bound on the real interval [1, 2], which is 1.

Inverse cosine

In the previous two posts, we looked at why Mathematica and SymPy did not simplify sinh(arccosh(x)) to √(x² − 1) as one might expect. After understanding why sinh(arccosh(x)) doesn’t simplify nicely, it’s natural to ask why sin(arccos(x)) does simplify nicely.

In this post I sketched a proof of several identities including

sin(arccos(x)) = √(1 − x²)

saying the identities could be proved geometrically. Let x be between 0 and 1. Construct right triangle with hypotenuse of length 1 and one side of length x. Call the acute angle formed by these two sides θ. Then cos θ = x, and so arccos(x) = θ, and sin θ = √(1 − x²). This proves the identity above, but only for 0 < x < 1.

If we make branch cuts along (−∞, −1] and [1, ∞) we can extend arccos(z) uniquely by analytic continuation. We can extend the definition to the branch cuts by continuity, but from one direction. We either have to choose the extension to be continuous from above the branch cuts or from below; we have to choose one or the other because the two limits are not equal. As far as I know, everyone chooses continuity from above, i.e. continuity with quadrant II, by convention.

In any case, we can define arccos(z) for any complex number z, and the result is a number whose cosine is z. Therefore the square of its cosine is z², and the square of its sine is 1 − z². So we have

sin²(arccos(z)) = 1 − z².

But does that mean

sin(arccos(z)) = √(1 − z²)?

Certainly it does if we’re working with real numbers, but does it for complex numbers?

Recall what square root means for complex numbers: it is the analytic function with branch cut (−∞, 0] that agrees with the real square root function along the real line, and is defined along the branch cut to be continuous with quadrant II. Since

sin(arccos(z)) = √(1 − z²)

for real −1 < z < 1, the two sides of the equation are equal on a set with a limit point, and so as analytical functions they are equal on their common domain.

The only remaining detail is whether the two functions are equal along the branch cut (−∞, −1] where we’ve extended the function by continuity. As

Since we’ve defined both the arccos and square root functions by continuous extension from the second quadrant, equality on the branch cut also follows by continuity.

llms

sinh( arccosh(x) )

I’ve written several posts about applying trig functions to inverse trig functions. I intended to write two posts, one about the three basic trig functions and one about their hyperbolic counterparts. But there’s more to explore here than I thought at first. For example, the mistakes that I made in the first post lead to a couple more posts discussing error detection and proofs.

I was curious about how Mathematica would handle these identities. Sometimes it doesn’t simplify expressions the way you expect, and for interesting reasons. It handled the circular functions as you might expect.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sin^{-1} & \cos^{-1} & \tan^{-1} \\ \hline \sin & x & \sqrt{1-x^{2}} & \dfrac{x}{\sqrt{1+x^2}} \\ \hline \cos & \sqrt{1-x^{2}} & x & \dfrac{1}{\sqrt{1 + x^2}} \\ \hline \tan & \dfrac{x}{\sqrt{1-x^{2}}} & \dfrac{\sqrt{1-x^{2}}}{x} & x \\ \end{array}

So, for example, if you enter Sin[ArcCos[x]] it returns √(1 − x²) as in the table above. Then I added an h on the end of all the function names to see whether it would reproduce the table of hyperbolic compositions.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sinh^{-1} & \cosh^{-1} & \tanh^{-1} \\ \hline \sinh & x & \sqrt{x^{2}-1} & \dfrac{x}{\sqrt{1-x^2}} \\ \hline \cosh & \sqrt{x^{2} + 1} & x & \dfrac{1}{\sqrt{1 - x^2}} \\ \hline \tanh & \dfrac{x}{\sqrt{x^{2}+1}} & \dfrac{\sqrt{x^{2}-1}}{x} & x \\ \end{array}

For the most part it did, but not entirely. The results were as expected except when applying sinh or cosh to arccosh. But Sinh[ArcCosh[x]] returns

\sqrt{\frac{x-1}{x+1}} (x+1)

and Tanh[ArcCosh[x]] returns

\frac{\sqrt{\frac{x-1}{x+1}} (x+1)}{x}

Why doesn’t Mathematica simplify as expected?

Why didn’t Sinh[ ArcCosh[x] ] just return √(x² − 1)? The expression it returned is equivalent to this: just square the (x + 1) term, bring it inside the radical, and simplify. That line of reasoning is correct for some values of x but not for others. For example, Sinh[ArcCosh[2]] returns −√3 but √(2² − 1) = √3. The expression Mathematica returns for Sinh[ArcCosh[x]] correctly evaluates to −√3.

Defining ArcCosh

To understand what’s going on, we have to look closer at what arccosh(x) means. You might say it is a function that returns the number whose hyperbolic cosine equals x. But cosh is an even function: cosh(−x) = cosh(x), so we can’t say the value. OK, so we define arccosh(x) to be the positive number whose hyperbolic cosine equals x. That works for real values of x that are at least 1. But what do we mean by, for example, arccosh(1/2)? There is no real number y such that cosh(y) = 1/2.

To rigorously define inverse hyperbolic cosine, we need to make a branch cut. We cannot define arccosh as an analytic function over the entire complex plane. But if we remove (−∞, 1], we can. We define arccosh(x) for real x > 1 to be the positive real number y such that cosh(y) = x, and define it for the rest of the complex plane (with our branch cut (−∞, 1] removed) by analytic continuation.

If we look up ArcCosh in Mathematica’s documentation, it says “ArcCosh[z] has a branch cut discontinuity in the complex z plane running from −∞ to +1.” But what about values of x that lie on the branch cut? For example, we looked at ArcCosh[-2] above. We can extend arccosh to the entire complex plane, but we cannot extend it as an analytic function.

So how do we define arccosh(x) for x in (−∞, 1]? We could define it to be the limit of arccosh(z) as z approaches x for values of z not on the branch cut. But we have to make a choice: do we approach x from above or from below? That is, we can define arccosh(x) for real x ≤ 1 by

\text{arccosh}(x) = \lim_{\varepsilon \to 0^+} \text{arccosh}(x + \varepsilon i)

or by

\text{arccosh}(x) = \lim_{\varepsilon \to 0^-} \text{arccosh}(x + \varepsilon i)

but we have to make a choice because the two limits are not the same. For example, ArcCosh[-2 + 0.001 I] returns 1.31696 + 3.14102 I but ArcCosh[-2 - 0.001 I] returns 1.31696 - 3.14102 I. By convention, we choose the limit from above.

Defining square root

Where did we go wrong when we assumed Mathematica’s expression for sinh(arccosh(x))

\sqrt{\frac{x-1}{x+1}} (x+1)

could be simplified to √(x² − 1)? We implicitly assumed √(x + 1)² = (x + 1). And that’s true, if x ≥ − 1, but not for smaller x. Just as we have be careful about how we define arccosh, we have to be careful about how we define square root.

The process of defining the square root function for all complex numbers is analogous to the process of defining arccosh. First, we define square root to be what we expect for positive real numbers. Then we make a branch cut, in this case (−∞, 0]. Then we define it by analytic continuation for all values not on the cut. Then finally, we define it along the cut by continuity, taking the limit from above.

Once we’ve defined arccosh and square root carefully, we can see that the expressions Mathematica returns for sinh(arccosh(x)) and tanh(arccosh(x)) are correct for all complex inputs, while the simpler expressions in the table above implicitly assume we’re working with values of x for which arccosh(x) is real.

Making assumptions explicit

If we are only concerned with values of x ≥ − 1 we can tell Mathematica this, and it will simplify expressions accordingly. If we ask it for

    Simplify[Sinh[ArcCosh[x]], Assumptions -> {x >= -1}]

it will return √(x² − 1).

Related posts