Two useful asymptotic series

This post will present a couple of asymptotic series, explain how to use them, and point to some applications.

The most common special function in applications, at least in my experience, is the gamma function Γ(z). It’s often easier to work with the logarithm of the gamma function than to work with the gamma function directly, so one of the asymptotic series we’ll look at is the series for log Γ(z).

Another very common special function in applications is the error function erf(z). The error function and its relationship to the normal probability distribution are explained here. Even though the gamma function is more common, we’ll start with the asymptotic series for the error function because it is a little simpler.

Error function

We actually present the series for the complementary error function erfc(z) = 1 – erf(z). (Why have a separate name for erfc when it’s so closely related to erf? Sometimes erfc is easier to work with mathematically. Also, sometimes numerical difficulties require separate software for evaluating erf and erfc as explained here.)

\sqrt{\pi}\, z\, e^{z^2} \,\mbox{erfc}(z) \sim 1 + \sum_{m=1}^\infty (-1)^m \frac{(2m-1)!!}{(2z^2)^m}

If you’re unfamiliar with the n!! notation, see this explanation of double factorial.

Note that the series has a squiggle ~ instead of an equal sign. That is because the partial sums of the right side do not converge to the left side. In fact, the partial sums diverge for any z. Instead, if you take any fixed partial sum you obtain an approximation of the left side that improves as z increases.

The series above is valid for any complex value of z as long as |arg(z)| < 3π/4. However, the error term is easier to describe if z is real. In that case, when you truncate the infinite sum at some point, the error is smaller in absolute value than the first term that was left out. In fact, the error also has the same sign as the first term left out. So, for example, if you drop the sum entirely and just keep the “1” term on the right side, the error is negative and the absolute value of the error is less than 1/(2z²).

One way this series is used in practice is to bound the tails of the normal distribution function. A slightly more involved application can be found here.
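
Here is a minimal Python sketch of how the series gets used: truncate the sum after a few terms and compare against a library erfc. The function name asymptotic_erfc and the choice of three terms are just for illustration.

from math import pi, sqrt, exp, erfc

def asymptotic_erfc(z, terms=3):
    """Approximate erfc(z) for large real z by truncating the asymptotic series."""
    total = 1.0
    sign = -1.0
    double_factorial = 1.0   # builds up (2m-1)!! incrementally
    for m in range(1, terms + 1):
        double_factorial *= 2*m - 1
        total += sign * double_factorial / (2*z*z)**m
        sign = -sign
    # Recover erfc(z) from sqrt(pi) z exp(z^2) erfc(z) ~ total
    return total * exp(-z*z) / (sqrt(pi) * z)

for z in [2.0, 3.0, 5.0]:
    print(z, asymptotic_erfc(z), erfc(z))   # the two columns agree better as z grows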

Log gamma

The next series is the asymptotic series for log Γ(z).

\log \Gamma(z) \sim \left(z - \frac{1}{2}\right) \log z - z + \frac{1}{2} \log(2\pi) + \frac{1}{12z} - \frac{1}{360z^3} + \cdots

If you throw out all the terms involving powers of 1/z you get Stirling’s approximation.

As before, the partial sums on the right side diverge for any z, but if you truncate the series on the right, you get an approximation for the left side that improves as z increases. And as before, the series is valid for complex z but the error is simpler when z is real. In this case, complex z must satisfy |arg(z)| < π. If z is real and positive, the approximation error is bounded by the first term left out and has the same sign as the first term left out.

The coefficients 1/12, 1/360, etc. require some explanation. The general series is

\log \Gamma(z) \sim \left(z - \frac{1}{2}\right) \log z - z + \frac{1}{2}\log 2\pi + \sum_{m=1}^\infty \frac{B_{2m}}{2m(2m-1) z^{2m-1}}

where the numbers B_{2m} are Bernoulli numbers. This post showed how to use this asymptotic series to compute log n!.
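
As a concrete illustration, here is a small Python sketch that keeps the two correction terms written out above, 1/(12z) and 1/(360z³), and compares the result with the standard library’s lgamma. The function name log_gamma_asymptotic is just for illustration.

from math import log, pi, lgamma

def log_gamma_asymptotic(z):
    """Approximate log Gamma(z) using Stirling's series with two correction terms."""
    return (z - 0.5)*log(z) - z + 0.5*log(2*pi) + 1/(12*z) - 1/(360*z**3)

for z in [5.0, 10.0, 20.0]:
    print(z, log_gamma_asymptotic(z), lgamma(z))   # agreement improves as z grows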

References

The asymptotic series for the error function is equation 7.1.23 in Abramowitz and Stegun. The bounds on the remainder term are described in section 7.1.24. The series for log gamma is equation 6.1.41 and the error term is described in 6.1.42.

Bug in SciPy’s erf function

Last night I produced the plot below and was very surprised at the jagged spike. I knew the curve should be smooth and strictly increasing.

My first thought was that there must be a numerical accuracy problem in my code, but it turns out there’s a bug in SciPy version 0.8.0b1. I started to report it, but I saw there were similar bug reports and one such report was marked as closed, so presumably the fix will appear in the next release.

The problem is that SciPy’s erf function is inaccurate for arguments with imaginary part near 5.8. For example, Mathematica computes erf(1.0 + 5.7i) as -4.5717×10¹² + 1.04767×10¹² i. SciPy computes the same value as -4.4370×10¹² + 1.3652×10¹² i. The imaginary component is off by about 30%.

Here is the code that produced the plot.

from scipy.special import erf
from numpy import linspace, exp, sqrt
import matplotlib.pyplot as plt

def g(y):
    # evaluate exp(z^2) * erfc(z) along the ray z = (1 + iy)/sqrt(2)
    z = (1 + 1j*y) / sqrt(2)
    temp = exp(z*z)*(1 - erf(z))
    u, v = temp.real, temp.imag
    return -v / u

x = linspace(0, 10, 101)
plt.plot(x, g(x))
plt.show()

Three surprises with bc

bc is a quirky but useful calculator. It is a standard Unix utility and is also available for Windows.

One nice feature of bc is that you can set the parameter scale to indicate the desired precision. For example, if you set scale=100, all calculations will be carried out to 100 decimal places.

The first surprise is that the default value of scale is 0. So unless you change the default option, 1/2 will return 0. This is not because it is doing integer division: 1.0/2.0 also returns 0. bc is computing 1/2 as 0.5 and displaying the default number of decimal places, i.e. none! Note also that bc doesn’t round results; it truncates.

bc has one option: -l. This option loads the math library and sets the default value of scale to 20. I always fire up bc -l rather than just bc.
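
For example, here is what a short session looks like, first with plain bc and then with bc -l. (Startup banners are omitted, and exact output may vary slightly between bc implementations.)

$ bc
1/2
0
scale=20
1/2
.50000000000000000000
quit
$ bc -l
1/2
.50000000000000000000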

The second surprise with bc is that its math library only has five elementary functions. However, you can do a lot with these five functions if you know a few identities.

The sine and cosine of x are computed by s(x) and c(x) respectively. Angles are measured in radians. There is no tangent function in bc. If you want the tangent of x, compute s(x)/c(x). (See here for an explanation of how to compute other trigonometric functions.) As minimal as bc is, it did make a minor concession to convenience: it could have been more minimal by insisting you use s(π/2 – x) to compute a cosine.

The only inverse trigonometric function is a(x) for arctangent. This function can be bootstrapped to compute other inverse functions via these identities:

arcsin(x) = arctan(x / sqrt(1 – x²))
arccos(x) = arctan(sqrt(1 – x²) / x)
arccot(x) = π/2 – arctan(x)
arcsec(x) = arctan(sqrt(x² – 1))
arccsc(x) = arctan(1/sqrt(x² – 1))

The functions l(x) and e(x) compute (natural) logarithm and exponential respectively. bc has a power operator ^ but it can only be used for integer powers. So you could compute the fourth power of x with x^4 but you cannot compute the fourth root of x with x^0.25. To compute x^y for a floating point value y, use e(l(x)*y). Also, you can use the identity log_b(x) = log(x) / log(b) to find logarithms to other bases. For example, you could compute the log base 2 of x using l(x)/l(2).
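
Here is a sketch of these tricks in a bc -l session; the comments note what each line computes.

$ bc -l
/* tangent of 1 radian, via s(x)/c(x) */
s(1)/c(1)
/* arcsine of 0.5 via the arctangent identity; should be about pi/6 */
a(0.5/sqrt(1 - 0.5^2))
/* fourth root of 2, via e(l(x)*y) */
e(l(2)*0.25)
/* log base 2 of 10 */
l(10)/l(2)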

Not only is bc surprising for the functions it does not contain, such as a tangent function, it is also surprising for what it does contain. The third surprise is that in addition to its five elementary functions, the bc math library has a function j(n,x) to compute the nth Bessel function of x, where n is an integer. (You can pass in a floating point value of n but bc will lop off the fractional part.)

I don’t know the history of bc, but it seems someone must have needed Bessel functions and convinced the author to add them. Without j, the library consists entirely of elementary functions of one argument and the names of the functions spell out “scale.” The function j breaks this pattern.

If I could include one advanced function in a calculator, it would be the gamma function, not Bessel functions. (Actually, the logarithm of the gamma function is more useful than the gamma function itself, as I explain here.) Bessel functions are important in applications but I would expect more demand for the gamma function.

Update: Right after I posted this, I got an email saying bc -l was following me on Twitter. Since when do Unix commands have Twitter accounts? Well,  Hilary Mason has created a Twitter account bc_l that will run bc -l on anything you send it via DM.

Update (November 14, 2011): The account bc_l is currently not responding to requests. Hilary says that the account stopped working when Twitter changed the OAuth protocol. She says she intends to repair it soon.

Related posts:

How many trig functions are there?
Diagram of Bessel function relationships

Math library functions that seem unnecessary

This post will give several examples of functions included in the standard C math library that seem unnecessary at first glance.

Function log1p(x) = log(1 + x)

The function log1p computes log(1 + x).  How hard could this be to implement?

log(1 + x);

Done.

But wait. What if x is very small? If x is 10⁻¹⁶, for example, then on a typical system 1 + x = 1 because a machine number does not have enough significant bits to distinguish 1 + x from 1. (For details, see Anatomy of a floating point number.) That means that the code log(1 + x) would first compute 1 + x, obtain 1, then compute log(1), and return 0. But log(1 + 10⁻¹⁶) ≈ 10⁻¹⁶. This means the absolute error is about 10⁻¹⁶ and the relative error is 100%. For values of x larger than 10⁻¹⁶ but still fairly small, the code log(1 + x) may not be completely inaccurate, but the relative error may still be unacceptably large.

Fortunately, this is an easy problem to fix. For small x, log(1 + x) is approximately x. So for very small arguments, just return x. For larger arguments, compute log(1 + x) directly. See details here.
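
You can see both the problem and the fix from the Python prompt, since Python’s math module includes log1p:

from math import log, log1p

x = 1e-16
print(log(1 + x))   # prints 0.0 because 1 + x rounds to 1
print(log1p(x))     # prints approximately 1e-16, the correct answer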

Why does this matter? The absolute error is small, even if the code returns a zero for a non-zero answer. Isn’t that OK? Well, it might be. It depends on what you do next. If you add the result to a large number, then the relative error in the answer doesn’t matter. But if you multiply the result by a large number, your large relative error becomes a large absolute error as well.

Function expm1(x) = exp(x) – 1

Another function that may seem unnecessary is expm1. This function computes ex – 1. Why not just write this?

exp(x) - 1.0;

That’s fine for moderately large x. For very small values of x, exp(x) is close to 1, maybe so close to 1 that it actually equals 1 to machine precision. In that case, the code exp(x) - 1 will return 0 and result in 100% relative error. As before, for slightly larger values of x the naive code will not be entirely inaccurate, but it may be less accurate than needed. The solution is to compute exp(x) – 1 directly without first computing exp(x). The Taylor series for exp(x) is 1 + x + x²/2 + ⋯. So for very small x, we could just return x for exp(x) – 1. Or for slightly larger x, we could return x + x²/2. (See details here.)
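
Here is the analogous comparison in Python, which provides expm1 in its math module:

from math import exp, expm1

x = 1e-16
print(exp(x) - 1.0)   # prints 0.0 because exp(x) rounds to 1
print(expm1(x))       # prints approximately 1e-16, the correct answer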

Functions erf(x) and erfc(x)

The C math library contains a pair of functions erf and erfc. The “c” stands for “complement” because erfc(x) = 1 – erf(x). The function erf(x) is known as the error function and is not trivial to implement. But why have a separate routine for erfc? Isn’t it trivial to implement once you have code for erf? For some values of x, yes, this is true. But if x is large, erf(x) is near 1, and the code 1 - erf(x) may return 0 when the result should be small but positive. As in the examples above, the naive implementation results in a complete loss of precision for some values and a partial loss of precision for other values.
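
A quick Python illustration using the math module’s erf and erfc: for x = 10 the naive expression returns exactly zero, while erfc returns the small positive answer.

from math import erf, erfc

x = 10.0
print(1 - erf(x))   # prints 0.0 since erf(10) is 1 to machine precision
print(erfc(x))      # prints roughly 2.09e-45, small but not zero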

Functions Γ(x) and log Γ(x)

The standard C math library has two functions related to the gamma function: tgamma that returns Γ(x) and lgamma that returns log Γ(x). Why have both? Why can’t the latter just use the log of the former? The gamma function grows extremely quickly. For moderately large arguments, its value exceeds the capacity of a computer number. Sometimes you need these astronomically large numbers as intermediate results. Maybe you need a moderate-sized number that is the ratio of two very large numbers. (See an example here.) In such cases, you need to subtract lgamma values rather than take the ratio of tgamma values. (Why is the function called “tgamma” and not just “gamma”? See the last paragraph and first comment on this post.)
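
Here is a small Python sketch of the situation described above. The ratio Γ(500.5)/Γ(499.5) is simply 499.5, but the numerator and denominator each overflow a double; subtracting lgamma values and exponentiating gives the answer easily.

from math import lgamma, exp

# math.gamma(500.5) would raise OverflowError, but the log values are modest
ratio = exp(lgamma(500.5) - lgamma(499.5))
print(ratio)   # prints approximately 499.5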

Conclusion

The standard C math library distills a lot of experience. Some of the functions may seem unnecessary, and so they are for some arguments. But for other arguments these functions are indispensable.

Stand-alone error function erf(x)

The question came up on StackOverflow this morning how to compute the error function erf(x) in Python. The standard answer for how to compute anything numerical in Python is “Look in SciPy.” However, this person didn’t want to take on the dependence on SciPy. I’ve seen variations on this question come up in several different contexts lately, including questions about computing the normal distribution function, so I thought I’d write up a solution.

Here’s a Python implementation of erf(x) based on formula 7.1.26 from A&S. The maximum error is below 1.5 × 10⁻⁷.

import math

def erf(x):
    # constants
    a1 =  0.254829592
    a2 = -0.284496736
    a3 =  1.421413741
    a4 = -1.453152027
    a5 =  1.061405429
    p  =  0.3275911

    # Save the sign of x
    sign = 1
    if x < 0:
        sign = -1
    x = abs(x)

    # A & S 7.1.26
    t = 1.0/(1.0 + p*x)
    y = 1.0 - (((((a5*t + a4)*t) + a3)*t + a2)*t + a1)*t*math.exp(-x*x)

    return sign*y
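
As a quick sanity check, here are a couple of values, compared against math.erf (available in recent versions of Python) purely for reference; the point of the code above is that it has no dependencies.

import math

print(erf(1.0))                        # approximately 0.84270, true value 0.8427007929...
print(erf(-1.0))                       # approximately -0.84270
print(abs(erf(1.0) - math.erf(1.0)))   # difference is below 1.5e-7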

This problem is typical in two ways: A&S has a solution, and you’ve got to know a little background before you can use it.

The formula given in A&S is only good for x ≥ 0. That’s no problem if you know that the error function is an odd function, i.e. erf(-x) = -erf(x). But if you’re an engineer who has never heard of the error function but needs to use it, it may take a while to figure out how to handle negative inputs.

One other thing that someone just picking up A&S might not know is the best way to evaluate polynomials. The formula appears as 1 – (a₁t + a₂t² + a₃t³ + a₄t⁴ + a₅t⁵)exp(-x²), which is absolutely correct. But directly evaluating an nth order polynomial takes O(n²) operations, while the factorization used in the code above uses O(n) operations. This technique is known as Horner’s method.

The gamma function

The gamma function Γ(x) is the most important function not on a calculator. It comes up constantly in math. In some areas, such as probability and statistics, you will see the gamma function more often than other functions that are on a typical calculator, such as trig functions.

The gamma function extends the factorial function to real numbers. Since factorial is only defined on non-negative integers, there are many ways you could define factorial that would agree on the integers and disagree elsewhere. But everyone agrees that the gamma function is “the” way to extend factorial. Actually, the gamma function Γ(x) does not extend factorial, but Γ(x+1) does. Shifting the definition over by one makes some equations simpler, and that’s the definition that has caught on.
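
The shift by one is easy to check numerically in Python:

from math import gamma, factorial

for n in range(1, 6):
    print(n, factorial(n), gamma(n + 1))   # gamma(n + 1) equals n!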

In a sense, Γ(x+1) is the unique way to generalize factorial. Harald Bohr and Johannes Mollerup proved that it is the only log-convex function that agrees with factorial on the non-negative integers. That’s somewhat satisfying, except why should we look for log-convex functions? Log-convexity is a very useful property to have, and a natural one for a function generalizing the factorial.

Here’s a plot of the logarithms of the first few factorial values.

The points nearly fall on a straight line, but if you look closely, the points bend upward slightly. If you have trouble seeing this, imagine a line connecting the first and last dot. This line would lie above the dots in between. This suggests that a function whose graph passes through all the dots should be convex.

Here’s a graph showing what the gamma function looks like over the real line.

The gamma function is finite except at the non-positive integers, where it has poles. Approaching from the right, it goes to +∞ at zero and the negative even integers and to -∞ at the negative odd integers. The gamma function can also be uniquely extended to an analytic function on the complex plane. The only singularities are the poles on the real axis.

Related post: generalizing binomial coefficients

Error function and the normal distribution

The error function erf(x) and the normal distribution Φ(x) are essentially the same function. The former is more common in math, the latter in statistics. I often have to convert between the two.

It’s a simple exercise to move between erf(x) and Φ(x), but it’s tedious and error-prone, especially when you throw in variations on these two functions such as their complements and inverses. Some time ago I got sufficiently frustrated to write up the various relationships in a LaTeX file for future reference. I was using this file yesterday and thought I should post it as a PDF file in case it could save someone else time and errors.
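
The basic conversions are Φ(x) = (1 + erf(x/√2))/2 and erf(x) = 2Φ(x√2) − 1. Here they are as a small Python sketch; norm_cdf is just an illustrative name.

from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF Phi(x) in terms of the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def erf_from_cdf(x):
    """Error function erf(x) in terms of the standard normal CDF."""
    return 2 * norm_cdf(x * sqrt(2)) - 1

print(norm_cdf(1.96))      # approximately 0.975
print(erf_from_cdf(1.0))   # approximately 0.8427, matching erf(1.0)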

Orthogonal polynomials

This morning I posted some notes on orthogonal polynomials and Gaussian quadrature.

“Orthogonal” just means perpendicular. So how can two polynomials be perpendicular to each other? In geometry, two vectors are perpendicular if and only if the dot product of their coordinates is zero. In more general settings, two things are said to be orthogonal if their inner product (a generalization of the dot product) is zero. So what was a theorem in basic geometry is taken as a definition in other settings. Typically mathematicians say “orthogonal” rather than “perpendicular.” The basic idea of lines meeting at right angles acts as a reliable guide to intuition in more general settings.

Two polynomials are orthogonal if their inner product is zero. You can define an inner product for two functions by integrating their product, sometimes with a weighting function.

Orthogonal polynomials have remarkable properties that are easy to prove. Last week I posted some notes on Chebyshev polynomials. The notes posted today include Chebyshev polynomials as a special case and focus on the application of orthogonal polynomials to quadrature. (“Quadrature” is just an old-fashioned word for integration, usually applied to numerical integration in one dimension.) It turns out that every class of orthogonal polynomials corresponds to an integration rule.
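
As a concrete check, here is a small Python sketch verifying that two Chebyshev polynomials are orthogonal with respect to the weight 1/√(1 − x²). The substitution x = cos θ turns the weighted integral into a plain integral of cos(mθ)cos(nθ) over [0, π], which sidesteps the endpoint singularity.

from math import cos, pi
from scipy.integrate import quad

def chebyshev_inner_product(m, n):
    """Inner product of T_m and T_n with weight 1/sqrt(1 - x^2), computed via x = cos(theta)."""
    value, _ = quad(lambda theta: cos(m*theta) * cos(n*theta), 0, pi)
    return value

print(chebyshev_inner_product(2, 3))   # essentially zero: distinct polynomials are orthogonal
print(chebyshev_inner_product(3, 3))   # pi/2, the squared norm of T_3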

Chebyshev polynomials

I posted a four-page set of notes on Chebyshev polynomials on my web site this morning. These polynomials have many elegant properties that are simple to prove. They’re also useful in applications.

Mr. Chebyshev may have the honor of the most variant spellings for a mathematician’s name. I believe “Chebyshev” is now standard, but his name has been transliterated from the Russian as Chebychev, Chebyshov, Tchebycheff, Tschebyscheff, etc. His polynomials are denoted Tₙ(x) based on his initial in one of the older transliterations.