# Giant components and the Lambert W-function

The Lambert W-function is the function w(z) implicitly defined by w exp(w) = z. When I first saw this, I thought that I’d seen something like this come up many times and that it would be really handy to know that this function has a name and has many software implementations1. Since then, however, I’ve hardly ever run into the W-function in application. But here’s an application I ran into recently.

A “giant component” in a random network is a component whose size grows in proportion to the size of the network. For a Poisson random network, let c = p(n – 1) be the expected number of vertices connected to any given vertex. If c > 1 then as n goes to infinity, there will be a giant component with probability 1. S, the proportion of nodes inside the a giant component, satisfies the equation

S = 1 – exp(-cS).

S plays a role similar to that of w in the definition of the W-function, which suggests the W-function might come in handy. And in fact, this book gives this solution for S:

S = 1 + w( –c exp(-c) )/c.

There are a couple issues. First, it’s not obvious that solution is correct. Second, we need to be more careful about how we define w(z) when z is negative.

Given some value of c, let S = 1 + w( –c exp(-c) )/c. Then

w( –c exp(-c) ) = –c(1 – S).

Apply the function f(x) = x exp( x ) to both sides of the equation above. By the definition of the W-function, we have

c exp(-c) = –c(1 – S) exp( –c(1 – S) ).

From there it easily follows that

S = 1 – exp(-cS).

Now let’s look more carefully at the definition of the W-function. For positive real z, w exp(w) = z has a unique real solution defining w. But in the calculations above, we have evaluated w at –c exp(-c), which is negative since c > 1. For negative arguments, we have to pick a branch of w. Which branch guarantees that S = 1 – exp(-cS)? Any branch2. No matter which solution to w exp(w) = z we pick, the resulting S satisfies the giant component equation. However, the wrong branch might result in a meaningless solution. Since S is a proportion, S should be between 0 and 1.

Let’s go back and say that for our purposes, we will take w(z) to mean the real number no less than -1 satisfying w exp(w) = z. Then for c > 1, the solution S = 1 + w( –c exp(-c) )/c is between 0 and 1. You can show that S = 1 – exp(-cS) has a unique solution for 0 < S < 1, so any other branch would not give a solution in this interval.

1 For example, in Python the Lambert W-function is `scipy.special.lambertw`. The function takes two arguments: z and an integer denoting the branch. By default, the branch argument is set to 0, giving the branch discussed in this post.

2 For -1/e < z < 0, there are two real branches of w(z), but there are also infinitely many complex branches.

# Addition formulas for Bessel functions

When trying to understand a complex formula, it helps to first ask what is being related before asking how they are related.

This post will look at addition theorems for Bessel functions. They related the values of Bessel functions at two points to the values at a third point.

Let a, b, and c be lengths of the sides of a triangle and let θ be the angle between the sides of length a and b. The triangle could be degenerate, i.e. θ could be π. The addition theorems here give the value of Bessel functions at c in terms of values of Bessel functions evaluated at b and c.

The first addition theorem says that

J0(c) = ∑ Jm(a) Jm(b) exp(imθ)

where the sum is over all integer values of m.

This says that the value of the Bessel function J0 at c is determined by a sort of inner product of values of Bessel functions of all integer orders at a and b. It’s not exactly an inner product because it is not positive definite. (Sometimes you’ll see the formula above written with a factor λ multiplying a, b, and c. This is because you can scale every side of a triangle by the same amount and not change the angles.)

Define the vectors x, y, and z to be the values of all Bessel functions evaluated at a, b, and c respectively. That is, for integer k, xk = Jk(a) and similar for y and z. Also, define the vector w by wk = exp(ikθ). Then the first addition theorem says

z0 = ∑ xm ym wm.

This is a little unsatisfying because it relates the value of one particular Bessel function at c to the values of all Bessel functions at a and b. We’d like to relate all Bessel function values at c to the values at a and b. That is, we’d like to relate the whole vector z to the vectors x and y.

Define the vector v by vk = exp(inψ) where ψ is the angle between the sides of length b and c. Then the formula we’re looking for is

zn = v-nxn+m ym wm.

# Math tools for the next 20 years

Igor Carron commented on his blog that

… the mathematical tools that we will use in the next 20 years are for the most part probably in our hands already.

He compares this to progress in treating leukemia: survival rates increased dramatically over the last 40 years, not primarily by developing new drugs, but by better applying drugs that were available in 1970.

I find that plausible. I’ve gotten a lot of mileage out of math that was well known 100 years ago.

Related post: Doing good work with bad tools

# Castles and quantum mechanics

How are castles and quantum mechanics related? One connection is rook polynomials.

The rook is the chess piece that looks like a castle, and used to be called a castle. It can move vertically or horizontally, any number of spaces.

A rook polynomial is a polynomial whose coefficients give the number of ways rooks can be arranged on a chess board without attacking each other. The coefficient of xk in the polynomial Rm,n(x) is the number of ways you can arrange k rooks on an m by n chessboard such that no two rooks are in the same row or column.

The rook polynomials are related to the Laguerre polynomials by

Rm,n(x) = n! xn Lnmn(-1/x)

where Lnk(x) is an “associated Laguerre polynomial.” These polynomials satisfy Laguerre’s differential equation

x y” + (n+1-x) y‘ + k y = 0,

an equation that comes up in numerous contexts in physics. In quantum mechanics, these polynomials arise in the solution of the Schrödinger equation for the hydrogen atom.

# Six analysis and probability diagrams

Here are a few diagrams I’ve created that summarize relationships in analysis and probability. Click on a thumbnail image to go to a page with the full image and explanatory text.

Special functions

Gamma and related functions

Probability distributions

Conjugate priors

Convergence theorems

Bessel functions

# Doubly periodic functions

Functions like sine and cosine are periodic. For example, sin(x + 2πn) = sin(x) for all x and any integer n, and so the period of sine is 2π. But what happens if you look at sine or cosine as functions of a complex variable? They’re still periodic if you shift left or right, but not if you shift up or down. If you move up or down, i.e. in a pure imaginary direction, sines and cosines become unbounded.

Doubly periodic functions are periodic in two directions. Formally, a function f(z) of complex variable is doubly periodic if there exist two constants ω1 and ω2 such that

f(z) = f(z + ω1) = f(z + ω2)

for all z. The ratio ω1 / ω2 cannot be real; otherwise the two periods point in the same direction. For the rest of this post, I’ll assume ω1 = 1 and ω2 = i. Such functions are periodic in the horizontal (real-axis) and vertical (imaginary-axis) directions. They repeat everywhere their behavior on the unit square.

What do doubly periodic functions look like? It depends on what restrictions we place on the functions. When we’re working with complex functions, we’re typically interested in functions that are analytic, i.e. differentiable as complex functions.

Only constant functions can be doubly periodic and analytic everywhere. Why? Our functions take on over and over the values they take on over the unit square. If a function is analytic over the (closed) unit square then it’s bounded over that square, and since it’s doubly periodic, it’s bounded everywhere. By Liouville’s theorem, the only bounded analytic functions are constants.

This says that to find interesting doubly periodic functions, we’ll have to relax our requirements. Instead of requiring functions to be analytic everywhere, we will require them to be analytic except at isolated singularities. That is, the functions are allowed to blow up at a finite number of points. There’s a rich set of such functions, known as elliptic functions.

There are two well-known families of elliptic functions. One is the Weierstrass ℘ function (TeX symbol `wp`, Unicode U+2118) and its derivatives. The other is the Jacobi functions sn, cn, and dn. These functions have names resembling familiar trig functions because the Jacobi functions are in some ways analogous to trig functions.

It turns out that all elliptic functions can be written as combinations either of the Weierstrass function and its derivative or combinations of Jacobi functions. Roughly speaking, Weierstrass functions are easier to work with theoretically and Jacobi functions are easier to work with numerically.

Related: Applied complex analysis

# College math in a single symbol

From Prelude to Mathematics by W. W. Sawyer (1955):

There must be many universities today where 95 percent, if not 100 percent, of the functions studied by physics, engineering, and even mathematics students, are covered by the single symbol F(a, b; c; x).

The symbol Sawyer refers to is the hypergeometric function. (There are hypergeometric functions with any number of parameters, but the case with three parameters is so common that it is often called “the” hypergeometric function.) The most commonly used functions in application — trig functions, exp, log, the error function, Bessel functions, etc. — are either hypergeometric functions or closely related to hypergeometric functions. Sawyer continues:

I do not wish to imply that the hypergeometric function is the only function about which mathematics knows anything. That is far from being true. … but the valley inhabited by schoolboys, by engineers, by physicists, and by students of elementary mathematics, is the valley of the Hypergeometric Function, and its boundaries are (but for one or two small clefts explored by pioneers) virgin rock.

# Diagram of special function relationships

Here’s a first pass at a diagram showing the relationships between special functions that commonly come up in applications.

These notes explain what everything on the diagram means.

Other math diagrams

# Diagram of Bessel function relationships

Bessel functions are interesting and useful, but they’re surrounded by arcane terminology. It may seem that there are endless varieties of Bessel functions, but there are not that many variations and they are simply related to each other.

Each letter in the diagram is taken from the traditional notation for a class of Bessel functions. Functions in the same row satisfy the same differential equation. Functions in the same column have something in common in their naming.

The lines represent simple relations between functions. There’s one line conspicuously missing from the diagram, and that’s no accident.

These notes give the details behind the diagram.

Related post: Visualizing Bessel functions

# Diagram of gamma function identities

The gamma function has a large number of identities relating its values at one point to values at other points. By focusing on just the function arguments and not the details of the relationships, a simple pattern emerges. Most of the identities can be derived from just four fundamental identities:

• Conjugation
• Reflection
• Multiplication

The same holds for functions derived from the gamma function: log gamma, digamma, trigramma, etc.

For the details of the relationships, see Identities for gamma and related functions.

Other diagrams:

# How to visualize Bessel functions

Bessel functions are sometimes called cylindrical functions because they arise naturally from physical problems stated in cylindrical coordinates. Bessel functions have a long list of special properties that make them convenient to use. But because so much is known about them, introductions to Bessel functions can be intimidating. Where do you start? My suggestion is to start with a few properties that give you an idea what their graphs look like.

A textbook introduction would define the Bessel functions and carefully develop their properties, eventually getting to the asymptotic properties that are most helpful in visualizing the functions. Maybe it would be better to learn how to visualize them first, then go back and define the functions and show they look as promised. This way you can have a picture in your head to hold onto as you go over definitions and theorems.

I’ll list a few results that describe how Bessel functions behave for large and small arguments. By putting these two extremes together, we can have a fairly good idea what the graphs of the functions look like.

There are two kinds of Bessel functions, J(z) and Y(z). Actually, there are more, but we’ll just look at these two. These functions have a parameter ν (Greek nu) written as a subscript. The functions Jν(z) are called “Bessel functions of the first kind” and the functions Yν are called … wait for it … “Bessel functions of the second kind.” (Classical mathematics is like classical music as far as personal names followed by ordinal numbers: Beethoven’s fifth symphony, Bessel functions of the first kind, etc.)

Roughly speaking, you can think of J‘s and Y‘s like cosines and sines. In fact, for large values of z, J(z) is very nearly a cosine and Y(z) is very nearly a sine. However, both are shifted by a phase φ that depends on ν and are dampened by 1 /√z . That is,  for large values of z, J(z) is roughly proportional to cos(z – φ)/√z and Y(z) is roughly proportional to sin(z – φ)/√z.

More precisely, as z goes to infinity, we have

and

These statements hold as long as |arg(z)| < π. The error in each is on the order O(1/|z|).

Now lets look at how the functions behave as z goes to zero. For small z, Jν behaves like zν and Yν behaves like z. Specifically,

and for ν > 0,

If ν = 0, we have

Now let’s use the facts above to visualize a couple plots.

First we plot J1 and J5. For large values of z, we expect J1(z) to behave like cos(z – φ) / √z where φ = 3π/4 and we expect J5(z) to behave like cos(z – φ) / √z where φ = 11π/4. The two phases differ by 2π and so the two functions should be nearly in phase.

For small z, we expect J1(z) to be roughly proportional z  and so its graph comes into the origin at an angle. We expect J5(z) to be roughly proportional to z5 and so its graph should be flat near the origin.

The graph below shows that the function graphs look like we might have imagined from the reasoning above. Notice that the amplitude of the oscillations is decreasing like 1/√z.

Next we plot J1 and Y1. For large arguments we would expect these functions to be a quarter period out of phase, just like cosine and sine, since asymptotically J1 is proportional to cos(z – 3π/4) / √z and Y1 is proportional to sin(z – 3π/4) / √z. For small arguments, J1(z) is roughly proportional to z and Y1(z) is roughly proportional to -1/z. The graph below looks as we might expect.

Related posts:

# Two useful asymptotic series

This post will present a couple asymptotic series, explain how to use them, and point to some applications.

The most common special function in applications, at least in my experience, is the gamma function Γ(z). It’s often easier to work with the logarithm of the gamma function than to work with the gamma function directly, so one of the asymptotic series we’ll look at is the series for log Γ(z).

Another very common special function in applications is the error function erf(z). The error function and its relationship to the normal probability distribution are explained here. Even though the gamma function is more common, we’ll start with the asymptotic series for the error function because it is a little simpler.

## Error function

We actually present the series for the complementary error function erfc(z) = 1 – erf(z). (Why have a separate name for erfc when it’s so closely related to erf? Sometimes erfc is easier to work with mathematically. Also, sometimes numerical difficulties require separate software for evaluating erf and erfc as explained here.)

If you’re unfamiliar with the n!! notation, see this explanation of double factorial.

Note that the series has a squiggle ~ instead of an equal sign. That is because the partial sums of the right side do not converge to the left side. In fact, the partial sums diverge for any z. Instead, if you take any fixed partial sum you obtain an approximation of the left side that improves as z increases.

The series above is valid for any complex value of z as long as |arg(z)| < 3π/4. However, the error term is easier to describe if z is real. In that case, when you truncate the infinite sum at some point, the error is less than the first term that was left out. In fact, the error also has the same sign as the first term left out. So, for example, if you drop the sum entirely and just keep the “1” term on the right side, the error is negative and the absolute value of the error is less than 1/2 z2.

One way this series is used in practice is to bound the tails of the normal distribution function. A slight more involved application can be found here.

## Log gamma

The next series is the asymptotic series for log Γ(z).

If you throw out all the terms involving powers of 1/z you get Stirling’s approximation.

As before, the partial sums on the right side diverge for any z, but if you truncate the series on the right, you get an approximation for the left side that improves as z increases. And as before, the series is valid for complex z but the error is simpler when z is real. In this case, complex z must satisfy |arg(z)| < π. If z is real and positive, the approximation error is bounded by the first term left out and has the same sign as the first term left out.

The coefficients 1/12, 1/360, etc. require some explanation. The general series is

where the numbers B2m are Bernoulli numbers. This post showed how to use this asymptotic series to compute log n!.

## References

The asymptotic series for the error function is equation 7.1.23 in Abramowitz and Stegun. The bounds on the remainder term are described in section 7.1.24. The series for log gamma is equation 6.1.41 and the error term is described in 6.1.42.

# Bug in SciPy’s erf function

Last night I produced the plot below and was very surprised at the jagged spike. I knew the curve should be smooth and strictly increasing.

My first thought was that there must be a numerical accuracy problem in my code, but it turns out there’s a bug in SciPy version 0.8.0b1. I started to report it, but I saw there were similar bug reports and one such report was marked as closed, so presumably the fix will appear in the next release.

The problem is that SciPy’s `erf` function is inaccurate for arguments with imaginary part near 5.8. For example, Mathematica computes erf(1.0 + 5.7i) as -4.5717×1012 + 1.04767×1012 i. SciPy computes the same value as -4.4370×1012 + 1.3652×1012 i. The imaginary component is off by about 30%.

Here is the code that produced the plot.

```from scipy.special import erf
from numpy import linspace, exp
import matplotlib.pyplot as plt

def g(y):
z = (1 + 1j*y) /  sqrt(2)
temp = exp(z*z)*(1 - erf(z))
u, v = temp.real, temp.imag
return -v / u

x = linspace(0, 10, 101)
plt.plot(x, g(x))
```

# Three surprises with bc

`bc` is a quirky but useful calculator. It is a standard Unix utility and is also available for Windows.

One nice feature of `bc` is that you can set the parameter `scale` to indicate the desired precision. For example, if you set `scale=100`, all calculations will be carried out to 100 decimal places.

The first surprise is that the default value of `scale` is 0. So unless you change the default option, 1/2 will return 0. This is not because it is doing integer division: 1.0/2.0 also returns 0. `bc` is computing 1/2 as 0.5 and displaying the default number of decimal places, i.e. none! Note also that `bc` doesn’t round results; it truncates.

`bc` has one option: `-l`. This option loads the math library and sets the default value of `scale` to 20. I always fire up `bc -l` rather than just `bc`.

The second surprise with `bc` is that its math library only has five elementary functions. However, you can do a lot with these five functions if you know a few identities.

The sine and cosine of `x` are computed by `s(x)` and `c(x)` respectively. Angles are measured in radians. There is no tangent function in `bc`. If you want the tangent of `x`, compute `s(x)/c(x)`. (See here for an explanation of how to compute other trigonometric functions.) As minimal as `bc` is, it did make a minor concession to convenience: it could have been more minimal by insisting you use sin(π/2 – x) to compute a cosine.

The only inverse trigonometric function is `a(x)` for arctangent. This function can be bootsrapped to compute other inverse functions via these identities:

arcsin(x) = arctan(x / sqrt(1 – x2))
arccos(x) = arctan(sqrt(1 – x2 )/ x)
arccot(x) = π/2 – arctan(x)
arcsec(x) = arctan(sqrt(x2 – 1))
arccsc(x) = arctan(1/sqrt(x2 – 1))

The functions `l(x)` and `e(x)` compute (natural) logarithm and exponential respectively. `bc` has a power operator `^` but it can only be used for integer powers. So you could compute the fourth power of x with `x^4` but you cannot compute the fourth root of x with `x^0.25`. To compute xy for a floating point value y, use `e(l(x)*y)`. Also, you can use the identity logb(x) = log(x) / log(b) to find logarithms to other bases. For example, you could compute the log base 2 of x using `l(x)/l(2)`.

Not only is `bc` surprising for the functions it does not contain, such as no tangent function, it is also surprising for what it does contain. The third surprise is that in addition to its five elementary functions, the `bc` math library has a function `j(n,x)` t0 compute the nth Bessel function of x where n is an integer. (You can pass in a floating point value of n but `bc` will lop off the fractional part.)

I don’t know the history of `bc`, but it seems someone must have needed Bessel functions and convinced the author to add them. Without `j`, the library consists entirely of elementary functions of one argument and the names of the functions spell out “scale.” The function `j` breaks this pattern.

If I could include one advanced function in a calculator, it would be the gamma function, not Bessel functions. (Actually, the logarithm of the gamma function is more useful than the gamma function itself, as I explain here.) Bessel functions are important in applications but I would expect more demand for the gamma function.

Update: Right after I posted this, I got an email saying `bc -l` was following me on Twitter. Since when do Unix commands have Twitter accounts? Well,  Hilary Mason has created a Twitter account bc_l that will run `bc -l` on anything you send it via DM.

Update (November 14, 2011): The account `bc_l` is currently not responding to requests. Hilary says that the account stopped working when Twitter changed the OAuth protocol. She says she intends to repair it soon.

Related posts:

# Math library functions that seem unnecessary

This post will give several examples of functions include in the standard C math library that seem unnecessary at first glance.

## Function log1p(x) = log(1 + x)

The function `log1p` computes log(1 + x).  How hard could this be to implement?

`log(1 + x);`

Done.

But wait. What if x is very small? If x is 10-16, for example, then on a typical system 1 + x = 1 because machine precision does not contain enough significant bits to distinguish 1 + x from 1. (For details, see Anatomy of a floating point number.) That means that the code `log(1 + x)` would first compute 1 + x, obtain 1, then compute log(1), and return 0. But log(1 + 10-16) ≈ 10-16. This means the absolute error is about 10-16 and the relative error is 100%. For values of x larger than 10-16 but still fairly small, the code `log(1 + x)` may not be completely inaccurate, but the relative error may still be unacceptably large.

Fortunately, this is an easy problem to fix. For small x, log(1 + x) is approximately x. So for very small arguments, just return x. For larger arguments, compute log(1 + x) directly. See details here.

Why does this matter? The absolute error is small, even if the code returns a zero for a non-zero answer. Isn’t that OK? Well, it might be. It depends on what you do next. If you add the result to a large number, then the relative error in the answer doesn’t matter. But if you multiply the result by a large number, your large relative error becomes a large absolute error as well.

## Function expm1(x) = exp(x) – 1

Another function that may seem unnecessary is `expm1`. This function computes ex – 1. Why not just write this?

`exp(x) - 1.0;`

That’s fine for moderately large x. For very small values of x, exp(x) is close to 1, maybe so close to 1 that it actually equals 1 to machine precision. In that case, the code `exp(x) - 1` will return 0 and result in 100% relative error. As before, for slightly larger values of x the naive code will not be entirely inaccurate, but it may be less accurate than needed. The solution is to compute exp(x) – 1 directly without first computing exp(x). The Taylor series for exp(x) is 1 + x + x2/2 … So for very small x, we could just return x for exp(x) – 1. Or for slightly larger x, we could return x + x2/2. (See details here.)

## Functions erf(x) and erfc(x)

The C math library contains a pair of functions `erf` and `erfc`. The “c” stands for “complement” because erfc(x) = 1 – erf(x). The function erf(x) is known as the error function and is not trivial to implement. But why have a separate routine for `erfc`? Isn’t it trivial to implement once you have code for `erf`? For some values of x, yes, this is true. But if x is large, erf(x) is near 1, and the code `1 - erf(x)` may return 0 when the result should be small but positive. As in the examples above, the naive implementation results in a complete loss of precision for some values and a partial loss of precision for other values.

## Functions Γ(x) and log Γ(x)

The standard C math library has two functions related to the gamma function: `tgamma` that returns Γ(x) and `lgamma` that return log Γ(x). Why have both? Why can’t the latter just use the log of the former? The gamma function grows extremely quickly. For moderately large arguments, its value exceeds the capacity of a computer number. Sometimes you need these astronomically large numbers as intermediate results. Maybe you need a moderate-sized number that is the ratio of two very large numbers. (See an example here.) In such cases, you need to subtract `lgamma` values rather than take the ratio of `tgamma` values. (Why is the function called “tgamma” and not just “gamma”? See the last paragraph and first comment on this post.)

## Conclusion

The standard C math library distills a lot of experience. Some of the functions may seem unnecessary, and so they are for some arguments. But for other arguments these functions are indispensable.