From the category archives:

Math

Two useful asymptotic series

by John on September 7, 2010

This post will present a couple asymptotic series, explain how to use them, and point to some applications.

[click to continue...]

{ 1 comment }

How to compute log factorial

by John on August 16, 2010

Factorials grow very quickly, so quickly the results can easily exceed the limits of computer arithmetic. For example, 30! is too big to store in a typical integer and 200! is too big to store (even approximately) in a standard floating point number. It’s usually more convenient to work with the logarithms of factorials rather than factorials themselves.

So how might you compute log( n! )? You don’t want to compute n! first and then take logs because you’ll overflow for moderately large arguments. You want to compute the log factorial directly.

Since n! = 1 × 2 × 3 × … × n, log(n!) = log(1) + log(2) + log(3) + … + log(n). That gives a way to compute log(n!), but it’s slow for large arguments: the run time is proportional to the size of n.

If you only need to compute log(n!) for n within a moderate range, you could just tabulate the values. Calculate log(n!) for n = 1, 2, 3, …, N by any means, no matter how slow, and save the results in an array. Then at runtime, just look up the result.

But suppose you want to be able to compute log(n!) for any value of n such that log(n!) won’t overflow. That requires a very large value of n. Since log(n!) is on the order of n log (n) for large n, log(n!) won’t overflow even for some very large values of n. You’ll run out of memory to store your table before you’ll run out of values of n such that log(n!) doesn’t overflow.

So now the problem becomes how to evaluate log(n!) for large values of n. Say we tabulate log(n!) for n up to some size and then use a formula to calculate log(n!) for larger arguments. I’m going to switch notation now and work with the gamma function Γ(x) because most references state their results in terms of the gamma function rather than in terms of factorial. It’s easy to move between the two since Γ(n+1) = n!.

Stirling’s approximation says that

log Γ(x) ≈ (x – 1/2) log(x) – x  + (1/2) log(2 π)

with an error on the order of 1/12x. So if n were around 1000, the approximation would be good to about four decimal places. What if we wanted more accuracy but not a bigger table? Stirling’s approximation above is part of an infinite series of approximations, an asymptotic series for log Γ(x):

log Γ(x) ≈ (x – 1/2) log(x) – x + (1/2) log(2 π) + 1/(12 x) – 1/(360 x3) + 1/(1260 x5) – …

This raises a couple questions.

  1. What is the form of a general term in the series?
  2. How accurate is the series when we truncate it at some point?

The answer to the first question is the term with x2m-1 in the denominator has coefficient B2m / (2m(2m-1)) where the B’s are Bernoulli numbers. Perhaps that’s not very satisfying, but that’s what it is.

Now to the question of accuracy. If you’ve never used asymptotic series before, you might be tempted to use one like you’d use a Taylor series: the more terms the better. But asymptotic series don’t work that way. Typically the error improves at first as you take more terms, reaches a minimum, then increases as you take more terms. Another difference is that while Taylor series approximations improve as arguments get smaller, asymptotic approximations improve as arguments get larger. That’s convenient for us since we’re looking for an approximation for n so large that it’s beyond our table of saved values.

For this particular series, the absolute value of the error in truncating the series is less than the absolute value of the first term that was left out.  Suppose we make a look-up table for the values 1 through 256. If we truncate the series after 1/(12 x), the error will be less than 1/(360 x3). If x > 256, log(x!) > 1419 and the error in the asymptotic approximation is less than 1/(360×224) = 1.65 × 10-10. Since the number we’re computing has at least four digits and the result is good to 10 decimal places, we should have at least 14 significant figures, near the limits of floating point precision. (For more details, see Anatomy of a floating point number.)

In summary, one way to compute log factorial is to pre-compute log(n!) for n = 1, 2, 3, … 256 and store the results in an array. For values of n ≤ 256, look up the result from the table. For n > 256, return

(x – 1/2) log(x) – x + (1/2) log(2 π) + 1/(12 x)

with x = n + 1. This has been coded up here.

You could include the 1/(360 x3) term or higher terms from the asymptotic series and use a smaller table. This would use less memory but would require more computation for arguments outside the range of the table.

Related posts:

How to calculate binomial probabilities
Math library functions that seem unnecessary

{ 5 comments }

Rosenbrock’s banana function

by John on July 28, 2010

Rosenbrock’s banana function is a famous test case for optimization software. It’s called the banana function because of its curved contours.

The definition of the function is

f(x, y) = (1 - x)^2 + 100(y - x^2)^2

The function has a global minimum at (1, 1). If an optimization method starts at the point (-1.2, 1), it has to find its way to the other side of a flat, curved valley to find the optimal point.

Rosenbrock’s banana function is just one of the canonical test functions in the paper “Testing Unconstrained Optimization Software” by Moré, Garbow, and Hillstrom in ACM Transactions on Mathematical Software Vol. 7, No. 1, March 1981, pp 17-41. The starting point (-1.2, 1) mentioned above comes from the starting point required in that paper.

I mentioned a while back that I was trying out Python alternatives for tasks I have done in Mathematica. I tried making contour plots with Python using matplotlib. This was challenging to figure out. The plots did not look very good, though I imagine I could have made satisfactory plots if I had explored the options. Instead, I fired up Mathematica and produced the plot above easily using the following code.

Banana[x_, y_] := (1 - x)^2 + 100 (y - x*x)^2
ContourPlot[Banana[x, y], {x, -1.5, 1.5}, {y, 0.7, 2.0}]

{ 4 comments }

Sine approximation for small angles

by John on July 27, 2010

For small angles, sin(θ) is approximately θ. This post takes a close look at this familiar approximation.

I was confused when I first heard that sin(θ) ≈ θ for small θ. My thought was “Of course they’re approximately equal. All small numbers are approximately equal to each other.” I also was confused by the footnote “The angle θ must be measured in radians.” If the approximation is just saying that small angles have small sines, it doesn’t matter whether you measure in radians or degrees. I missed the point, as I imagine most students do. I run into people who have completed math or science degrees without understanding this tidbit from their freshman year.

The point is that the error in the approximation sin(θ) ≈ θ is small, even relative to the size of θ. If θ is small, the error in the approximation sin(θ) ≈ θ is really, really small. I use the phrase “really, really” in a precise sense; I’m not just talking like an eight-year-old child. If θ is small, θ2 is really small and θ3 is really, really small.

For small values of θ, say θ < 1, the difference between sin(θ) and θ is very nearly θ3/6 and so the absolute error is really, really small. The relative error is about θ2/6. If the absolute error is really, really small, the relative error is really small.

The approximation sin(θ) ≈ θ comes from a Taylor series, but there’s more going on than that. For example, the approximation log(1 + x) ≈ x also comes from a Taylor series, but that approximation is not nearly as accurate.

The Taylor series for log(1 + x) is

xx2/2 + x3/3 – …

This means that for small x, log(1 + x) is approximately x, and the error in the approximation is roughly x2/2. So if x is small, the absolute error in the approximation is really small and the relative error is small. Notice this is one less “really” than the analogous statement about sines.

The reason the sine approximation is one “really” better than the log approximation is that sine is an odd function, so its Taylor series has only odd terms. The error is approximately the first term left out. The series for sin(θ) is

θ – θ3 / 3! + θ5/5! – …

so sin(θ) – θ is on the order of θ3, not just θ2.

We can say even more. Not only is the error in sin(θ) ≈ θ approximately θ3/6, it is in fact bounded by θ3/6 for small, positive θ. This comes from the alternating series theorem. When you make an approximation from an alternating Taylor series, the error is bounded by the first term you leave out. The argument has to be small enough that the terms of the series decrease in absolute value. For example, if you  stick  θ = 1 into the series for sine, each term of the series is smaller than the previous one. But if you stick in θ = 2 then the terms get larger before they get smaller. So the precise meaning of “small enough” is small enough that the terms of the alternating series decrease in absolute value.

The series for log(1 + x) also alternates, so the error estimate is also an upper bound on the error in that case. Not all series are alternating series. But when we do have an alternating series, we get a bonus as far as being able to control the error in approximations.

It’s also true that for small θ, cos(θ) is approximately 1. At first this may seem analogous to saying that for small θ, sin(θ) is approximately 0. While both are true, the former is much better.

The series for cosine is

1 – θ2/2! + θ4/4! – …

and so the error in approximating cos(θ) with 1 is approximately θ2/2. In fact, since the series alternates, the error is bound by θ2/2, if θ is small enough. And in this case the relative error is approximately equal to the absolute error. This says that for small θ, the approximation cos(θ) ≈ 1 is really, really good. By contrast, saying that sin(θ) ≈ 0 for small θ is a bad approximation: the absolute error is approximately θ and the relative error is 100%. Rounding small values of θ down to 0 produces a very good approximation in one context and a poor approximation in another context.

Related post:

Sales tax included
Economizing approximations
Simple approximation to normal distribution

{ 8 comments }

I heard the terms “covariance” and “contravariance” used in math long before I heard them used in object oriented programming.  I was curious whether there was any connection between the two. To my surprise, they’re very similar. In fact, you could formalize the OOP use of the terms so that they’re not just analogous but actually  special cases of the mathematical terms.

When I started writing this post, I intended to explain covariance and contravariance. However, the post became longer and more technical than what I like to write here. Instead, I just announce that a connection exists and give references for those who want to read further.

Chris Burrows describes covariance and contravariance in object oriented programming in his article New C# Features in the .NET Framework 4.

Wikipedia has a short but readable introduction to category theory including covariant and contravariant functors. See also A Categorical Manifesto (PostScript file). The terms covariant and contravariant were defined in category theory before computer scientists applied the terms to object oriented programming.

Computer scientists have been interested in category theory for some time, so it’s not too surprising that category theory terms would filter down into practical programming. The real surprise was hearing category terminology used outside of math. It was like the feeling you get when you run into a coworker at a family reunion or a neighbor at a restaurant in another city.

Related post:

My mathematical opposite

{ 3 comments }

Three surprises with bc

by John on July 14, 2010

bc is a quirky but useful calculator. It is a standard Unix utility and is also available for Windows.

One nice feature of bc is that you can set the parameter scale to indicate the desired precision. For example, if you set scale=100, all calculations will be carried out to 100 decimal places.

The first surprise is that the default value of scale is 0. So unless you change the default option, 1/2 will return 0. This is not because it is doing integer division: 1.0/2.0 also returns 0. bc is computing 1/2 as 0.5 and displaying the default number of decimal places, i.e. none! Note also that bc doesn’t round results; it truncates.

bc has one option: -l. This option loads the math library and sets the default value of scale to 20. I always fire up bc -l rather than just bc.

The second surprise with bc is that its math library only has five elementary functions. However, you can do a lot with these five functions if you know a few identities.

The sine and cosine of x are computed by s(x) and c(x) respectively. Angles are measured in radians. There is no tangent function in bc. If you want the tangent of x, compute s(x)/c(x). (See here for an explanation of how to compute other trigonometric functions.) As minimal as bc is, it did make a minor concession to convenience: it could have been more minimal by insisting you use sin(π/2 – x) to compute a cosine.

The only inverse trigonometric function is a(x) for arctangent. This function can be bootsrapped to compute other inverse functions via these identities:

arcsin(x) = arctan(x / sqrt(1 – x2))
arccos(x) = arctan(sqrt(1 – x2 )/ x)
arccot(x) = π/2 – arctan(x)
arcsec(x) = arctan(sqrt(x2 – 1))
arccsc(x) = arctan(1/sqrt(x2 – 1))

The functions l(x) and e(x) compute (natural) logarithm and exponential respectively. bc has a power operator ^ but it can only be used for integer powers. So you could compute the fourth power of x with x^4 but you cannot compute the fourth root of x with x^0.25. To compute xy for a floating point value y, use e(l(x)*y). Also, you can use the identity logb(x) = log(x) / log(b) to find logarithms to other bases. For example, you could compute the log base 2 of x using l(x)/l(2).

Not only is bc surprising for the functions it does not contain, such as no tangent function, it is also surprising for what it does contain. The third surprise is that in addition to its five elementary functions, the bc math library has a function j(n,x) t0 compute the nth Bessel function of x where n is an integer. (You can pass in a floating point value of n but bc will lop off the fractional part.)

I don’t know the history of bc, but it seems someone must have needed Bessel functions and convinced the author to add them. Without j, the library consists entirely of elementary functions of one argument and the names of the functions spell out “scale.” The function j breaks this pattern.

If I could include one advanced function in a calculator, it would be the gamma function, not Bessel functions. (Actually, the logarithm of the gamma function is more useful than the gamma function itself, as I explain here.) Bessel functions are important in applications but I would expect more demand for the gamma function.

Update: Right after I posted this, I got an email saying bc -l was following me on Twitter. Since when do Unix commands have Twitter accounts? Well,  Hilary Mason has created a Twitter account bc_l that will run bc -l on anything you send it via DM.

Related posts:

How many trig functions are there?

{ 6 comments }

Volumes of Lp unit balls

by John on July 2, 2010

The unit ball in n dimensions under the Lp norm has volume

2^n \frac{\Gamma\left(1 + \frac{1}{p}\right)^n}{\Gamma\left( 1 + \frac{n}{p} \right)}

I ran across this formula via A nice formula for the volume of an L_p ball. That post gives an even more general result that allows different values of p along each axis.

There have been several blog posts lately on the volume of balls in higher dimensions that correspond to the case p = 2. The formula above is valid for all p > 0.

Note that as p goes to ∞ the volume goes to 2n because the terms involving gamma functions go to 1. This is as we’d expect since the unit “ball” in the infinity norm is a cube, two units wide on each side.

Related post:

Means and inequalities

{ 4 comments }

Cosines and correlation

by John on June 17, 2010

Preview

This post will explain a connection between probability and geometry. Standard deviations for independent random variables add according to the Pythagorean theorem. Standard deviations for correlated random variables add like the law of cosines. This is because correlation is a cosine. Update: Here is a Spanish translation of this post.

Independent variables

First, let’s start with two independent random variables X and Y. Then the standard deviations of X and Y add like sides of a right triangle.

diagram

In the diagram above, “sd” stands for standard deviation, the square root of variance. The diagram is correct because the formula

Var(X+Y) = Var(X) + Var(Y)

is analogous to the Pythagorean theorem

c2 = a2 + b2.

Dependent variables

Next we drop the assumption of independence. If X and Y are correlated, the variance formula is analogous to the law of cosines.

diagram

The generalization of the previous variance formula to allow for dependent variables is

Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X, Y).

Here Cov(X,Y) is the covariance of X and Y. The analogous law of cosines is

c2 = a2 + b2 – 2 a b cos(θ).

If we let a, b, and c be the standard deviations of X, Y, and X+Y respectively, then cos(θ) = -ρ where ρ is the correlation between X and Y defined by

ρ(X, Y) = Cov(X, Y) / sd(X) sd(Y).

When θ is π/2 (i.e. 90°) the random variables are independent. When θ is larger, the variables are positively correlated. When θ is smaller, the variables are negatively correlated. Said another way, as θ increases from 0 to π (i.e. 180°), the correlation increases from -1 to 1.

The analogy above is a little awkward, however, because of the minus sign. Let’s rephrase it in terms of the supplementary angle φ = π – θ. Slide the line representing the standard deviation of Y over to the left end of the horizontal line representing the standard deviation of X.

diagram

Now cos(φ) = ρ = correlation(X, Y).

When φ is small, the two line segments are pointing in nearly the same direction and the random variables are highly positively correlated. If φ is large, near π, the two line segments are pointing in nearly opposite directions and the random variables are highly negatively correlated.

Connection explained

Now let’s see the source of the connection between correlation and the law of cosines. Suppose X and Y have mean 0. Think of X and Y as members of an inner product space where the inner product <X, Y> is E(XY). Then

<X+Y, X+Y> = < X, X> + < Y, Y> + 2<X, Y >.

In an inner product space,

<X, Y > = || X || || Y || cos φ

where the norm || X || of a vector is the square root of the vector’s inner product with itself. The above equation defines the angle φ between two vectors. You could justify this definition by seeing that it agrees with ordinary plane geometry in the plane containing the three vectors X, Y, and X+Y.

Related links

Math and statistics links

{ 8 comments }

Why computers have two zeros: +0 and -0

by John on June 15, 2010

Here’s a strange detail of floating point arithmetic: computers have two versions of 0: positive zero and negative zero. Most of the time the distinction between +0 and -0 doesn’t matter, once in a while signed versions of zero come in handy.

If a positive quantity underflows to zero, it becomes +0. And if a negative quantity underflows to zero, it becomes -0. You could think of +0 (respectively, -0) as the bit pattern for a positive (negative) number too small to represent.

The IEEE floating point standard says 1/+0 should be +infinity and 1/-0 should be -infinity. This makes sense if you interpret +/- 0 as the ghost of a number that underflowed leaving behind only its sign. The reciprocal of a positive (negative) number too small to represent is a positive (negative) number too large to represent.

To demonstrate this, run the following C code.

int main()
{
    double x = 1e-200;
    double y = 1e-200 * x;
    printf("Reciprocal of +0: %g\n", 1/y);
    y = -1e-200*x;
    printf("Reciprocal of -0: %g\n", 1/y);
}

On Linux with gcc the output is

Reciprocal of +0: inf
Reciprocal of -0: -inf

Windows with Visual C++ returns the same output except Windows prints infinity as 1#INF rather than inf.

There is something, however, about signed zeros and exceptions that doesn’t make sense to me. The aptly named report “What Every Computer Scientist Should Know About Floating Point Arithmetic” has the following to say about signed zeros.

In IEEE arithmetic, it is natural to define log 0 = -∞ and log x to be a NaN when x < 0. Suppose that x represents a small negative number that has underflowed to zero. Thanks to signed zero, x will be negative, so log can return a NaN. However, if there were no signed zero, the log function could not distinguish an underflowed negative number from 0, and would therefore have to return -∞.

This implies log(-0) should be NaN and log(+0) should be -∞. That makes sense, but that’s not what happens in practice. The log function returns -∞ for both +0 and -0.

I ran the following code on Linux and Windows.

int main()
{
    double x = 1e-200;
    double y = 1e-200 * x;
    printf("Log of +0: %g\n", log(y));
    y = -1e-200*x;
    printf("Log of -0: %g\n", log(y));
}

On Linux, the code prints

Log of +0: -inf
Log of -0: -inf

The results were the same on Windows, except for the way Windows displays infinities.

Related link:

IEEE floating point exceptions in C++
Anatomy of a floating point number
Math library functions that seem unnecessary

{ 10 comments }

Generating Poisson random values

by John on June 14, 2010

The most direct way of generating random samples from a Poisson distribution is efficient for some parameters and inefficient for others. Wikipedia attributes the following algorithm to Donald Knuth:

    init:
         Let L ← exp(−λ), k ← 0 and p ← 1.
    do:
         k ← k + 1.
         Generate uniform random number u in [0,1] and let p ← p × u.
    while p > L.
    return k − 1.

Here’s the idea behind the algorithm. The time between arrivals in a Poisson process is exponentially distributed. Count how many arrivals there are in an interval by simulating the times between arrivals and adding them up until the time sum spills over the interval.

The problem with this approach is that the expected run time of the loop is proportional to the parameter λ. This algorithm is commonly used despite its terrible performance for large arguments.

Below is an algorithm that has expected run time independent of the argument λ. The algorithm is fairly simple, though it takes a moderate amount of theoretical justification to explain. It goes back to 1979 and so may not the most efficient algorithm available. It is definitely not the most efficient algorithm if you need to generate a large number of samples using the same parameter λ. The paper describing the algorithm recommends using Knuth’s algorithm for values of λ less than 30 and this algorithm for larger values of λ.

c = 0.767 - 3.36/lambda
beta = PI/sqrt(3.0*lambda)
alpha = beta*lambda
k = log(c) - lambda - log(beta)

forever
{
	u = random()
	x = (alpha - log((1.0 - u)/u))/beta
	n = floor(x + 0.5)
	if (n < 0)
		continue
	v = random()
	y = alpha - beta*x
	lhs = y + log(v/(1.0 + exp(y))^2)
	rhs = k + n*log(lambda) - log(n!)
	if (lhs <= rhs)
		return n
}

This is an accept-reject method based on a normal approximation. The performance improves as λ increases because the quality of the normal approximation improves. (Actually, it uses a logistic approximation to a normal approximation!) Theoretically it could be caught in an infinite loop, as with all accept-reject methods. However, the expected number of iterations is bounded. The continue statement means that if n is negative, the algorithm goes back to the top of the loop. The random() function is assumed to return a uniform random variable between 0 and 1.

A possible obstacle to turning the pseudocode above into actual code is computing log(n!). Naively computing n! and taking the log of the result will overflow for moderately large values of n. If you have a function available that computes the log of the gamma function, such as lgamma in C’s math.h, then evaluate that function at n+1. Otherwise, see How to compute log factorial.

The algorithm above is “method PA” from “The Computer Generation of Poisson Random Variables” by A. C. Atkinson, Journal of the Royal Statistical Society Series C (Applied Statistics) Vol. 28, No. 1. (1979), pages 29-35. Method PA is on page 32.

Related posts:

C# random number generation code
How to test a random number generator

{ 3 comments }

C# math gotchas

by John on June 8, 2010

C# has three mathematical constants that look like constants in the C header file float.h. Two of these are not what you might expect.

The constant double.MaxValue in C# looks like the constant DBL_MAX in C, and indeed it is. Both give the maximum finite value of a double, which is on the order of 10^308. This might lead you to believe that double.MinValue in C# is the same as DBL_MIN in C or that double.Epsilon in C# is the same as DBL_EPSILON. If so, you’re in for a surprise.

The constants DBL_MAX and double.MaxValue are the same because there is no ambiguity over what “max” means: the largest finite value of a double. But DBL_MIN and double.MinValue are different because they minimize over different ranges. The constant DBL_MIN is the smallest positive value of a normalized double. The constant double.MinValue in C# is the smallest (i.e. most negative) value of a double and is the negative of double.MaxValue. The difference between DBL_MIN and double.MinValue is approximately the difference between 10^-308 and -10^308, between a very small positive number and a very large negative number.

C has a constant DBL_EPSILON for the smallest positive double precision number x such that 1 + x does not equal 1 in machine precision. Typically a double has about 15 figures of precision, and so DBL_EPSILON is on the order of 10^-16. (For a more precise description, see Anatomy of a floating point number.)

You might expect double.Epsilon in C# corresponds to DBL_EPSILON in C. I did, until a unit test failed on some numerical code I was porting from C++ to C#. But in C# double.Epsilon is the smallest positive value a (denormalized) double can take. It is similar to DBL_MIN, except that double.Epsilon is the possible smallest value of a double, not requiring normalization. The constant DBL_MIN is on the order of 10^-308 while double.Epsilon is on the order of 10^-324 because it allows denormalized values. (See Anatomy of a floating point number for details of denormalized numbers.)

Incidentally, the C constants DBL_MAX, DBL_MIN, and DBL_EPSILON equal the return values of max, min, and epsilon for the C++ class numeric_limits<double>.

To summarize,

  • double.MaxValue in C# equals DBL_MAX in C.
  • double.MinValue in C# equals -DBL_MAX in C.
  • double.Epsilon is similar to DBL_MIN in C, but orders of magnitude smaller.
  • C# has no analog of DBL_EPSILON from C.

One could argue that the C# names are better than the C names. It makes sense for double.MinValue to be the negative of double.MaxValue. But the use of Epsilon was a mistake. The term “epsilon” in numeric computing has long been established and is not limited to C. It would have been better if Microsoft had used the name MinPositiveValue to be more explicit and not conflict with established terminology.

Related posts:

C# random number generation
Math library functions that seem unnecessary
Floating point numbers are a leaky abstraction

{ 10 comments }

This post will give several examples of functions include in the standard C math library that seem unnecessary at first glance.

Function log1p(x) = log(1 + x)

The function log1p computes log(1 + x).  How hard could this be to implement?

log(1 + x);

Done.

But wait. What if x is very small? If x is 10-16, for example, then on a typical system 1 + x = 1 because machine precision does not contain enough significant bits to distinguish 1 + x from 1. (For details, see Anatomy of a floating point number.) That means that the code log(1 + x) would first compute 1 + x, obtain 1, then compute log(1), and return 0. But log(1 + 10-16) ≈ 10-16. This means the absolute error is about 10-16 and the relative error is 100%. For values of x larger than 10-16 but still fairly small, the code log(1 + x) may not be completely inaccurate, but the relative error may still be unacceptably large.

Fortunately, this is an easy problem to fix. For small x, log(1 + x) is approximately x. So for very small arguments, just return x. For larger arguments, compute log(1 + x) directly. See details here.

Why does this matter? The absolute error is small, even if the code returns a zero for a non-zero answer. Isn’t that OK? Well, it might be. It depends on what you do next. If you add the result to a large number, then the relative error in the answer doesn’t matter. But if you multiply the result by a large number, your large relative error becomes a large absolute error as well.

Function expm1(x) = exp(x) – 1

Another function that may seem unnecessary is expm1. This function computes ex – 1. Why not just write this?

exp(x) - 1.0;

That’s fine for moderately large x. For very small values of x, exp(x) is close to 1, maybe so close to 1 that it actually equals 1 to machine precision. In that case, the code exp(x) - 1 will return 0 and result in 100% relative error. As before, for slightly larger values of x the naive code will not be entirely inaccurate, but it may be less accurate than needed. The solution is to compute exp(x) – 1 directly without first computing exp(x). The Taylor series for exp(x) is 1 + x + x2/2 … So for very small x, we could just return x for exp(x) – 1. Or for slightly larger x, we could return x + x2/2. (See details here.)

Functions erf(x) and erfc(x)

The C math library contains a pair of functions erf and erfc. The “c” stands for “complement” because erfc(x) = 1 – erf(x). The function erf(x) is known as the error function and is not trivial to implement. But why have a separate routine for erfc? Isn’t it trivial to implement once you have code for erf? For some values of x, yes, this is true. But if x is large, erf(x) is near 1, and the code 1 - erf(x) may return 0 when the result should be small but positive. As in the examples above, the naive implementation results in a complete loss of precision for some values and a partial loss of precision for other values.

Functions Γ(x) and log Γ(x)

The standard C math library has two functions related to the gamma function: tgamma that returns Γ(x) and lgamma that return log Γ(x). Why have both? Why can’t the latter just use the log of the former? The gamma function grows extremely quickly. For moderately large arguments, its value exceeds the capacity of a computer number. Sometimes you need these astronomically large numbers as intermediate results. Maybe you need a moderate-sized number that is the ratio of two very large numbers. (See an example here.) In such cases, you need to subtract lgamma values rather than take the ratio of tgamma values. (Why is the function called “tgamma” and not just “gamma”? See the last paragraph and first comment on this post.)

Conclusion

The standard C math library distills a lot of experience. Some of the functions may seem unnecessary, and so they are for some arguments. But for other arguments these functions are indispensable.

Related posts:

What’s so hard about finding a hypotenuse?
Floating point numbers are a leaky abstraction
Comparing math.h on POSIX and Windows systems

{ 17 comments }

Numerical libraries usually have a function for finding the hypotenuse of a right triangle. Isn’t this trivial? If the sides of a triangle are x and y, just write this:

sqrt(x*x + y*y)

That works in theory, but in practice it may fail. If x is so large that x*x overflows, the code will produce an infinite result. 

We’d like to be able to say to the computer “Now x*x and y*y might be too big, but just wait a minute. I’m going to take a square root, and so the numbers will get smaller. I just need to borrow some extra range for a minute.” You can do something like that in environments that have extended precision, but that would be inefficient and unnecessary. And of course it would fail completely if you’re already using the largest numeric type available.

Here’s how to compute sqrt(x*x + y*y) without risking overflow.

  1. max = maximum(|x|, |y|)
  2. min = minimum(|x|, |y|)
  3. r = min / max
  4. return max*sqrt(1 + r*r)

Notice that the square root argument is between 1 and 2 and so there is no danger of overflow in taking the square root. The only way the algorithm could overflow is if the result itself is too large to represent. In that case the fault lies with the problem, not the algorithm: the algorithm was asked to compute a quantity larger than the largest representable number, and so it does the correct thing by overflowing and returning an infinite value. However, the algorithm will not unnecessarily return an infinite value due to an overflow in an intermediate computation.

To see how the algorithm above may succeed when the naive implementation would fail, let M be the largest representable number. For IEEE double precision, M is on the order of 10^308. All that matters for this example is that M is greater than 4. Let x and y equal 0.5 M. The algorithm above would return 0.707 M. But the naive algorithm would compute x*x and overflow since 0.25 M2 > M. Then x*x + y*y would evaluate to infinity and the square root would evaluate to infinity.

A hypotenuse function is included in math.h on every system that I am aware of. It may be named hypot or _hypot.

Related posts:

Floating point numbers are a leaky abstraction
IEEE floating point exceptions in C++
Anatomy of a floating point number

{ 18 comments }

The logistic distribution looks very much like a normal distribution. Here’s a plot of the density for a logistic distribution.

This suggests we could approximate a logistic distribution by a normal distribution. (Or the other way around: sometimes it would be handy to approximate a normal distribution by a logistic. Always invert.)

But which normal distribution approximates a logistic distribution? That is, how should we pick the variance of the normal distribution?

The logistic distribution is most easily described by its distribution function (CDF):

F(x) = exp(x) / (1 + exp(x)).

To find the density (PDF), just differentiate the expression above. You can add a location and scale parameter, but we’ll keep it simple and assume the location (mean) is 0 and the scale is 1.  Since the logistic distribution is symmetric about 0, we set the mean of the normal to 0 so it is also symmetric about 0. But how should we pick the scale (standard deviation) σ for the normal?

The most natural thing to do, or at least the easiest thing to do, is to match moments: pick the variance of the normal so that both distributions have the same variance. This says we should use a normal with variance σ2 = π2/3 or σ = π/√3 . How well does that work? The following graph gives the answer. The logistic density is given by the solid blue line and the normal density is given by the dashed orange line.

Not bad, but we could do better. We could search for the value of σ that minimizes the difference between the two densities. The minimum occurs around σ = 1.6. Here is the revised graph using that value.

The maximum difference is almost three times smaller when we use σ = 1.6 rather than σ = π/√ 3 ≈ 1.8.

What if we want to minimize the difference between the distribution (CDF) functions rather than the density (PDF) functions? It turns out we end up at about the same spot: set σ to approximately 1.6. The two optimization problems don’t have exactly the same solution, but the two solutions are close.

The maximum difference between the distribution function of a logistic and the distribution of a normal with σ = 1.6 is about 0.017. If we used moment matching and set σ = π/√3, the maximum difference would be about 0.022. So moment matching does a better job of approximating the CDFs than approximating the PDFs. But we don’t need to decide between the two criteria since setting σ = 1.6 approximately minimizes both measures of the approximation.

Related posts:

Cosine approximation to the normal density
Always invert

{ 7 comments }

Root-finding software in C#

by John on May 6, 2010

Code Project just posted my article Three Methods for Root-finding in C#. The article discusses the pros and cons of bisection, Newton’s method, and Brent’s method and gives C# implementations of each.

Related posts:

C# random number generation code
Free C# book
Floating point numbers are a leaky abstraction

{ 0 comments }