This post will give several examples of functions included in the standard C math library that seem unnecessary at first glance.
Function log1p(x) = log(1 + x)
The function log1p computes log(1 + x). How hard could this be to implement?
log(1 + x);
Done.
But wait. What if x is very small? If x is 10^-16, for example, then on a typical system 1 + x = 1 because machine precision does not contain enough significant bits to distinguish 1 + x from 1. (For details, see Anatomy of a floating point number.) That means that the code log(1 + x) would first compute 1 + x, obtain 1, then compute log(1), and return 0. But log(1 + 10^-16) ≈ 10^-16. This means the absolute error is about 10^-16 and the relative error is 100%. For values of x larger than 10^-16 but still fairly small, the code log(1 + x) may not be completely inaccurate, but the relative error may still be unacceptably large.
Fortunately, this is an easy problem to fix. For small x, log(1 + x) is approximately x. So for very small arguments, just return x. For larger arguments, compute log(1 + x) directly. See details here.
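To make the idea concrete, here is a minimal sketch of that strategy in C. It is only an illustration, not how the library actually implements log1p: the function name, the cutoff, and the number of series terms below are my own choices, and a production implementation takes more care at the handoff between the two branches. In real code you would simply call log1p.

#include <math.h>

/* Sketch of the idea only, not the real implementation: for very small x,
 * return the leading terms of the series for log(1 + x); otherwise use the
 * direct formula. */
double log1p_sketch(double x)
{
    if (fabs(x) < 1e-8)          /* below this cutoff the truncated series is already accurate to machine precision */
        return x - x*x/2.0;      /* series: log(1 + x) = x - x^2/2 + x^3/3 - ... */
    return log(1.0 + x);         /* for larger x the direct formula is fine */
}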
Why does this matter? The absolute error is small, even if the code returns a zero for a non-zero answer. Isn’t that OK? Well, it might be. It depends on what you do next. If you add the result to a large number, then the relative error in the answer doesn’t matter. But if you multiply the result by a large number, your large relative error becomes a large absolute error as well.
Function expm1(x) = exp(x) − 1
Another function that may seem unnecessary is expm1. This function computes e^x − 1. Why not just write this?
exp(x) - 1.0;
That’s fine for moderately large x. For very small values of x, exp(x) is close to 1, maybe so close to 1 that it actually equals 1 to machine precision. In that case, the code exp(x) - 1 will return 0 and result in 100% relative error. As before, for slightly larger values of x the naive code will not be entirely inaccurate, but it may be less accurate than needed. The solution is to compute exp(x) − 1 directly without first computing exp(x). The Taylor series for exp(x) is 1 + x + x^2/2 + … So for very small x, we could just return x for exp(x) − 1. Or for slightly larger x, we could return x + x^2/2. (See details here.)
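Again, a minimal sketch of that approach, for illustration only; the function name and cutoff below are mine, and the real expm1 is implemented more carefully. In practice, just call expm1.

#include <math.h>

/* Sketch only: use the leading terms of exp(x) - 1 = x + x^2/2 + x^3/6 + ...
 * for small x, and the direct formula otherwise. */
double expm1_sketch(double x)
{
    if (fabs(x) < 1e-5)                       /* illustrative cutoff */
        return x + x*x/2.0 + x*x*x/6.0;       /* truncated Taylor series */
    return exp(x) - 1.0;                      /* fine for moderately large x */
}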
Functions erf(x) and erfc(x)
The C math library contains a pair of functions erf and erfc. The “c” stands for “complement” because erfc(x) = 1 − erf(x). The function erf(x) is known as the error function and is not trivial to implement. But why have a separate routine for erfc? Isn’t it trivial to implement once you have code for erf? For some values of x, yes, this is true. But if x is large, erf(x) is near 1, and the code 1 - erf(x) may return 0 when the result should be small but positive. As in the examples above, the naive implementation results in a complete loss of precision for some values and a partial loss of precision for other values.
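Here is a small program that shows the difference. The choice x = 6 is just an illustration, and the values in the comments assume IEEE double precision with a correctly rounded erf; exact output may vary slightly by platform.

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = 6.0;
    /* erf(6) is within half a machine epsilon of 1, so 1 - erf(6)
     * typically evaluates to exactly 0, while erfc(6) returns the
     * correct small positive value, about 2.15e-17. */
    printf("1 - erf(x) = %g\n", 1.0 - erf(x));
    printf("erfc(x)    = %g\n", erfc(x));
    return 0;
}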
Functions Γ(x) and log Γ(x)
The standard C math library has two functions related to the gamma function: tgamma, which returns Γ(x), and lgamma, which returns log Γ(x). Why have both? Why can’t the latter just use the log of the former? The gamma function grows extremely quickly. For moderately large arguments, its value exceeds the capacity of a computer number. Sometimes you need these astronomically large numbers as intermediate results. Maybe you need a moderate-sized number that is the ratio of two very large numbers. (See an example here.) In such cases, you need to subtract lgamma values rather than take the ratio of tgamma values. (Why is the function called “tgamma” and not just “gamma”? See the last paragraph and first comment on this post.)
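For example, here is one way a ratio of two large gamma values might be computed. The arguments 500 and 490 are just an illustration; in double precision, tgamma overflows for arguments much past 171.

#include <stdio.h>
#include <math.h>

int main(void)
{
    double a = 500.0, b = 490.0;
    /* Gamma(500) and Gamma(490) both overflow a double, so the direct
     * ratio is inf/inf = nan. Subtracting the log-gamma values and then
     * exponentiating gives the correct moderate-sized answer (~8.7e26). */
    printf("tgamma ratio: %g\n", tgamma(a) / tgamma(b));
    printf("lgamma ratio: %g\n", exp(lgamma(a) - lgamma(b)));
    return 0;
}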
Conclusion
The standard C math library distills a lot of experience. Some of the functions may seem unnecessary, and so they are for some arguments. But for other arguments these functions are indispensable.
> Function log1p(x) = log(1 + p)
typo, should be: … = log(1 + x)
Thanks for catching the typo. I updated the post.
What’s wrong with having helper wrapper functions provided by a core language library? The library authors must have identified a need for those small helper functions, which people ended up writing time and time again, no? :-)
I would guess that you should think of log(x) = log1p(x-1) instead. This is because log(x) is not defined at zero, so its Taylor series is commonly centered at 1. (See http://en.wikipedia.org/wiki/Taylor_series#Examples ).
You need log gamma because factorials grow very quickly, so quickly that past a certain point you’re out of range for doubles. At those magnitudes you might as well just be tracking the magnitude anyway.
I suspect lgamma is implemented a little differently than log(tgamma(x)), as tgamma doesn’t have enough precision.
Right, it is obvious that due to floating point arithmetic, different algorithms are better than others in different regions of the domain. What always bothered me was how to patch the algorithms together: what do you do with the “jump” at the boundary between these regions? Maybe one controlled “jump” is just better than many unavoidable jumps (due to ‘random’ relative error) if a single algorithm is used.
The problem is even worse when patching functions in the complex domain, where the jump may lie along a circle.
@alfC: As for patching, when in doubt call the more sophisticated functions. For example, you can call log1p(x) for any x. It will produce an accurate answer no matter the value of x. If x is large, it will call log(1+x). Here you pay a very slight performance penalty compared to calling log(1 + x) yourself, but you’re protected against a loss of precision in case x is small.
@John: as usual with FP transcendental functions I am missing something. Common sense tells me that there “must” be a trade-off. Otherwise, to compute log(x) it would always be better to compute log(x-1+1), i.e. log1p(x-1), and I don’t think that is the case. So the next best strategy would be patching both functions. (After patching there will be a jump; then we have to smooth the jump, which is another problem per se, because the derivatives get crazy and more sophisticated smoothing is necessary, etc. According to this logic it is very hard to reach the ‘perfect’ numerical log function for any x, and it is not clear how.)
alfC: What you’re missing is some subtle implications of the fact that floating-point numbers are discrete, not continuous. It’s possible for both implementations to provide exact answers — where, by exact, we mean rounded to the nearest representable value. In fact, not only is it possible, it’s relatively common; I think IEEE gives you something like a single bit of wiggle room, but one bit is not enough to worry about. Thus, for the same input, both functions will give essentially the same output, and both of them will be exact for any relevant purpose. Thus, the subtleties:
* There may be differences in performance, because one has more patches than the other. If you add additional patching, you’re making the situation worse than either option.
* That “for the same input” caveat is really the key here. Because floating-point numbers are discrete and unevenly spaced, you CAN’T give them the same input. In particular, if X is 1 + 1e-57, you can pass 1e-57 to log1p, but you can’t pass 1 + 1e-57 to log — the closest available floating-point number you can pass to log is just 1. The accuracy distinctions are entirely about the input that you give to the functions, and have nothing to do with the internal function implementation itself.
Thus, you want to use log1p() for numbers very close to 1 that can’t be accurately represented directly — but if, and only if, your algorithm is already designed to compute and store them as offsets from 1 rather than storing them directly (i.e., as offsets from 0). If you are storing those inputs directly, you gain nothing from writing things as log1p(x – 1), because your algorithm has already lost any extra precision it might have had by computing and storing them that way. If x is already rounded to 1 rather than a conceptual representation of 1 + 1e-57 (where the “1 +” is implied and only the 1e-57 is stored in the variable in memory), then subtracting 1 from it gets you 0, not 1e-57, and using log1p is no help at all.
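To make that last point concrete, here is a tiny check, assuming IEEE doubles:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double offset = 1e-57;
    double x = 1.0 + offset;         /* rounds to exactly 1.0 */
    printf("%d\n", x == 1.0);        /* prints 1: the offset is gone */
    printf("%g\n", log(x));          /* 0, because the information was lost before the call */
    printf("%g\n", log1p(offset));   /* about 1e-57, because the offset was kept separately */
    return 0;
}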
I believe you meant to say “… because machine precision does not contain enough significant bits to distinguish 1 + x from 1” instead of 1+x from x.
Thanks. I updated the post with the correction.
This is the first computer-related thing I’ve read that has made me feel “officially old.” I’m 38, and the whole idea of precision with logarithms seems oxymoronic. I learned log tables right at the time when they were just becoming irrelevant, and embedded in the process was the understanding that they were a way of calculating quickly by forgoing a certain level of precision.
For it to matter if the log of 1.000000000wheeeeeee16 is correct or an error makes me feel like an innumerate old man.
Fairly interesting. I didn’t know about these functions (and it is a pity, I guess I could use log1p and expm1 quite often!), and indeed, the C math library shows a lot of detailed thinking every time you consider it. But I still wonder why pi has to be M_PI and not just PI ;)
Ruben
Hello, John.
This is a manual trackback ;)
I am one of the translators of the Hacker Monthly Chinese Translation Group (all articles translated by the project team are released under the Creative Commons 3.0 BY-NC-SA license). As this post was published in Hacker Monthly Issue 3, I have translated it into Chinese today. It’s really a good article. And thanks for giving us so many articles :) If you do not want this to be published, let us know and we will remove it at once.
Nice article. I had not used the C library’s tgamma, lgamma, erf and erfc. Instead I had coded up algorithms myself several years ago.
I have a subtle question: I need to evaluate the function log(1 – exp(-x)) for *potentially* small x. Which is better to use, log(-expm1(-x)) or log1p(-exp(-x))? (Here log and exp are the usual C library implementations.)
The functions log1p and expm1 are necessary for small arguments, but they also work well for large arguments. So don’t hesitate to use them for arguments that might be large. I think both solutions you suggest would work OK.
Thanks John, there is even a third option: log1p(-expm1(-x)-1).
My favorite useless example is the random generation from the inverse gamma distribution in R.
function (n, shape, scale = 1)
{
return(1/rgamma(n = n, shape = shape, rate = scale))
}
It is actually unnecessary; caused by a deficiency of our programming languages. Not an easy problem to solve; we need more high(higher?) level languages.
One shouldn’t have to think about optimizing when writing their programs. Ideally, the compiler should do it for the user.
One way could be using symbolic manipulation. Or dependent types. Or a mixture?
There will always be trade-offs: efficiency, simplicity, transparency, user sophistication, etc. The implicit trade-off in floating point computing is to value efficiency and transparency at the expense of requiring sophisticated users. There are different trade-offs that make good sense in different contexts. In some contexts symbolic computation places fewer demands on users and more demands on computers. Or at least it places different demands on the user. It doesn’t require the user to understand as much about floating point, but it requires them to learn the pitfalls of symbolic computation!
Historical note: When Henry Briggs saw Napier’s construction of the logarithms, he wrote to Napier to say he had a couple of ideas how to improve it. Napier, too, had had some ideas during the twenty years he worked on his tables. They collaborated the last couple of years of Napier’s life (the summers of 1615/16). Briggs used log(1+p) ~ p as the basis for starting the construction of his tables in “Arithmetica Logarithmica” (1624/28). To get 16 digits, he started by repeatedly taking the square root of 10, fifty-four times, out to 32 decimal places, which, eerily, is close to what you might do to get log(10) rounded correctly in current double-precision floating-point. The need for, or at least usefulness of, log1p() goes back almost to the beginning of logarithms.
@alfC:
For small x, log(-expm1(-x)) is better than log1p(-exp(-x)).
If you go via exp(-x) you’re throwing away precision by computing something close to 1-x for small x.
If you go via expm1(-x), this is close to zero, and you have more precision around zero than around 1.
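A quick numerical check of this, assuming IEEE doubles and the standard C library functions (the value x = 1e-10 is just an illustration):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = 1e-10;
    /* The true value of log(1 - exp(-x)) here is about -23.02585093,
     * roughly log(x) for small x. The expm1 route keeps nearly full
     * precision; the exp route is off in about the seventh decimal. */
    printf("log(-expm1(-x)) = %.15f\n", log(-expm1(-x)));
    printf("log1p(-exp(-x)) = %.15f\n", log1p(-exp(-x)));
    return 0;
}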
Mathematica has a LogGamma[z] function that is different from Log[Gamma[z]] in a crucial way. I don’t know if the C library has the same feature, but I’m guessing it does. For complex z, the logarithm is multivalued, and LogGamma chooses the values so that the resulting function is continuous as a *complex* function. It is the choice of logarithm, for example, that is bounded by Stirling’s Approximation. As a specific example, Log[Gamma[24+I]] is roughly 51.58540859 – 3.12580797 I, while LogGamma[24+I] is roughly 51.58540859 + 3.15737734 I. Same real parts, but different imaginary parts.