Suppose you’re doing something that has probability of success *p* and probability of failure *q* = 1 – *p*. If you repeat what you’re doing *m* + *n* times, the probability of *m* successes and *n* failures is given by

( (*m* + *n*)! / (*m*! *n*!) ) *p*^*m* *q*^*n*.

Now suppose *m* and *n* are moderately large. The terms (*m*+*n*)! and *m*! *n*! will be huge, but the terms *p*^*m* and *q*^*n* will be tiny. The huge terms might overflow, and the tiny terms might underflow, even though the final result may be a moderate-sized number. The numbers *m* and *n* don’t have to be that large for this to happen since factorials grow very quickly. On a typical system, overflow happens if *m* + *n* > 170. How do you get to the end result without having to deal with impossibly large or small values along the way?

The trick is to work with logs. You want to first calculate log( (*m*+*n*)! ) – log( *m*! ) – log( *n*! ) + *m* log( *p* ) + *n* log( *q* ) , then exponentiate the result. This pattern comes up constantly in statistical computing.

Libraries don’t always include functions for evaluating factorial or log factorial. They’re more likely to include functions for Γ(*x*) and its log. For positive integers *n*, Γ(*n*+1) = *n*!. Now suppose you have a function `lgamma` that computes log Γ(*x*). You might write something like this.

```c
double probability(double p, double q, int m, int n)
{
    double temp = lgamma(m + n + 1.0);
    temp -= lgamma(n + 1.0) + lgamma(m + 1.0);
    temp += m*log(p) + n*log(q);
    return exp(temp);
}
```

The function `lgamma` is not part of the ANSI standard library for C or C++, but it is part of the POSIX standard. On Unix-like systems, `lgamma` is included in the standard library. However, Microsoft does not include `lgamma` as part of the Visual C++ standard library. On Windows you have to either implement your own `lgamma` function or grab an implementation from somewhere like Boost.

Here’s something to watch out for with POSIX math libraries. I believe the POSIX standard does not require a function called `gamma` for evaluating the gamma function Γ(*x*). Instead, the standard requires the functions `lgamma` for the log of the gamma function and `tgamma` for the gamma function itself. (The mnemonic is “t” for “true,” meaning that you really want the gamma function.) I wrote a little code that called `gamma` and tested it on OS X and Red Hat Linux this evening. In both cases `gcc` compiled the code without warning, even with the `-Wall` and `-pedantic` warning flags. But on the Mac, `gamma` was interpreted as `tgamma`, and on the Linux box it was interpreted as `lgamma`. This means that `gamma(10.0)` would equal 362880 on the former and 12.8018 on the latter.

If you don’t have access to an implementation of log gamma, see *How to compute log factorial*.