Number of digits in Catalan numbers

The Catalan numbers C_n are closely related to the central binomial coefficients I wrote about recently:

C_n = \frac{1}{n+1} {2n \choose n}

In this post I’ll write C(n) rather than C_n because that will be easier to read later on.
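As a quick sanity check of the formula, here is a minimal Python sketch (my own, not part of the original post) that computes C(n) exactly with the standard library's `math.comb` (Python 3.8+). The function name `catalan` is my choice.

```python
from math import comb

def catalan(n):
    """Exact nth Catalan number, C(n) = binomial(2n, n) / (n + 1)."""
    # binomial(2n, n) is always divisible by n + 1, so integer division is exact
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(8)])  # [1, 1, 2, 5, 14, 42, 132, 429]
print(catalan(10))                     # 16796
```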

Catalan numbers come up in all kinds of applications. For example, the number of ways to fully parenthesize a product of n + 1 terms is the nth Catalan number C(n).
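To make the parenthesization count concrete, the sketch below counts full parenthesizations of a product of k terms by splitting at the outermost multiplication. It then compares the count for n + 1 terms with C(n). The function names `parenthesizations` and `catalan` are hypothetical helpers I introduce only for this illustration.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def parenthesizations(k):
    """Number of ways to fully parenthesize a product of k terms."""
    if k == 1:
        return 1  # a single term needs no parentheses
    # The outermost product splits the k terms into a left part of size i
    # and a right part of size k - i; each part is parenthesized independently.
    return sum(parenthesizations(i) * parenthesizations(k - i) for i in range(1, k))

def catalan(n):
    return comb(2 * n, n) // (n + 1)

for n in range(8):
    # a product of n + 1 terms corresponds to C(n)
    print(n + 1, parenthesizations(n + 1), catalan(n))
```

For example, three terms abc can be grouped as (ab)c or a(bc), which gives C(2) = 2.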

Number of digits

Here’s a strange theorem regarding Catalan numbers that I found in Catalan Numbers with Applications by Thomas Koshy:

The number of digits in C(10^n) … converges to the number formed by the digits on the right side of the decimal point in log_10 4 = 0.60205999132…

Let’s see what that means. C(10) equals 16,796 which of course has 5 digits.

Next, C(100) equals

896,519,947,090,131,496,687,170,070,074,100,632,420,837,521,538,745,909,320

which has 57 digits. These numbers are getting really big really fast, so I’ll give a table of just the number of digits of a few more examples rather than listing the Catalan numbers themselves.

    |---+------------------|
    | n | # C(10^n) digits |
    |---+------------------|
    | 1 |                5 |
    | 2 |               57 |
    | 3 |              598 |
    | 4 |             6015 |
    | 5 |            60199 |
    | 6 |           602051 |
    | 7 |          6020590 |
    | 8 |         60205987 |
    |---+------------------|
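A more precise reading of the theorem is that the digit count divided by 10^n approaches log_10 4. The short Python sketch below is my own illustration. It uses only the table values and prints each ratio along with its distance from log_10 4.

```python
from math import log10

# Digit counts of C(10^n) for n = 1..8, copied from the table above
digits = [5, 57, 598, 6015, 60199, 602051, 6020590, 60205987]

target = log10(4)  # 0.60205999132...
for n, d in enumerate(digits, start=1):
    ratio = d / 10**n
    print(n, ratio, target - ratio)
```

The gap shrinks at every step, from about 0.1 at n = 1 to about 1.2e-7 at n = 8.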

I stopped at n = 8 because my computer locked up when I tried to compute C(10^9). I was computing these numbers with the following Mathematica code:

    Table[
        IntegerLength[
            CatalanNumber[10^n]
        ], 
        {n, 1, 8}
    ]
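For comparison, here is a rough Python equivalent of the Mathematica code, a sketch using exact integer arithmetic from the standard library. I count digits by comparing against powers of 10 rather than calling `len(str(...))`, because recent Python versions refuse by default to convert integers with more than 4300 digits to strings. Like the Mathematica version, this slows down quickly as n grows, so the sketch stops at n = 4.

```python
from math import comb, log10

def num_digits(x):
    """Number of decimal digits of a positive integer x, computed exactly."""
    # Start from an estimate based on the bit length, then correct it so that
    # 10**(d - 1) <= x < 10**d holds exactly.
    d = max(1, int(x.bit_length() * log10(2)))
    while 10**d <= x:
        d += 1
    while d > 1 and 10**(d - 1) > x:
        d -= 1
    return d

def catalan(n):
    return comb(2 * n, n) // (n + 1)

print([num_digits(catalan(10**n)) for n in range(1, 5)])  # [5, 57, 598, 6015]
```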

Computing CatalanNumber[10^9] was too much for Mathematica. So how might we extend the table above?

Numerical computing

We don’t need to know the Catalan numbers themselves, only how many digits they have. And we can compute the number of digits from an approximate value of their logarithm. Taking logarithms also helps us avoid overflow.

We’ll write Python below to determine the number of digits, in part to show that we don’t need any special capability of something like Mathematica.

We need three facts before we write the code:

  1. The number of decimal digits in a positive integer x is 1 + ⌊log_10 x⌋, where ⌊y⌋ is the floor of y, the greatest integer not greater than y.
  2. n! = Γ(n + 1)
  3. log_10(x) = log(x) / log(10)

Note that when I write “log” I always mean natural log. More on that here.
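Here is a small Python check of the three facts using only the standard library. `math.lgamma(x)` is log Γ(x). One caveat about fact 1: if log_10 x is computed in floating point, the logarithm of an integer just below a large power of 10 can round up to an exact integer, and the formula then overcounts by one. For that reason the check below sticks to moderate sizes.

```python
import math

# Fact 1: number of decimal digits is 1 + floor(log10 x)
for x in [7, 42, 16796, 123456789]:
    print(x, 1 + math.floor(math.log10(x)), len(str(x)))

# Fact 2: n! = Gamma(n + 1), so log n! = lgamma(n + 1)
n = 20
print(math.lgamma(n + 1), math.log(math.factorial(n)))

# Fact 3: log10(x) = log(x) / log(10), with log the natural log
x = 16796.0
print(math.log10(x), math.log(x) / math.log(10))
```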

This code can compute the number of digits of C(10^n) quickly for large n.

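    from numpy import log, floor
    from scipy.special import gammaln  # log of the gamma function
    
    def log_catalan(n):
        # log C(n) = log (2n)! - 2 log n! - log(n+1), using n! = Gamma(n+1)
        return gammaln(2*n+1) - 2*gammaln(n+1) - log(n+1)
    
    def catalan_digits(n):
        # number of digits = 1 + floor(log10 C(n)), with log10 x = log x / log 10
        return 1 + floor( log_catalan(n)/log(10) )
    
    for n in range(1, 14):
        print( n, catalan_digits(10.**n) )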

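The code doesn’t overflow until n = 306, when gammaln overflows. (Note the dot after 10 in the last line. Without the dot Python computes 10**n as an integer, and that has problems when n = 19.) Floating-point precision is a separate limit. gammaln returns double-precision values, and once those values reach roughly 10^15 their rounding error is comparable to 1. At that point the last digit of the count can no longer be trusted. This happens around n = 14, well before overflow.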
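The same computation works with just the standard library, with `math.lgamma` playing the role of `gammaln`. The sketch below is my variant, not the original code. It checks the floating-point digit counts against the exact counts in the table for n = 1 through 8. For these n the rounding error is far smaller than the distance from log_10 C(10^n) to the nearest integer, so the floor comes out right. The method has no safeguard for cases where that distance is tiny.

```python
import math

def log_catalan(n):
    # log C(n) = log (2n)! - 2 log n! - log(n + 1), using log k! = lgamma(k + 1)
    return math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1) - math.log(n + 1)

def catalan_digits(n):
    # 1 + floor(log10 C(n)), converting natural log to base 10
    return 1 + math.floor(log_catalan(n) / math.log(10))

table = [5, 57, 598, 6015, 60199, 602051, 6020590, 60205987]
computed = [catalan_digits(10**n) for n in range(1, 9)]
print(computed == table)  # True
```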

Proof

How would you go about proving the theorem above? We want to show that the number of digits in C(10^n) divided by 10^n converges to log_10 4, i.e.

\lim_{n\to\infty} \frac{1 + \lfloor \log_{10}C(10^n)\rfloor}{10^n} = \log_{10} 4

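We can switch to natural logs by multiplying both sides by log(10). Also, 1 + ⌊x⌋ differs from x by at most 1, and dividing that bounded difference by 10^n sends it to zero. So in the limit we can drop the 1 and the floor in the numerator, and it suffices to prove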

\lim_{n\to\infty} \frac{\log C(10^n)}{10^n} = \log 4

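Now we see this has nothing to do with base 10. If the limit below holds as n runs over all positive integers, it holds in particular along the subsequence 10^n, so we only need to prove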

\lim_{n\to\infty} \frac{\log C(n)}{n} = \log 4

and that is a simple exercise using Stirling’s approximation.
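Here is one way to do the exercise. Apply Stirling’s approximation in the form log m! = m log m - m + ½ log(2πm) + O(1/m) to (2n)! and n!, and subtract:

\log (2n)! - 2\log n! = n\log 4 - \tfrac{1}{2}\log n - \tfrac{1}{2}\log \pi + O(1/n)

Subtracting \log(n+1) = \log n + O(1/n) then gives

\log C(n) = n\log 4 - \tfrac{3}{2}\log n - \tfrac{1}{2}\log \pi + O(1/n)

and dividing by n,

\frac{\log C(n)}{n} = \log 4 - \frac{3\log n + \log \pi}{2n} + O(1/n^2) \to \log 4

The same expansion explains the table. In base 10 it says log_10 C(10^n) ≈ 10^n log_10 4 - 1.5n - 0.25, since ½ log_10 π ≈ 0.25, so the digit count trails 10^n log_10 4 by roughly 1.5n. For n = 8 it predicts 1 + ⌊60205999.13 - 12.25⌋ = 60205987, which matches the table.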

Other bases

Our proof showed that this theorem ultimately doesn’t have anything to do with base 10. If we did everything in another base, we’d get analogous results.

To give a taste of that, let’s work in base 7. Look at the Catalan numbers C(7^n) and count how many “digits” their base 7 representations have. We get the same pattern: written in base 7, that count begins with the same digits as the base 7 representation of log_7 4.

Note that the base appears in four places:

  1. which Catalan numbers we’re looking at,
  2. which base we express the Catalan numbers in,
  3. which base we take the log of 4 in, and
  4. which base we express that result in.

If you forget one of these, as I did at first, you won’t get the right result!

Here’s a little Mathematica code to do an example with n = 8.

    BaseForm[
        1 + Floor[ 
            Log[7, CatalanNumber[7^8]]
            ], 
        7
    ]

This returns 46623341 in base 7 (Mathematica displays the base as a subscript 7), and the code

    BaseForm[Log[7, 4.], 7]

returns 0.4662336, again in base 7. The first six base 7 digits, 466233, agree.
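The base 7 experiment can also be sketched in Python using the lgamma idea, counting base-b digits as 1 + ⌊log C(b^n) / log b⌋. The helper names `catalan_digits_base`, `to_base`, and `fraction_digits_base` are my own. `to_base` only handles bases up to 10, since it writes each digit with `str`. `fraction_digits_base` truncates rather than rounds, so it prints 46623356 where Mathematica’s rounded display ends in ...36.

```python
import math

def log_catalan(n):
    # log C(n) via log-gamma: log (2n)! - 2 log n! - log(n + 1)
    return math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1) - math.log(n + 1)

def catalan_digits_base(n, b):
    """Number of base-b digits of C(b**n), i.e. 1 + floor(log_b C(b**n))."""
    return 1 + math.floor(log_catalan(b**n) / math.log(b))

def to_base(k, b):
    """Base-b digit string of a non-negative integer k (assumes b <= 10)."""
    digits = []
    while True:
        k, r = divmod(k, b)
        digits.append(str(r))
        if k == 0:
            return "".join(reversed(digits))

def fraction_digits_base(x, b, count):
    """First `count` base-b digits after the point of x in [0, 1), truncated."""
    out = []
    for _ in range(count):
        x *= b
        d = int(x)
        out.append(str(d))
        x -= d
    return "".join(out)

print(to_base(catalan_digits_base(8, 7), 7))                  # 46623341
print(fraction_digits_base(math.log(4) / math.log(7), 7, 8))  # 46623356
```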
