Average sum of digits

The smaller the base you write numbers in, the smaller their digits sums will be on average.

This need not be true of any given number, only for numbers on average. For example, let n = 2021. In base 10, the sum of the digits is 5, but in base 2 the sum of the digits is 8 because n is written as 11111100101 in binary. So for our particular example, a smaller base lead to a larger digit sum. In fact, all smaller bases than 10 lead to a larger digit sum for 2021. But on average, numbers have smaller digit sums when written in binary than when written in base 10.

Let’s explore this with a little Python code, then look at a theoretical result.

First, we need some code to express numbers in different bases, and here’s a simple function for that.

    # From https://stackoverflow.com/a/28666223
    def numberToBase(n, b):
        if n == 0:
            return [0]
        digits = []
        while n:
            digits.append(int(n % b))
            n //= b
        return digits[::-1]

Using the code above, our digit sum function is trivial:

    def digitSum(n, b):
        return sum(numberToBase(n, b))

Now for our simulation.

    import numpy as np
    np.random.seed(20200103)

    upper_bound = 1000000000
    sample_size = 1000
    bases = np.arange(2, 17)

    sample = np.random.randint(0, upper_bound, size = sample_size)
    out = [sum(digitSum(n, b) for n in sample)/sample_size for b in bases]

A theorem in [1] says that

\frac{S(b, N)}{N} \sim \frac{(r-1)\log N}{2\log b}

where S(b, N) is the sum of the number of digits of all numbers from 1 to N when written in base b. This value is asymptotically equal to the expression on the right.

We can compute the average number of digits we’d expect in our simulation using the asymptotic result as an approximation.

    asymp = [0.5*(b-1)*np.log(upper_bound) / np.log(b) for b in bases]

The table below compares the simulation results to theory.

    |------+------------+------------|
    | base | simulation | asymptotic |
    |------+------------+------------|
    |    2 |     14.899 |     14.949 |
    |    3 |     18.660 |     18.863 |
    |    4 |     22.317 |     22.423 |
    |    5 |     25.398 |     25.752 |
    |    6 |     28.345 |     28.915 |
    |    7 |     30.996 |     31.949 |
    |    8 |     34.839 |     34.880 |
    |    9 |     36.422 |     37.726 |
    |   10 |     40.275 |     40.500 |
    |   11 |     41.480 |     43.211 |
    |   12 |     44.353 |     45.868 |
    |   13 |     47.454 |     48.476 |
    |   14 |     49.949 |     51.041 |
    |   15 |     51.772 |     53.567 |
    |   16 |     53.610 |     56.058 |
    |------+------------+------------|

[1] L. E. Bush. An Asymptotic Formula for the Average Sum of the Digits of Integers. The American Mathematical Monthly , Mar., 1940, Vol. 47, pp. 154-156.