Creating a Traveling Salesman Tour of Texas with Mathematica

A Traveling Salesman tour visits a list of destinations using the shortest path. There’s an obvious way to find the shortest path connecting N points: try all N! paths and see which one is shortest. Unfortunately, that might take a while.

Texas has 254 counties, and so calculating a tour of Texas counties by brute force would examine 254! paths, over 10500 paths. In theory, large Traveling Salesman problems are unsolvable. In practice they can often be solved quickly. As is often the case, the key is to give yourself just a little slack and look for solutions that are close to optimal.

I’ve used the example of a Traveling Salesman tour of Texas before because it makes a nice visual. People asked me for the code that made the image, but I didn’t save the code and didn’t remember offhand how to re-create it. So here’s the code for future reference.

Incidentally, computing the tour itself took only a second or two. Creating the visualization took several seconds.

    texas = Entity["AdministrativeDivision", "Texas"]; 
    counties = texas["Subdivisions"];
    tour = FindShortestTour[texas["Subdivisions"]];
    GeoGraphics[{Thick, Red, GeoPath[counties[[tour[[2]]]]]}]

Here counties is a list of objects representing Texas counties, sorted by alphabetical order, from Anderson County to Zavala County.

The tour object is a pair of a distance and a list of integers. The distance, 6780.74 nautical miles, is the length of the tour. The integers are the indexes of the counties in the tour.

{6780.74 nmi, {1, 107, 234, …, 201, 37, 1}}

The tour starts with the first county, Anderson County. It’s got to start somewhere, and I expect it always starts with the first item in the list. Next it goes to the 107th county, Henderson County, and so on. Because FindShortestTour returns a closed tour, the tour ends where it started, in Anderson County.

Related posts: Traveling Salesman tours of Africa, Americas, Eurasia and Oceania.

Cayley graphs in Mathematica

The previous post mentioned the group S4, the group of all permutations of a set with four elements. This post will show a way to visualize this group.

The Mathematica command

    CayleyGraph[
        SymmetricGroup[4], 
        VertexLabels -> Placed["Name", Center],
        VertexSize -> 0.4]

generates the graph below.

Cayley graph of alternating group S4

This is an interesting image, but what does it mean?

The elements of S4 are represented by the circled numbers. The numbers correspond to the permutations of four elements, listed in lexicographical order. If you label the four elements a, b, c, and d then the permutations are listed in alphabetical order. Permutation 1 is [1, 2, 3, 4] to itself and Permutation 24 is its reverse [4, 3, 2, 1].

In the Mathematica application, mousing over a number shows which permutation it represents, though the static image above doesn’t have this feature.

The blue arrows represent the permutation that swaps the first two elements. So the blue arrow between node 1 and node 7 says that swapping the first two elements of Permutation 1 gives you Permutation 7, which is [2, 1, 3, 4]. The blue arrow going back from 7 to 1 says that the same swapping operation applied to Permutation 7 returns you to Permutation 1.

All the blue arrows come in pairs because swapping is its own inverse.

The green arrows represent a rotation. For example, the green arrow from 1 to 10 says that rotation turns [1, 2, 3, 4] into [2, 3, 4, 1]. The rotation operation is not its own inverse, so the arrows only go in one direction. But every green arrow is part of a diamond because applying the rotation operation four times sends you back where you started.

You can get from any permutation to any other permutation by repeatedly either swapping the first two elements or applying a rotation. In group theoretical terminology, these two permutations generate the group S4.

Related posts

Permutations and centralizers in Mathematica

I had a fleeting thought about how little I use group theory when I realize I used it just this week.

A couple days ago I needed to know which permutations of 4 elements commute with reversal. If r takes a sequence and reverses it, I need to find all permutations p such that pr = rp.

In group theory jargon, the group of all permutations of 4 elements is the symmetric group S4. The subgroup of elements that commute with r is the centralizer of r. So my task was to find the centralizer of r in S4. How do I pose this task to Mathematica?

Mathematica represents permutations as disjoint cycles. The permutation r is represented as

    Cycles[{{4, 1}, {2, 3}}]

because swapping the first and last elements, then swapping the middle two elements, reverses a list of four elements.

To find the centralizer of r I asked Mathematica

    GroupCentralizer[SymmetricGroup[4], Cycles[{{4, 1}, {2, 3}}]]

This returns

    PermutationGroup[{Cycles[{{1, 4}}], Cycles[{{2, 3}}], Cycles[{{1, 2}, {3, 4}}]}]

This does list the permutations that commute with r but rather the generators of the group of such permutations. If we ask for the elements of the group above with

    GroupElements[%]

this returns

    {Cycles[{}], 
     Cycles[{{2, 3}}], 
     Cycles[{{1, 2}, {3, 4}}], 
     Cycles[{{1, 2, 4, 3}}], 
     Cycles[{{1, 3, 4, 2}}], 
     Cycles[{{1, 3}, {2, 4}}], 
     Cycles[{{1, 4}}], 
     Cycles[{{1, 4}, {2, 3}}]}

I use basic group theory and other algebra all the time, but I don’t think of it that way. In this example, I had a question about permutations, and it only occurred to me later that I could phrase my question in the vocabulary of group theory. I use ideas from algebra more often than I use the vocabulary of algebra.

Related posts

Drag equation exponent variation

The motion of a falling body of mass m is given by

m \frac{dv}{dt} = mg - kv^r

where the term −kvr accounts for drag due to air resistance. One can derive r = 2 under simple physical assumptions, but if I remember correctly other values of r may be more realistic in certain circumstances. I don’t know much about the physics here; if you know about the use of other values of r, please let me know by leaving a comment.

Terminal velocity

When r = 1 or r = 2 the differential equation above can be solved in terms of elementary functions, but otherwise it cannot. Nevertheless one can show that for all values of r the object reaches a terminal velocity, and calculate that velocity without explicitly solving the differential equation. William Waterhouse demonstrated this in a one-page article [1]. He rewrites the equation to look at time as a function of velocity rather than velocity as a function of time

\frac{dt}{dv} = \frac{1}{g} \frac{1}{1 - (k/mg)v^r}

and concludes

t = \frac{1}{g} \int_0^v \frac{dv}{1 - (k/mg)v^r} + t_0

He notes that the integral diverges as v approaches

 \left(\frac{mg}{k}\right)^{1/r}

and so that is the terminal velocity, i.e. it takes an infinite amount of time to achieve this velocity. Waterhouse recommends this derivation as “a good example of deriving information about a problem without knowing an explicit solution.”

I would add that such an approach is the norm, not an exception. A closed-form solution to a differential equation in nice when you can get it, but usually not possible. And even when you can find a closed-form solution, you may be able to achieve your goal more directly by not using it.

Hypergeometric solution

I suspected the differential equation could be solved for general values of r using special functions, and that is the case. Mathematica was able to evaluate the integral for t as a function of v in terms of a hypergeometric function.

v \, _2F_1\left(1,\frac{1}{r};1+\frac{1}{r};\frac{g k v^r}{m}\right)

When I asked Mathematica to solve the differential equation directly, it said that the solution is the inverse function of the function above. Apparently Waterhouse and Mathematica agree that it is easier to think of t as a function of v rather than the original formulation.

The notation 2F1 indicates there are two upper parameters and one lower parameter. In our application, the upper parameters are 1 and 1/r, the lower parameter is 1 + 1/r, and the function is evaluated at gkvr/m. You can find a brief introduction to hypergeometric functions here. A hypergeometric function 2F1 has a singularity at 1, and so we could derive the terminal velocity from the explicit solution.

Mathematica implementation

Let c = gk/m. Then we can express velocity as a function of time in Mathematica by

    f[r_, c_, v_] := InverseFunction
    [
        #1 Hypergeometric2F1[1, 1/r, 1 + 1/r, c #1^r] &
    ][t]

and use this to make plots of the velocity for various values of c and r.

The following sets c = 2 and varies r over 1, 1.1, 1.2, … 2.

    Plot[Table[f[2, d/10, t], {d, 10, 20}], {t, 0, 4}, PlotRange -> All]

Here’s the output.

The terminal velocity decreases as r increases. The opposite is true for c < 1.

[1] William C. Waterhouse. A Fact about Falling Bodies. Mathematics Magazine, Vol. 44, No. 1 (Jan., 1971), pp. 33–34. The article straddles two pages, but takes up less than half of each page.

Double duals of polyhedra

The previous post mentioned the dual of a tetrahedron is another tetrahedron. The dual of a cube is an octahedron and the dual of an octahedron is a cube. And the dual of a dodecahedron is an icosahedron, and the dual of an icosahedron is a dodecahedron.

So if you take the dual of a regular solid twice, you get back another regular solid of the same type. But you do not get the same regular solid back. To see this, let’s look at how you form the dual of a polyhedron.

To find the dual of a polyhedron, you make a new polyhedron whose vertices are the centroids of the faces of the original polyhedron. The vertices of a regular polyhedron all lie on the surface of a sphere. A face of the polygon lies inside the sphere, so the centroids of the solid are all inside the sphere. Since the centroids form the vertices of a regular solid, they also all lie on a sphere, but a smaller sphere inside the first sphere.

Size of duals

If you take the double dual of a regular solid, the dual of the dual, then you get a smaller solid of the same type. How much smaller? We can get Mathematica to calculate this for us.

The function DualPolyhedron does what the name implies. Let’s apply this to the default tetrahedron. When we enter

    DualPolyhedron[ Tetrahedron[] ]

we get back

    Tetrahedron[{2 Pi/3}, Pi}, 1/3]

The default tetrahedron, returned by calling Tetrahedron[], has vertices

  • (1, 0, 0)
  • (0, 0, 1)
  • (1, 0, 1)
  • (1, 1, 1)

Each edge has length 1. Its dual has been rotated by 2π/3 about the z axis and by π about the y axis, and has edges of length 1/3. Taking the dual of a tetrahedron rotates the tetrahedron and shrinks it by a factor of 3 (in each dimension).

Size of double duals

If we take the double dual of a tetrahedron, we get a tetrahedron nine times smaller. Let’s see what happens to the rest of the regular solids.

    DualPolyhedron[DualPolyhedron[#]] & /@ 
        {Tetrahedron[], Octahedron[], Cube[],
         Dodecahedron[], Icosahedron[]}

This tells us again what we found above for the double dual of a tetrahedron. It also tells us that taking the double dual of an octahedron or cube results in an octahedron or cube 3 times smaller. And it tells us that taking the double dual of a dodecahedron or icosahedron reduces its size by a factor of

(5 + 2√ 5)/15 = 0.6314…

Taking the double dual shrinks a regular solid, and the amount of shrinkage goes down as the number of faces goes up.

Orientation of double duals

The process of taking the double dual may also rotate the solid. The second dual of the standard tetrahedron, for example, is not just 9 times smaller. It’s also been rotated. We can see this by calling PolyhedronCoordinates. The code

    PolyhedronCoordinates[ 
        DualPolyhedron[ DualPolyhedron[ Tetrahedron[] ] ] ]

shows that the new vertices are

  • (7/9, 2/9, 6/9)
  • (7/9, 2/9, 7/9)
  • (7/9, 3/9, 7/9)
  • (6/9, 2/9, 7/9)

So the second dual of the tetrahedron is not just 9 times smaller.

However, taking the double dual of the rest of the regular solids does not change the orientation. You could verify this for the cube and dodecahedron using Mathematica. It then follows that taking the double dual of octahedra and icosahedra does not change the orientation.

Sixth dual

If you take the dual of a tetrahedron 6 times you do return to the original orientation, but not if you take the dual fewer times. You can verify this by running the following:

    NestList[DualPolyhedron, Tetrahedron[], 6]

Related posts

Poisson distribution tail bounds

Yesterday Terence Tao published a blog post on bounds for the Poisson probability distribution. Specifically, he wrote about Bennett’s inequalities and a refinement that he developed or at least made explicit. Tao writes

This observation is not difficult and is implicitly in the literature … I was not able to find a clean version of this statement in the literature, so I am placing it here on my blog.

Bennett’s inequalities say that for a Poisson random variable X with mean λ,

P(X \geq \lambda(1+u)) \leq \exp(-\lambda h(u))

for u ≥ 0 and

P(X \leq \lambda(1+u)) \leq \exp(-\lambda h(u))

for -1 < u ≤ 0 where

h(u) = (1+u) \log(1+u) - u

I wanted to visualize how tight Bennett’s bounds are and got some interesting images due to the discrete nature of the Poisson distribution.

Here’s a plot of how tight the right-tail estimate is, the gap between the two sides of the first inequality above.

Here u ranges from 0 to 2 along the front edge of the graphic and λ ranges from 5 to 10.

And here’s a plot of how tight the left-tail estimate is, the gap between the two sides of the second inequality above.

Mathematica tweaking

The latter image was easy to make, but the first image required a couple adjustments: the image had holes in it, and the view point was awkward.

I fixed the holes in the plot by adding the option PlotPoints->50 to make the sampling finer. I fixed the view point by grabbing the image and rotating it until I thought it looked better. I could have saved the rotated image directly, but I was curious how to do this with the Export command. To do this I needed to specify the ViewPoint explicitly in the plotting command, but I didn’t know how to get the value of ViewPoint that’s I’d implicitly found via my mouse. A comment on the Mathematica Stack Exchange site told me what I needed to know.

Simply edit the output cell and wrap Options[…, ViewPoint] around the already rotated output. The graphics should be in the place of …. ViewVertical may also change during rotating, as well as some other parameters.

Related posts

Jacobi functions with complex parameter

Jacobi functions are complex-valued functions of a complex variable z and a parameter m. Often this parameter is real, and 0 ≤ m < 1. Mathematical software libraries, like Python’s SciPy, often have this restriction. However, m could be any complex number.

The previous couple of posts spoke of the fundamental rectangle for Jacobi functions. But in general, Jacobi functions (and all other elliptic functions) have a fundamental parallelogram.

When m is real, the two periods of the Jacobi sn function are 4K(m) and 2K(1-m) i and so the function repeats horizontally and vertically. (Here K is the complete elliptic integral of the first kind.) When m is complex, the two periods are again 4K(m) and 2K(1-m) i, but now 4K(m) and 2K(1-m) are complex numbers and their ratio is not necessarily real. This means sn repeats over a parallelogram which is not necessarily a rectangle.

The rest of the post will illustrate this with plots.

First, here is a plot of sn(K(1/2)z, 1/2). The height of the graph represents the absolute value of the function and the color represents its phase. (The argument z was multiplied by K(1/2) to make the periods have integer values, making it a little easier to see the periods.)

Notice that the plot features line up with the coordinate axes, the real axis running from 0 to 8 in the image and the complex axis running from 0 to 4.

Here’s the analogous plot for sn(z, 2 + 2i).

Now the features are running on a diagonal. The pits are where the function is zero and the white ellipses are poles that have the tops cut off to fit in a finite plot.

It will be easier to see what’s going on if we switch to flat plots. The color represents phase as before, but now magnitude is denoted by contour lines.

Here’s a plot of sn(K(1/2)z, 1/2)

and here’s a plot of sn(z, 2 + 2i).

According to theory the two periods should be

4 K(2 + 2i) = 4.59117 + 2.89266 i

and

2i K(-1 – 2i) = -1.44633 + 2.29559 i.

We can show that this is the case by plotting the real and imaginary parts of sn(z, 2 + 2i) as we move in these two directions.

The Mathematica code

    Plot[
        {Re[JacobiSN[t EllipticK[2 + 2 I], (2 + 2 I)]],
         Im[JacobiSN[t EllipticK[2 + 2 I], (2 + 2 I)]]}, 
        {t, 0, 4}]

produces this plot

and the code

    Plot[
        {Re[JacobiSN[t I EllipticK[-1 - 2 I], (2 + 2 I)]], 
         Im[JacobiSN[t I EllipticK[-1 - 2 I], (2 + 2 I)]]}, 
        {t, 0, 4}]

produces this plot.

Related posts

When a cubic or quartic has a double root

Thanks to the quadratic equation, it’s easy to tell whether a quadratic equation has a double root. The equation

ax^2 + bx + c = 0

has a double root if and only if the discriminant

b^2 - 4ac

is zero.

The discriminant of a cubic is much less known, and the analogs for higher order polynomials are unheard of. There is a discriminant for polynomials of all degrees, though the complexity of the discriminant grows quickly with the degree of the polynomial.

This post will derive the discriminant of a cubic and a quartic.

Resultants

The resultant of a pair of polynomials is zero if and only if the two polynomials have a common root. Resultants have come up in a couple previous posts about solving trigonometric equations.

A polynomial p(x) has a double root if p and its derivative p‘ are both zero somewhere. The discriminant of p is the resultant of p and p’.

The resultant of two polynomials is a determinant of their Sylvester matrix. This matrix is easier to describe by example than by equation. You basically fill a matrix with shifts of the coefficients of both polynomials and fill in the gaps with zeros.

MathWorld gives the following Mathematica code for the Sylvester matrix of two inputs.

    SylvesterMatrix1[poly1_, poly2_,  var_] :=
      Function[{coeffs1, coeffs2}, With[
        {l1 = Length[coeffs1], l2 = Length[coeffs2]},
          Join[
            NestList[RotateRight, PadRight[coeffs1,
              l1 + l2 -  2], l2 - 2],
            NestList[RotateRight, PadRight[coeffs2,
              l1 + l2 - 2], l1 - 2]
          ]
        ]
      ][
        Reverse[CoefficientList[poly1, var]],
        Reverse[CoefficientList[poly2, var]]
    ]

Cubic discriminant

If we apply this to the cubic polynomial

ax^3 + bx^2 + dx + c

we get the following matrix.

\left( \begin{array}{ccccc} a & b & c & d & 0 \\ 0 & a & b & c & d \\ 3 a & 2 b & c & 0 & 0 \\ 0 & 3 a & 2 b & c & 0 \\ 0 & 0 & 3 a & 2 b & c \\ \end{array} \right)

We can compute the resultant by taking the determinant of the above matrix.

    g[x_] := a x^3 + b x^2 + c x + d
    SylvesterMatrix1[g[x], D[g[x], x], x]

We get the following result

-a b^2 c^2 + 4 a^2 c^3 + 4 a b^3 d - 18 a^2 b c d + 27 a^3 d^2

and we can verify that this is the same result we would get from calling the Resultant directly with

    Resultant[g[x], D[g[x], x], x]

Although the resultant is defined in terms of a determinant, that doesn’t mean that resultants are necessarily computed by computing determinants. The Sylvester matrix is a very special matrix, and there are clever ways to exploit its structure to create more efficient algorithms.

Each term in the resultant has a factor of a, and the discriminant is the resultant divided by −a.

Quartic discriminant

Now let’s repeat our exercise for the quartic. The Sylvester matrix for the quartic polynomial

ax^4 + bx^3 + cx^2 + dx + e

and its derivative is

\left( \begin{array}{ccccccc} a & b & c & d & e & 0 & 0 \\ 0 & a & b & c & d & e & 0 \\ 0 & 0 & a & b & c & d & e \\ 4 a & 3 b & 2 c & d & 0 & 0 & 0 \\ 0 & 4 a & 3 b & 2 c & d & 0 & 0 \\ 0 & 0 & 4 a & 3 b & 2 c & d & 0 \\ 0 & 0 & 0 & 4 a & 3 b & 2 c & d \\ \end{array} \right)

I created the image above with the following Mathematica code.

    f[x_] := a x^4 + b x^3 + c x^2 + d x + e
    TeXForm[SylvesterMatrix1[f[x], D[f[x], x], x]]

If we take the determinant, we get the resultant, but it’s a mess.

256 a^4 e^3-192 a^3 b d e^2-128 a^3 c^2 e^2+144 a^3 c d^2 e \\ -27 a^3 d^4+144 a^2 b^2 c e^2-6 a^2 b^2 d^2 e-80 a^2 b c^2 d e \\ +18 a^2 b c d^3+16 a^2 c^4 e-4 a^2 c^3 d^2-27 a b^4 e^2\\ +18 a b^3 c d e -4 a b^3 d^3 -4 a b^2 c^3 e+a b^2 c^2 d^2

Again each term has a factor of a, so we can divide by a. to get the discriminant.

If we want to use this in code, we can have Mathematica export the expression in C code using CForm. To generate Python code, it’s more convenient to use FortranForm since Python like Fortran uses ** for exponents.

The following Python code was created by pasting the output of

    FortranForm[Resultant[f[x], D[f[x], x], x]]

and making it into a function.

    def quartic_resultant(a, b, c, d, e):
        return a*b**2*c**2*d**2 - 4*a**2*c**3*d**2 - 4*a*b**3*d**3 + 18*a**2*b*c*d**3 \ 
        - 27*a**3*d**4 - 4*a*b**2*c**3*e + 16*a**2*c**4*e + 18*a*b**3*c*d*e \
        - 80*a**2*b*c**2*d*e - 6*a**2*b**2*d**2*e + 144*a**3*c*d**2*e \
        - 27*a*b**4*e**2 + 144*a**2*b**2*c*e**2 - 128*a**3*c**2*e**2 \
        - 192*a**3*b*d*e**2 + 256*a**4*e**3

Let’s try this on a couple examples. First

x^2(x-2)(x-3) = x^4 -5x^3 + 6x^2

which has a double root at 0.

As expected

    quartic_resultant(1, -5, 6, 0, 0)

returns 0.

Next let’s try

(x-1)(x-2)(x-3)(x-4) = x^4 -10x^3 +35x^2 - 50x + 24

The call

    quartic_resultant(1, -10, 35, -50, 24)

returns 144. We expected a non-zero result since our polynomial has distinct roots at 1, 2, 3, and 4.

Higher order discriminants

In general the discriminant of an nth degree polynomial is the resultant of the polynomial and its derivative, up to a constant. There’s no need to worry about this constant if you’re only concern is whether the discriminant is zero. To get the exact discriminant, you divide the resultant by the leading coefficient of the polynomial and adjust the sign.

The sign convention is a little strange. If you look back at the examples above, we divided by −a in the cubic case but we divided by a in the quartic case. You might reasonably guess that you should divide by a and multiply by (−1)n. But that won’t give you right sign for the quadratic case. The conventional sign is

(−1)n(n − 1)/2.

So when n equals 2 or 3 we get a negative sign, but when n equals 4 we don’t.

Related posts

Three interesting curves

Here are three curves that have interesting names and interesting shapes.

The fish curve

The fish curve has parameterization

x(t) = cos(t) – sin²(t)/√2
y(t) = cos(t) sin(t)

We can plot this curve in Mathematica with

    ParametricPlot[
        {Cos[t] - Sin[t]^2/Sqrt[2], Cos[t] Sin[t]}, 
        {t, 0, 2 Pi}]

to get the following.

Fish curve

It’s interesting that the image has two sharp cusps: the parameterization consists of two smooth functions, but the resulting image is not smooth. That’s because the velocity of the curve is zero at 3π/4 and 7π/4.

I found the fish curve by poking around Mathematica’s PlainCurveData function. I found the next two in the book Mathematical Models by H. Martyn Cundy and A. P. Rollett.

The electric motor curve

Next is the “electric motor” curve. Cundy and Rollett’s book doesn’t provide illustrations for the plane curves it lists on pages 71 and 72, and I plotted the electric motor curve because I was curious why it was called that.

The curve has equation

y²(y² − 96) = x² (x² − 100)

and we can plot it in Mathematica with

    ContourPlot[ y^2 (y^2 - 96) == x^2 (x^2 - 100), 
        {x, -20, 20}, {y, -20, 20}

which gives the following.

electric motor curve

Sure enough, it looks like the inside of an electric motor!

The ampersand curve

I was also curious about the “ampersand curve” because of its name. The curve has equation

(y² − x²) (x − 1) (2x − 3) = 4 (x² + y² − 2x

and can be plotted in Mathematica with

    ContourPlot[(y^2 - x^2) (x - 1) (2 x - 3) == 
                 4 (x^2 + y^2 - 2 x)^2, 
                 {x, -0.5, 2}, {y, -1.5, 1.5}, 
                 PlotPoints -> 100]

The produces the following pleasant image.

ampersand curve, smooth version

Note that the code specifies PlotPoints. Without this argument, the default plotting resolution produces a misleading plot, making it appear that the curve has three components.

Ampersand curve, misleading version plotted with too few points

Related posts

Average digit sum

Suppose you write down a number and take the sum of its digits. In what base will this sum be the smallest on average?

Let’s do a couple examples comparing base 10 and base 2. The number 2022 in base 10 has digit sum 6, but its binary equivalent 11111100110 has digit sum 8, so the base 10 representation has the smaller digit sum. But if we take 1024 in base 10, the digit sum is 7, but 1024 = 210 and so the sum of its binary digits is just 1.

In [1] the author proves that for sufficiently large N, the average digit sum for all positive integers less than N is the least in base 2. Moreover, the author shows that as N goes to infinity, the average digit sum of numbers less than N written in radix r is asymptotically equal to

(r – 1) log N / 2 log r.

This is an increasing function of r and so not only does base 2 have the minimum average digit sum, the average digit sum increases with the size of the base.

Let’s see how this works out with N = 1,000,000 and r ranging from 2 to 10 using the following Mathematica code.

    Table[
        Sum[ Total[IntegerDigits[n, r]], 
            {n, 0, 1000000}] /1000000., 
        {r, 2, 10}
    ]

This returns

    {9.885, 12.336, 14.8271, 16.5625, 18.5806, 
     20.5883, 22.2383, 24.1978, 27.}

Now let’s see what values the asymptotic formula predicted.

    Table[(r - 1) Log[1000000.]/(2 Log[r]), {r, 2, 10}]

This returns

    {9.96578, 12.5754, 14.9487, 17.1681, 19.2765, 
     21.2993, 23.2535, 25.1508, 27.}

So for each r the asymptotic formula produces a good approximation to the actual result.

[1] L. E. Bush. Asymptotic Formula for the Average Sum of the Digits of Integers. The American Mathematical Monthly, Mar., 1940, Vol. 47, No. 3, pp. 154-156