Uniformity increases entropy

Posted on 23 June 2025 by John

Suppose you have a system with n possible states. The entropy of the system is maximized when all states are equally likely to occur. The entropy is minimized when one outcome is certain to occur.

You can say more. Starting from any set of probabilities, as you move in the direction of more uniformity, you increase entropy. And as you move in the direction of less uniformity, you decrease entropy.

These statements can be quantified and stated more precisely. That’s what the rest of this post will do.

***

Let p_i be the probability of the ith state and let p be the vector of the p_i.

$\mathbf{p} = (p_1, p_2, \ldots, p_n)$

Then the entropy of p is defined as

$H(\mathbf{p}) = -\sum_{i=1}^n p_i \log_2 p_i$

If one of the p‘s is 1 and the rest of the p‘s are zero, then H(p) = 0. (In the definition of entropy, 0 log₂ 0 is taken to be 0. You could justify this as the limit of x log₂ x as x goes to zero.)

If each of the p_i are equal, p_i = 1/n, then H(p) = log₂ n. The fact that this is the maximum entropy, and that compromises between the two extremes always decrease entropy, comes from the fact that the entropy function H is concave (proof). That is, if p₁ is one list of probabilities and p₂ another, then

$H((1-\lambda) \mathbf{p}_1 + \lambda\mathbf{p}_2) \geq (1 - \lambda) H( \mathbf{p}_1) + \lambda H(\mathbf{p}_2)$

When we speak informally of moving from p₁ in the direction of p₂, we mean we increase the parameter λ from 0 to some positive amount no more than 1.

Because entropy is concave, there are no local maxima. As you approach the location of global maximum entropy, i.e. equal state probabilities, from any direction, entropy increases monotonically.

The rise and fall of binomial coefficients

Posted on 21 May 2013 by John

When you expand (x + y)ⁿ, the coefficients increase then decrease. The largest coefficient is in the middle if n is even; it’s the two in the middle if n is odd. For example, the coefficients for (1 + x)⁴ are 1, 4, 6, 4, 1 and the coefficients for (1 + x)⁵ are 1, 5, 10, 10, 5, 1.

More generally, if a > 0 and b > 0, the coefficients of (ax + by)ⁿ can only have one maximum. They may increase, or decrease, or change direction once, but they cannot wiggle more than that. They can’t, for example, increase, decrease, then increase again.

Here’s a proof. The coefficients are

${n \choose k} a^k b^{n-k}$

To show that the coefficients are unimodal as a function of k, we’ll show that their logarithms are unimodal. And we’ll do that by showing that they are samples from a concave function.

The log of the kth coefficient is

log Γ(n + 1) − log Γ(k + 1) − log Γ(n − k + 1) + k log a + (n − k) log b.

As a function of k, the terms

log Γ(n + 1) + k log a + (n − k) log b

form an affine function. The function log Γ is convex, so −log Γ is concave. The composition of a concave function with an affine function is concave, so −log Γ(k + 1) and −log Γ(n − k + 1) are concave functions of k. The sum of concave functions is concave. And the sum of a concave function with an affine function is concave. So binomial coefficients are log-concave and they can only have one maximum.

(The fact log Γ(z) is a convex is the easy direction of the Bohr-Mollerup theorem. The harder direction is that Γ(z) is the only way to extend factorials to all reals that is log-convex.)

Splitting a convex set through its center

Posted on 23 January 2009 by John

Three days ago I raised the question How unevenly can you split a convex set through its center? Scott had asked in the comments what kind of sets split most unevenly through their centers. I didn’t have an answer at the time, but now I have something.

Start with a triangle. Draw a line from the middle of the base up to the top. The center of mass is located 1/3 of the way up that line.

Now draw a horizontal line through the center of mass. The triangle above this line is similar to the entire triangle, but will have 2/3 of the original height. By similarity, it also has 2/3 of the base, and so it has 4/9 of the area of the entire triangle. The portion below the center of mass has 5/9 of the area.

Now let’s go up a dimension and imagine a tetrahedron sitting on a table. Draw a line from the center of the base up to the top. The center of mass for the tetrahedron will be located along that line, 1/4 of the way up. Pass a plane through the center of mass parallel to the table. This forms a new tetrahedron above the cutting plane, similar to the original tetrahedron. The new tetrahedron has 3/4 of the height, and by similarity, it has (3/4)³ = 27/64 of the original volume. Of course the solid below the cutting plane then has 37/64 of the original volume.

Now consider the general case. The center of mass of an n-simplex is located 1/(n + 1) of the way up along a line running from the center of the base to the top. Cut the n-simplex through the center of mass with a hyperplane parallel to the base. The n-simplex above the hyperplane has n/(n + 1) of the original height and (n/(n + 1))ⁿ of the original volume. As n goes to infinity, this expression converges to 1/e.

In summary, we’ve shown that a convex set in Rⁿ can have at least

1 − (n/(n + 1))ⁿ

of its volume on one side of a hyperplane through the center of mass by constructing such a set, namely any n-simplex. And as n goes to infinity, this volume approaches 1 − 1/e. We have not shown that an n-simplex is the worst case in each dimension n, though I suspect this is true. If you take a ball in Rⁿ, any plane through the center divides the ball exactly in half. In some sense, an n-simplex is as far as a convex set can get from a ball. It’s the least-round convex shape.

Update: Here’s a paper on the topic. B. Grunbaum, “Partitions of mass-distributions and convex bodies by hyperplanes,” Paciﬁc Journal of Mathematics. 10, 1257–1261, 1960.

How unevenly can you split an convex set through its center?

Posted on 20 January 2009 by John

Here’s a surprising theorem. Suppose you have a convex set in Rⁿ you pass a plane through its center of mass. How much of the volume of the set can be on one side? No more than about 63%. (Precisely, 1 − 1/e.) This holds for any dimension n and for any direction through the center. I don’t have a reference for this theorem except that it is mentioned near the end of lecture 5 in this course.

Update: See Splitting a convex set through its center for an illustration and a partial proof.

Log concave functions

Posted on 9 January 2009 by John

Log concave functions have some very interesting and useful properties. I’ll list some of these shortly after a three definitions.

A function is convex if the line segment joining two points on the graph lies above the graph. In symbols, f(x) is convex if for every t between 0 and 1, and for every x and y,

f(tx + (1 − t)y) ≤ t f(x) + (1 − t) f(y).

A function f(x) is concave if −f(x) is convex. Equivalently, the line connecting two points on the graph lies below the graph. In symbols, reverse the direction of the inequality for convexity.

A function f(x) is log concave if log( f(x) ) is concave.

The basic properties of convex functions are obvious. It’s easy to show that the sum of two convex functions is convex, the maximum of two convex functions is convex, etc. It would seem that log concave functions would be unremarkable because they are so simply related to convex functions. But they have some surprising properties.

The Laplace transform of a non-negative function is a log convex.
The running average of a log concave function is also log concave.
If a random variable has a log concave PDF (probability density function), then the CDF (cumulative distribution function) is also log concave.
The convolution of two log concave functions is log concave.
If two independent random variables have log-concave PDFs, then their sum also has a log concave PDF.

A few properties of log concave functions do follow easily from analogous properties of convex and concave functions.

The product of log concave functions is log concave.
A function f(x) is log concave if and only if for every t between 0 and 1 and for every x and y,
f(tx + (1 − t)y) ≥ f(x)^t f(y)^1−t
A smooth function f(x) is log concave if and only if the inequality f ∇² f ≤ ∇ f ⋅ ∇ f holds everywhere.

Here are a few specific functions that are log concave.

The exponential function e^ax
The inverse logit function e^x/(e^x + 1)
The function ∏ x_i / ∑ x_i
The function x^a for a ≥ 0
Determinant is a log concave function on positive definite matrices

For more, see Stephen Boyd’s book Convex Optimization.

Convex optimization

Posted on 7 January 2009 by John

I’ve enjoyed following Stephen Boyd’s lectures on convex optimization. I stumbled across a draft version of his textbook a few years ago but didn’t realize at first that the author and the lecturer were the same person. I recommend the book, but I especially recommend the lectures. My favorite parts of the lectures are the off-hand remarks about which methods are well known and which are not, what different fields call the same methods, which concerns are merely academic minutia and which are practically important. This kind of information is almost impossible to pick up just from reading books and articles.

A surprisingly large set of problems can be formulated as minimizing a convex function subject to convex constraints, and algorithms for solving these problems are very well developed. It’s roughly true that an optimization problem is tractable if and only if it is convex. Of course that’s not exactly true. Many special non-convex problems can be solved easily, though if you write down an arbitrary non-convex optimization problem, it’s probably hard to solve. On the other hand, enormous convex optimization problems can often be solved quite efficiently.

I became interested in optimization a few years ago in order to solve some problems that came up at work. After going through Boyd’s lectures, I’d like to go back and do a better job on some of these problems. For example, the EffTox dose-finding software solves an optimization problem to determine prior parameters based on input from doctors. That optimization is slow and not very robust. If I can find the time to do it, I think I could improve that software.