Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers *A* then considers the size of the sets *A*+*A* and *A***A*. That is, if we add every element of *A* to every other element of *A*, how many distinct sums are there? If we take products instead, how many distinct products are there?

## Proven results

Erdős and Szemerédi proved that there are constants *c* and ε > 0 such that

max{|*A*+*A*|, |*A***A*|} ≥ *c*|*A*|^{1+ε}

In other words, either *A*+*A* or *A***A* is substantially bigger than *A*. Erdős and Szemerédi only proved that some positive ε exists, but they suspected ε could be chosen close to 1, i.e. that either |*A*+*A*| or |*A***A*| is bounded below by a fixed multiple of |*A*|² or nearly so. George Shakan later showed that one can take ε to be any value less than

1/3 + 5/5277 = 0.3342899…

but the conjecture remains that the upper limit on ε is 1.

## Python code

The following Python code will let you explore the sum-product conjecture empirically. It randomly selects sets of size *N* from the non-negative integers less than *R*, then computes the sum and product sets using set comprehensions.

from numpy.random import choice def trial(R, N): # R = integer range, N = sample size x = choice(R, N, replace=False) s = {a+b for a in x for b in x} p = {a*b for a in x for b in x} return (len(s), len(p))

When I first tried this code I thought it had a bug. I called `trial`

10 times and got the same values for |*A*+*A*| and |*A***A*| every time. That was because I chose *R* large relative to *N*. In that case, it is likely that every sum and every product will be unique, aside from the redundancy from commutativity. That is, if *R* >> *N*, it is likely that |*A*+*A*| and |*A***A*| will both equal *N*(*N*+1)/2. Things get more interesting when *N* is closer to *R*.

## Probability vs combinatorics

The Erdős-Szemerédi problem is a problem in combinatorics, looking for deterministic lower bounds. But it seems natural to consider a probabilistic extension. Instead of asking about **lower bounds** on |*A*+*A*| and |*A***A*| you could ask for the **distribution** on |*A*+*A*| and |*A***A*| when the sets *A* are drawn from some probability distribution.

If the set *A* is drawn from a continuous distribution, then |*A*+*A*| and |*A***A*| both equal *N*(*N*+1)/2 with probability 1. Only careful choices, ones that would happen randomly with probability zero, could prevent the sums and products from being unique, modulo commutativity, as in the case *R* >> *N* above.

If the set *A* is an arithmetic sequence then |*A*+*A*| is small and |*A***A*| is large, and the opposite holds if *A* is a geometric sequence. So it might be interesting to look at the correlation of |*A*+*A*| and |*A***A*| when *A* comes from a discrete distribution, such as choosing *N* integers uniformly from [1, *R*] when *N*/*R* is not too small.