Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers *A* then considers the size of the sets *A*+*A* and *A***A*. That is, if we add every element of *A* to every other element of *A*, how many distinct sums are there? If we take products instead, how many distinct products are there?

## Proven results

Erdős and Szemerédi proved that there are constants *c* and ε > 0 such that

max{|*A*+*A*|, |*A***A*|} ≥ *c*|*A*|^{1+ε}

In other words, either *A*+*A* or *A***A* is substantially bigger than *A*. Erdős and Szemerédi only proved that some positive ε exists, but they suspected ε could be chosen close to 1, i.e. that either |*A*+*A*| or |*A***A*| is bounded below by a fixed multiple of |*A*|² or nearly so. George Shakan later showed that one can take ε to be any value less than

1/3 + 5/5277 = 0.3342899…

but the conjecture remains that the upper limit on ε is 1.

## Python code

The following Python code will let you explore the sum-product conjecture empirically. It randomly selects sets of size *N* from the non-negative integers less than *R*, then computes the sum and product sets using set comprehensions.

from numpy.random import choice def trial(R, N): # R = integer range, N = sample size x = choice(R, N, replace=False) s = {a+b for a in x for b in x} p = {a*b for a in x for b in x} return (len(s), len(p))

When I first tried this code I thought it had a bug. I called `trial`

10 times and got the same values for |*A*+*A*| and |*A***A*| every time. That was because I chose *R* large relative to *N*. In that case, it is likely that every sum and every product will be unique, aside from the redundancy from commutativity. That is, if *R* >> *N*, it is likely that |*A*+*A*| and |*A***A*| will both equal *N*(*N*+1)/2. Things get more interesting when *N* is closer to *R*.

## Probability vs combinatorics

The Erdős-Szemerédi problem is a problem in combinatorics, looking for deterministic lower bounds. But it seems natural to consider a probabilistic extension. Instead of asking about **lower bounds** on |*A*+*A*| and |*A***A*| you could ask for the **distribution** on |*A*+*A*| and |*A***A*| when the sets *A* are drawn from some probability distribution.

If the set *A* is drawn from a continuous distribution, then |*A*+*A*| and |*A***A*| both equal *N*(*N*+1)/2 with probability 1. Only careful choices, ones that would happen randomly with probability zero, could prevent the sums and products from being unique, modulo commutativity, as in the case *R* >> *N* above.

If the set *A* is an arithmetic sequence then |*A*+*A*| is small and |*A***A*| is large, and the opposite holds if *A* is a geometric sequence. So it might be interesting to look at the correlation of |*A*+*A*| and |*A***A*| when *A* comes from a discrete distribution, such as choosing *N* integers uniformly from [1, *R*] when *N*/*R* is not too small.

“ε to be any value less than” -> “ε to be any value greater than”

Derek, I should not have said O(|A|²) since that’s an upper bound, and the conjecture is a lower bound. But I believe the statement “ε to be any value less than” is right, because we’re looking at lower bounds.

Oops. Ok, scratch that wording ‘improvement’ suggestion. What about: “ε is at least”?

“When I first tried this code I thought it had a bug.”

Well you have a missing bracket at the end ;)

I ran a simulation with R = 50, various values of N, and 1 million trials, and got the following:

N correlation

3 -0.0084264145

4 -0.0070103243

5 -0.0050980365

6 -0.0037107498

7 -0.0003860069

8 -0.0007805693

9 -0.0023016013

10 -0.0033701870

11 -0.0039240412

12 -0.0051153533

13 -0.0072732451

14 -0.0117458415

15 -0.0146047359

It is not surprising that the correlation is small and negative. It is somewhat more interesting that there seems to be a peak somewhere around N = 7 where the correlation is near 0.

Look up the Barack, Impagliazzo and Wigderson paper on extracting entropy from multiple independent sources. It starts with the Erdős-Szemerédi proof and its extension to fields by Tao, then shows that A*B+C (where A, B and C are min entropy sources) is good extractor. We used that proof and some other trickery to build the world’s most efficient random number generator: https://ieeexplore.ieee.org/document/7480363 .