# Exploring the sum-product conjecture

Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers A then considers the size of the sets A+A and A*A. That is, if we add every element of A to every other element of A, how many distinct sums are there? If we take products instead, how many distinct products are there?

## Proven results

Erdős and Szemerédi proved that there are constants c and ε > 0 such that

max{|A+A|, |A*A|} ≥ c|A|1+ε

In other words, either A+A or A*A is substantially bigger than A. Erdős and Szemerédi only proved that some positive ε exists, but they suspected ε could be chosen close to 1, i.e. that either |A+A| or |A*A| is bounded below by a fixed multiple of |A|² or nearly so. George Shakan later showed that one can take ε to be any value less than

1/3 + 5/5277 = 0.3342899…

but the conjecture remains that the upper limit on ε is 1.

## Python code

The following Python code will let you explore the sum-product conjecture empirically. It randomly selects sets of size N from the non-negative integers less than R, then computes the sum and product sets using set comprehensions.

```    from numpy.random import choice

def trial(R, N):
# R = integer range, N = sample size
x = choice(R, N, replace=False)
s = {a+b for a in x for b in x}
p = {a*b for a in x for b in x}
return (len(s), len(p))
```

When I first tried this code I thought it had a bug. I called `trial` 10 times and got the same values for |A+A| and |A*A| every time. That was because I chose R large relative to N. In that case, it is likely that every sum and every product will be unique, aside from the redundancy from commutativity. That is, if R >> N, it is likely that |A+A| and |A*A| will both equal N(N+1)/2. Things get more interesting when N is closer to R.

## Probability vs combinatorics

The Erdős-Szemerédi problem is a problem in combinatorics, looking for deterministic lower bounds. But it seems natural to consider a probabilistic extension. Instead of asking about lower bounds on |A+A| and |A*A| you could ask for the distribution on |A+A| and |A*A| when the sets A are drawn from some probability distribution.

If the set A is drawn from a continuous distribution, then |A+A| and |A*A| both equal N(N+1)/2 with probability 1. Only careful choices, ones that would happen randomly with probability zero, could prevent the sums and products from being unique, modulo commutativity, as in the case R >> N above.

If the set A is an arithmetic sequence then |A+A| is small and |A*A| is large, and the opposite holds if A is a geometric sequence. So it might be interesting to look at the correlation of |A+A| and |A*A| when A comes from a discrete distribution, such as choosing N integers uniformly from [1, R] when N/R is not too small.

## 6 thoughts on “Exploring the sum-product conjecture”

1. “ε to be any value less than” -> “ε to be any value greater than”

2. Derek, I should not have said O(|A|²) since that’s an upper bound, and the conjecture is a lower bound. But I believe the statement “ε to be any value less than” is right, because we’re looking at lower bounds.

3. Oops. Ok, scratch that wording ‘improvement’ suggestion. What about: “ε is at least”?

4. Andrei

“When I first tried this code I thought it had a bug.”

Well you have a missing bracket at the end ;)

5. Nathan Hannon

I ran a simulation with R = 50, various values of N, and 1 million trials, and got the following:

N correlation
3 -0.0084264145
4 -0.0070103243
5 -0.0050980365
6 -0.0037107498
7 -0.0003860069
8 -0.0007805693
9 -0.0023016013
10 -0.0033701870
11 -0.0039240412
12 -0.0051153533
13 -0.0072732451
14 -0.0117458415
15 -0.0146047359

It is not surprising that the correlation is small and negative. It is somewhat more interesting that there seems to be a peak somewhere around N = 7 where the correlation is near 0.

6. Look up the Barack, Impagliazzo and Wigderson paper on extracting entropy from multiple independent sources. It starts with the Erdős-Szemerédi proof and its extension to fields by Tao, then shows that A*B+C (where A, B and C are min entropy sources) is good extractor. We used that proof and some other trickery to build the world’s most efficient random number generator: https://ieeexplore.ieee.org/document/7480363 .