Exploring the sum-product conjecture

Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers A then considers the size of the sets A+A and A*A. That is, if we add every element of A to every other element of A, how many distinct sums are there? If we take products instead, how many distinct products are there?

Proven results

Erdős and Szemerédi proved that there are constants c and ε > 0 such that

max{|A+A|, |A*A|} ≥ c|A|^1+ε

In other words, either A+A or A*A is substantially bigger than A. Erdős and Szemerédi only proved that some positive ε exists, but they suspected ε could be chosen close to 1, i.e. that either |A+A| or |A*A| is bounded below by a fixed multiple of |A|² or nearly so. George Shakan later showed that one can take ε to be any value less than

1/3 + 5/5277 = 0.3342899…

but the conjecture remains that the upper limit on ε is 1.

Python code

The following Python code will let you explore the sum-product conjecture empirically. It randomly selects sets of size N from the non-negative integers less than R, then computes the sum and product sets using set comprehensions.

    from numpy.random import choice

    def trial(R, N):
        # R = integer range, N = sample size
        x = choice(R, N, replace=False)
        s = {a+b for a in x for b in x}
        p = {a*b for a in x for b in x}
        return (len(s), len(p))

When I first tried this code I thought it had a bug. I called trial 10 times and got the same values for |A+A| and |A*A| every time. That was because I chose R large relative to N. In that case, it is likely that every sum and every product will be unique, aside from the redundancy from commutativity. That is, if R >> N, it is likely that |A+A| and |A*A| will both equal N(N+1)/2. Things get more interesting when N is closer to R.

Probability vs combinatorics

The Erdős-Szemerédi problem is a problem in combinatorics, looking for deterministic lower bounds. But it seems natural to consider a probabilistic extension. Instead of asking about lower bounds on |A+A| and |A*A| you could ask for the distribution on |A+A| and |A*A| when the sets A are drawn from some probability distribution.

If the set A is drawn from a continuous distribution, then |A+A| and |A*A| both equal N(N+1)/2 with probability 1. Only careful choices, ones that would happen randomly with probability zero, could prevent the sums and products from being unique, modulo commutativity, as in the case R >> N above.

If the set A is an arithmetic sequence then |A+A| is small and |A*A| is large, and the opposite holds if A is a geometric sequence. So it might be interesting to look at the correlation of |A+A| and |A*A| when A comes from a discrete distribution, such as choosing N integers uniformly from [1, R] when N/R is not too small.

6 thoughts on “Exploring the sum-product conjecture”

Derek Jones

7 February 2019 at 16:55

“ε to be any value less than” -> “ε to be any value greater than”
John

7 February 2019 at 17:19

Derek, I should not have said O(|A|²) since that’s an upper bound, and the conjecture is a lower bound. But I believe the statement “ε to be any value less than” is right, because we’re looking at lower bounds.
Derek Jones

7 February 2019 at 17:38

Oops. Ok, scratch that wording ‘improvement’ suggestion. What about: “ε is at least”?
Andrei

8 February 2019 at 04:21

“When I first tried this code I thought it had a bug.”

Well you have a missing bracket at the end ;)
Nathan Hannon

9 February 2019 at 23:29

I ran a simulation with R = 50, various values of N, and 1 million trials, and got the following:

N correlation
3 -0.0084264145
4 -0.0070103243
5 -0.0050980365
6 -0.0037107498
7 -0.0003860069
8 -0.0007805693
9 -0.0023016013
10 -0.0033701870
11 -0.0039240412
12 -0.0051153533
13 -0.0072732451
14 -0.0117458415
15 -0.0146047359

It is not surprising that the correlation is small and negative. It is somewhat more interesting that there seems to be a peak somewhere around N = 7 where the correlation is near 0.
David Johnston

13 February 2019 at 00:26

Look up the Barack, Impagliazzo and Wigderson paper on extracting entropy from multiple independent sources. It starts with the Erdős-Szemerédi proof and its extension to fields by Tao, then shows that A*B+C (where A, B and C are min entropy sources) is good extractor. We used that proof and some other trickery to build the world’s most efficient random number generator: https://ieeexplore.ieee.org/document/7480363 .

Comments are closed.