The average number of operations needed for quicksort to sort a list of *n* items is approximately 10 times the *n*th prime number.

Here’s some data to illustrate this.

|------+-----------------+---------| | n | avg. operations | 10*p(n) | |------+-----------------+---------| | 100 | 5200.2 | 5410 | | 200 | 12018.3 | 12230 | | 300 | 19446.9 | 19870 | | 400 | 27272.2 | 27410 | | 500 | 35392.2 | 35710 | | 600 | 43747.3 | 44090 | | 700 | 52297.8 | 52790 | | 800 | 61015.5 | 61330 | | 900 | 69879.6 | 69970 | | 1000 | 78873.5 | 79190 | | 1100 | 87984.4 | 88310 | | 1200 | 97201.4 | 97330 | | 1300 | 106515.9 | 106570 | | 1400 | 115920.2 | 116570 | | 1500 | 125407.9 | 125530 | | 1600 | 134973.5 | 134990 | | 1700 | 144612.1 | 145190 | | 1800 | 154319.4 | 154010 | | 1900 | 164091.5 | 163810 | | 2000 | 173925.1 | 173890 | |------+-----------------+---------|

The maximum difference between the quicksort and prime columns is about 4%. In the latter half of the table, the maximum error is about 0.4%.

What’s going on here? Why should quicksort be related to prime numbers?!

The real mystery is the prime number theorem, not quicksort. The prime number theorem tells us that the *n*th prime number is approximately *n* log *n*. And the number of operations in an efficient sort is proportional to *n* log *n*. The latter is easier to see than the former.

A lot of algorithms have run time proportional to *n* log *n*: mergesort, heapsort, FFT (Fast Fourier Transform), etc. All these have run time approximately proportional to the *n*th prime.

Now for the fine print. What exactly is the average run time for quicksort? It’s easy to say it’s O(*n* log *n*), but getting more specific requires making assumptions. I used as the average number of operations 11.67 *n* log *n* – 1.74 *n* based on Knuth’s TAOCP, Volume 3. And why 10 times the *n*th prime and not 11.67? I chose 10 to make the example work better. For very large values on *n*, a larger coefficient would work better.

In sorting algorithms two operations are used – comparison and swap.

Analysis of sorting algorithms commonly relates to the number of comparisons. So, ‘n’ is the number of comparisons. This point should have been clarified.