Quicksort and prime numbers

The average number of operations needed for quicksort to sort a list of n items is approximately 10 times the nth prime number.

Here’s some data to illustrate this.

|------+-----------------+---------|
|    n | avg. operations | 10*p(n) |
|------+-----------------+---------|
|  100 |          5200.2 |    5410 |
|  200 |         12018.3 |   12230 |
|  300 |         19446.9 |   19870 |
|  400 |         27272.2 |   27410 |
|  500 |         35392.2 |   35710 |
|  600 |         43747.3 |   44090 |
|  700 |         52297.8 |   52790 |
|  800 |         61015.5 |   61330 |
|  900 |         69879.6 |   69970 |
| 1000 |         78873.5 |   79190 |
| 1100 |         87984.4 |   88310 |
| 1200 |         97201.4 |   97330 |
| 1300 |        106515.9 |  106570 |
| 1400 |        115920.2 |  116570 |
| 1500 |        125407.9 |  125530 |
| 1600 |        134973.5 |  134990 |
| 1700 |        144612.1 |  145190 |
| 1800 |        154319.4 |  154010 |
| 1900 |        164091.5 |  163810 |
| 2000 |        173925.1 |  173890 |
|------+-----------------+---------|

The maximum difference between the quicksort and prime columns is about 4%. In the latter half of the table, the maximum error is about 0.4%.

What’s going on here? Why should quicksort be related to prime numbers?!

The real mystery is the prime number theorem, not quicksort. The prime number theorem tells us that the nth prime number is approximately n log n. And the number of operations in an efficient sort is proportional to n log n. The latter is easier to see than the former.

A lot of algorithms have run time proportional to n log n: mergesort, heapsort, FFT (Fast Fourier Transform), etc. All these have run time approximately proportional to the nth prime.

Now for the fine print. What exactly is the average run time for quicksort? It’s easy to say it’s O(n log n), but getting more specific requires making assumptions. I used as the average number of operations 11.67 n log n – 1.74 n based on Knuth’s TAOCP, Volume 3. And why 10 times the nth prime and not 11.67? I chose 10 to make the example work better. For very large values on n, a larger coefficient would work better.

 

Leave a Reply

Your email address will not be published. Required fields are marked *