# Poisson distribution and prime numbers

Let ω(n) be the number of distinct prime factors of x. A theorem of Landau says that for N large, then for randomly selected positive integers less than N, ω-1 has a Poisson(log log N) distribution. This statement holds in the limit as N goes to infinity.

Apparently N has to be extremely large before the result approximately holds. I ran an experiment for N = 10,000,000 and the fit is not great.

Here’s the data:

```|----+----------+-----------|
|    | Observed | Predicted |
|----+----------+-----------|
|  1 |   665134 |  620420.6 |
|  2 |  2536837 | 1724733.8 |
|  3 |  3642766 | 2397330.6 |
|  4 |  2389433 | 2221480.4 |
|  5 |   691209 | 1543897.1 |
|  6 |    72902 |  858389.0 |
|  7 |     1716 |  397712.0 |
|  8 |        1 |  157945.2 |
|  9 |        0 |   54884.8 |
| 10 |        0 |   16952.9 |
|----+----------+-----------|```

And here’s a plot: I found the above theorem in these notes. It’s possible I’ve misunderstood something or there’s an error in the notes. I haven’t found a more formal reference on the theorem yet.

Update: According to the results in the comments below, it seems that the notes I cited may be wrong. The notes say “distinct prime factors”, i.e. ω(n), while numerical results suggest they meant to say Ω(n), the number of prime factors counted with multiplicity. I verified the numbers given below. Here’s a plot. Here’s the Python code I used to generate the table. (To match the code for the revised graph, replace `omega` which calculated ω(n) with the SymPy function `primeomega` which calculates Ω(n).)

```from sympy import factorint
from numpy import empty, log, log2
from scipy.stats import poisson

N = 10000000

def omega(n):
return len(factorint(n))

table = empty(N, int)
table = table = 0
for n in range(2, N):
table[n] = omega(n)

# upper bound on the largest value of omega(n) for n < N.
u = int(log2(N))

for n in range(1, u+1):
print(n, len(table[table==n]), poisson(log(log(N))).pmf(n-1)*N)
```

## 2 thoughts on “Poisson distribution and prime numbers”

1. mpledger

It was interesting so I started playing around with it in R. I decided that the distribution didn’t look much like the log(log(N)) Poisson distribution and the two means weren’t even close.

So I decided to drop the unique part of the prime factors and it feels* much better. (My math aint good enough to know if I am right).
Here is the inefficient R code
(Two simulations – one using the average of the original data and one using the log(log(N)) mean)
~~~~~~~~~~~~
library(gmp)
N<- 10000001
pfc <-matrix(0,N-1,1)

for (i in 2:N) {
# pfc[i-1]<-length(unique(factorize(i)))
pfc[i-1]<-length(factorize(i))
}

meanpfc<-sum(pfc-1)/(N-1)

ppfc<-rpois(N-1,meanpfc)+1
ppfc2<-rpois(N-1,log(log(N-1)))+1

u<-max(ppfc,pfc,ppfc2)
fppfc2<-factor(ppfc2,levels=1:u)
fppfc<-factor(ppfc,levels=1:u)
fpfc<-factor(pfc,levels=1:u)

obs<-table(fpfc)
pred<-table(fppfc)
predlln<-table(fppfc2)
cbind(obs,pred, predlln)
~~~~~~~~~~~~~~~~~~
Table of data
obs pred predlln
1 664579 617455 620119
2 1904325 1716905 1722713
3 2444359 2394090 2395522
4 2050696 2223963 2223467
5 1349779 1546252 1545203
6 774078 863598 858951
7 409849 399329 397826
8 207207 159158 157759
9 101787 55695 55203
10 49163 17139 16926
11 23448 4815 4786
12 11068 1257 1179
13 5210 266 276
14 2406 64 58
15 1124 9 11
16 510 4 1
17 233 1 0
18 102 0 0
19 45 0 0
20 21 0 0
21 7 0 0
22 3 0 0
23 1 0 0

~~~~~~~~~~
The average of the original data (minus 1) is 2.7861251. The average of the simulated Poisson distribution with the previous mean is 2.7854423 (results will vary!) and log(log(10000000)) =2.779943

It's still odd in the right hand tail but much better in the left hand tail.