Let ω(*n*) be the number of distinct prime factors of *x*. A theorem of Landau says that for *N* large, then for randomly selected positive integers less than *N*, ω-1 has a Poisson(log log *N*) distribution. This statement holds in the limit as *N* goes to infinity.

Apparently *N* has to be extremely large before the result approximately holds. I ran an experiment for *N* = 10,000,000 and the fit is not great.

Here’s the data:

|----+----------+-----------| | | Observed | Predicted | |----+----------+-----------| | 1 | 665134 | 620420.6 | | 2 | 2536837 | 1724733.8 | | 3 | 3642766 | 2397330.6 | | 4 | 2389433 | 2221480.4 | | 5 | 691209 | 1543897.1 | | 6 | 72902 | 858389.0 | | 7 | 1716 | 397712.0 | | 8 | 1 | 157945.2 | | 9 | 0 | 54884.8 | | 10 | 0 | 16952.9 | |----+----------+-----------|

And here’s a plot:

I found the above theorem in these notes. It’s possible I’ve misunderstood something or there’s an error in the notes. I haven’t found a more formal reference on the theorem yet.

**Update**: According to the results in the comments below, it seems that the notes I cited may be wrong. The notes say “distinct prime factors”, i.e. ω(*n*), while numerical results suggest they meant to say Ω(*n*), the number of prime factors counted with multiplicity. I verified the numbers given below. Here’s a plot.

Here’s the Python code I used to generate the table. (To match the code for the revised graph, replace `omega`

which calculated ω(*n*) with the SymPy function `primeomega`

which calculates Ω(*n*).)

from sympy import factorint from numpy import empty, log, log2 from scipy.stats import poisson N = 10000000 def omega(n): return len(factorint(n)) table = empty(N, int) table[0] = table[1] = 0 for n in range(2, N): table[n] = omega(n) # upper bound on the largest value of omega(n) for n < N. u = int(log2(N)) for n in range(1, u+1): print(n, len(table[table==n]), poisson(log(log(N))).pmf(n-1)*N)

Perhaps a normal approximation with mean log log n and variance log log n is a better approximation. For large n, this normal and a Poisson distribution are close. This information is taken from: https://math.stackexchange.com/questions/409675/number-of-distinct-prime-factors-omegan

It was interesting so I started playing around with it in R. I decided that the distribution didn’t look much like the log(log(N)) Poisson distribution and the two means weren’t even close.

So I decided to drop the unique part of the prime factors and it feels* much better. (My math aint good enough to know if I am right).

Here is the inefficient R code

(Two simulations – one using the average of the original data and one using the log(log(N)) mean)

~~~~~~~~~~~~

library(gmp)

N<- 10000001

pfc <-matrix(0,N-1,1)

for (i in 2:N) {

# pfc[i-1]<-length(unique(factorize(i)))

pfc[i-1]<-length(factorize(i))

}

meanpfc<-sum(pfc-1)/(N-1)

ppfc<-rpois(N-1,meanpfc)+1

ppfc2<-rpois(N-1,log(log(N-1)))+1

u<-max(ppfc,pfc,ppfc2)

fppfc2<-factor(ppfc2,levels=1:u)

fppfc<-factor(ppfc,levels=1:u)

fpfc<-factor(pfc,levels=1:u)

obs<-table(fpfc)

pred<-table(fppfc)

predlln<-table(fppfc2)

cbind(obs,pred, predlln)

~~~~~~~~~~~~~~~~~~

Table of data

obs pred predlln

1 664579 617455 620119

2 1904325 1716905 1722713

3 2444359 2394090 2395522

4 2050696 2223963 2223467

5 1349779 1546252 1545203

6 774078 863598 858951

7 409849 399329 397826

8 207207 159158 157759

9 101787 55695 55203

10 49163 17139 16926

11 23448 4815 4786

12 11068 1257 1179

13 5210 266 276

14 2406 64 58

15 1124 9 11

16 510 4 1

17 233 1 0

18 102 0 0

19 45 0 0

20 21 0 0

21 7 0 0

22 3 0 0

23 1 0 0

~~~~~~~~~~

The average of the original data (minus 1) is 2.7861251. The average of the simulated Poisson distribution with the previous mean is 2.7854423 (results will vary!) and log(log(10000000)) =2.779943

It's still odd in the right hand tail but much better in the left hand tail.