Let ω(n) be the number of distinct prime factors of x. A theorem of Landau says that for N large, then for randomly selected positive integers less than N, ω-1 has a Poisson(log log N) distribution. This statement holds in the limit as N goes to infinity.
Apparently N has to be extremely large before the result approximately holds. I ran an experiment for N = 10,000,000 and the fit is not great.
Here’s the data:
|----+----------+-----------| | | Observed | Predicted | |----+----------+-----------| | 1 | 665134 | 620420.6 | | 2 | 2536837 | 1724733.8 | | 3 | 3642766 | 2397330.6 | | 4 | 2389433 | 2221480.4 | | 5 | 691209 | 1543897.1 | | 6 | 72902 | 858389.0 | | 7 | 1716 | 397712.0 | | 8 | 1 | 157945.2 | | 9 | 0 | 54884.8 | | 10 | 0 | 16952.9 | |----+----------+-----------|
And here’s a plot:
I found the above theorem in these notes. It’s possible I’ve misunderstood something or there’s an error in the notes. I haven’t found a more formal reference on the theorem yet.
Update: According to the results in the comments below, it seems that the notes I cited may be wrong. The notes say “distinct prime factors”, i.e. ω(n), while numerical results suggest they meant to say Ω(n), the number of prime factors counted with multiplicity. I verified the numbers given below. Here’s a plot.
Here’s the Python code I used to generate the table. (To match the code for the revised graph, replace omega
which calculated ω(n) with the SymPy function primeomega
which calculates Ω(n).)
from sympy import factorint from numpy import empty, log, log2 from scipy.stats import poisson N = 10000000 def omega(n): return len(factorint(n)) table = empty(N, int) table[0] = table[1] = 0 for n in range(2, N): table[n] = omega(n) # upper bound on the largest value of omega(n) for n < N. u = int(log2(N)) for n in range(1, u+1): print(n, len(table[table==n]), poisson(log(log(N))).pmf(n-1)*N)
Perhaps a normal approximation with mean log log n and variance log log n is a better approximation. For large n, this normal and a Poisson distribution are close. This information is taken from: https://math.stackexchange.com/questions/409675/number-of-distinct-prime-factors-omegan
It was interesting so I started playing around with it in R. I decided that the distribution didn’t look much like the log(log(N)) Poisson distribution and the two means weren’t even close.
So I decided to drop the unique part of the prime factors and it feels* much better. (My math aint good enough to know if I am right).
Here is the inefficient R code
(Two simulations – one using the average of the original data and one using the log(log(N)) mean)
~~~~~~~~~~~~
library(gmp)
N<- 10000001
pfc <-matrix(0,N-1,1)
for (i in 2:N) {
# pfc[i-1]<-length(unique(factorize(i)))
pfc[i-1]<-length(factorize(i))
}
meanpfc<-sum(pfc-1)/(N-1)
ppfc<-rpois(N-1,meanpfc)+1
ppfc2<-rpois(N-1,log(log(N-1)))+1
u<-max(ppfc,pfc,ppfc2)
fppfc2<-factor(ppfc2,levels=1:u)
fppfc<-factor(ppfc,levels=1:u)
fpfc<-factor(pfc,levels=1:u)
obs<-table(fpfc)
pred<-table(fppfc)
predlln<-table(fppfc2)
cbind(obs,pred, predlln)
~~~~~~~~~~~~~~~~~~
Table of data
obs pred predlln
1 664579 617455 620119
2 1904325 1716905 1722713
3 2444359 2394090 2395522
4 2050696 2223963 2223467
5 1349779 1546252 1545203
6 774078 863598 858951
7 409849 399329 397826
8 207207 159158 157759
9 101787 55695 55203
10 49163 17139 16926
11 23448 4815 4786
12 11068 1257 1179
13 5210 266 276
14 2406 64 58
15 1124 9 11
16 510 4 1
17 233 1 0
18 102 0 0
19 45 0 0
20 21 0 0
21 7 0 0
22 3 0 0
23 1 0 0
~~~~~~~~~~
The average of the original data (minus 1) is 2.7861251. The average of the simulated Poisson distribution with the previous mean is 2.7854423 (results will vary!) and log(log(10000000)) =2.779943
It's still odd in the right hand tail but much better in the left hand tail.