Counting primitive bit strings

A string of bits is called primitive if it is not the repetition of several copies of a smaller string of bits. For example, the string 101101 is not primitive because it can be broken down into two copies of the string 101. In Python notation, you could produce 101101 by "101"*2. The string 11001101, on the other hand, is primitive. (It contains substrings that are not primitive, but the string as a whole cannot be factored into multiple copies of a single string.)
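
Here's a quick sketch of a primitivity test, with names of my own choosing:

    def is_primitive(s):
        # s is primitive if it is not some shorter prefix repeated
        n = len(s)
        return not any( s == s[:d] * (n // d)
                        for d in range(1, n) if n % d == 0 )

    assert not is_primitive("101101")   # two copies of "101"
    assert is_primitive("11001101")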

For a given n, let’s count how many primitive bit strings there are of length n. Call this f(n). There are 2^n bit strings of length n, and f(n) of these are primitive. For example, there are f(12) primitive bit strings of length 12. The strings that are not primitive are made of copies of primitive strings: two copies of a primitive string of length 6, three copies of a primitive string of length 4, etc. This says

2^{12} = f(12) + f(6) + f(4) + f(3) + f(2) + f(1)

and in general

2^n = \sum_{d \mid n} f(d)

Here the sum is over all positive integers d that divide n.

Unfortunately this formula is backward. It gives us a formula for something well known, 2^n, as a sum of things we’re trying to calculate. The Möbius inversion formula is just what we need to turn this formula around so that the new thing is on the left and sums of old things are on the right. It tells us that

f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) 2^d

where μ is the Möbius function.

We could compute f(n) with Python as follows:

    from sympy.ntheory import mobius, divisors

    def num_primitive(n):
        # Möbius inversion: sum of mu(n/d) * 2^d over the divisors d of n
        return sum( mobius(n//d)*2**d for d in divisors(n) )
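
As a sanity check, we can verify numerically that summing num_primitive over the divisors of n recovers 2^n, as in the formula above:

    # Check the divisor-sum identity for a few values of n
    for n in [1, 6, 12]:
        assert sum( num_primitive(d) for d in divisors(n) ) == 2**n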

The latest version of SymPy, version 0.7.6, comes with a function mobius for computing the Möbius function. If you’re using an earlier version of SymPy, you can roll your own mobius function:

    from sympy.ntheory import factorint

    def mobius(n):
        # exponents in the prime factorization of n
        exponents = factorint(n).values()
        lenexp = len(exponents)
        m = 0 if lenexp == 0 else max(exponents)
        # mu(n) is 0 if n has a squared factor, otherwise
        # (-1) raised to the number of distinct prime factors
        return 0 if m > 1 else (-1)**lenexp
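
A few spot checks of this function against known values of μ:

    assert mobius(1) == 1
    assert mobius(4) == 0     # 4 has a squared factor
    assert mobius(6) == 1     # 6 = 2*3, an even number of distinct primes
    assert mobius(30) == -1   # 30 = 2*3*5, an odd number of distinct primes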

The version of mobius that comes with SymPy 0.7.6 may be more efficient. It could, for example, stop the factorization process early if it discovers a square factor.

Related: Applied number theory

Number theory determinant and SymPy

Let σ(n) be the sum of the positive divisors of n and let gcd(a, b) be the greatest common divisor of a and b.

Form an n by n matrix M whose (i, j) entry is σ(gcd(i, j)). Then the determinant of M is n!.

The following code shows that the theorem is true for a few values of n and shows how to do some common number theory calculations in SymPy.

    from sympy import gcd, divisors, Matrix, factorial

    def f(i, j):
        # sigma(gcd(i, j)): the sum of the divisors of gcd(i, j)
        return sum( divisors( gcd(i, j) ) )

    def test(n):
        # det M - n! should be zero if the theorem holds
        r = range(1, n+1)
        M = Matrix( [ [f(i, j) for j in r] for i in r] )
        return M.det() - factorial(n)

    for n in range(1, 11):
        print( test(n) )

As expected, the test function returns zeros.

If we replace the function σ above by τ where τ(n) is the number of positive divisors of n, the corresponding determinant is 1. To test this, replace sum by len in the definition of f and replace factorial(n) by 1.
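
Here's what that modification might look like; the names tau_f and test_tau are mine:

    from sympy import gcd, divisors, Matrix

    def tau_f(i, j):
        # tau(gcd(i, j)): the number of divisors of gcd(i, j)
        return len( divisors( gcd(i, j) ) )

    def test_tau(n):
        # det M - 1 should be zero if the claim holds
        r = range(1, n+1)
        M = Matrix( [ [tau_f(i, j) for j in r] for i in r] )
        return M.det() - 1

    for n in range(1, 11):
        print( test_tau(n) )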

In case you’re curious, both results are special cases of the following more general theorem. I don’t know whose theorem it is. I found it here.

For any arithmetic function f(m), let g(m) be defined for all positive integers m by

g(m) = \sum_{d \,\mid \,m} \mu(d) f\left(\frac{m}{d}\right)

Let M be the square matrix of order n whose (i, j) element is f(gcd(i, j)). Then

\det M = \prod_{j=1}^n g(j)

Here μ is the Möbius function. The two special cases above correspond to g(m) = m and g(m) = 1.
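
Here's a sketch of a numerical check of the general theorem, using f(m) = m² as an arbitrary arithmetic function; the helper names g and check are mine:

    from math import prod
    from sympy import Matrix, divisors, gcd
    from sympy.ntheory import mobius

    def g(m, f):
        # Möbius transform of f: sum of mu(d) f(m/d) over the divisors d of m
        return sum( mobius(d) * f(m // d) for d in divisors(m) )

    def check(n, f):
        r = range(1, n + 1)
        M = Matrix( [ [f(gcd(i, j)) for j in r] for i in r] )
        return M.det() == prod( g(j, f) for j in r )

    print( check(6, lambda m: m**2) )  # True if the theorem holds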

A prime-generating formula and SymPy

Mills’ constant is a number θ such that the integer part of θ raised to successive powers of 3 is always a prime. That is, ⌊θ^(3^n)⌋ is prime for every positive integer n. We’ll see if we can verify this computationally with SymPy.

    from sympy import floor, isprime
    from mpmath import mp, mpf

    # set working precision to 200 decimal places
    mp.dps = 200

    # Mills' constant http://oeis.org/A051021
    theta = mpf("1.30637788386308069046861449260260571291678458515671364436805375996643405376682659882150140370119739570729")

    for i in range(1, 7):
        n = floor(theta**3**i)
        print( i, n, isprime(n) )

Note that the line of code defining theta wraps Mills’ constant as a string and passes it as an argument to mpf. If we just wrote theta = 1.30637788386308… then theta would be truncated to an ordinary floating point number and most of the digits would be lost. Not that I did that in the first version of the code. :)

This code says that the first five outputs are prime but that the sixth is not. That’s because we are not using Mills’ constant to enough precision. The integer part of θ^(3^6) is 85 digits long, and we don’t have enough precision to compute the last digit correctly.
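
Here's a quick way to see the 85-digit claim, using the fact that the number of decimal digits in the integer part of a number y > 1 is ⌊log₁₀ y⌋ + 1:

    from mpmath import log10, floor

    # digits in the integer part of theta**(3**6), theta as defined above
    print( int(floor(3**6 * log10(theta))) + 1 )  # prints 85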

If we use the value of Mills’ constant given here to 640 decimal places, and increase mp.dps to 1000, we can correctly compute the sixth term, but not the seventh.

Mills’ formula is just a curiosity with no practical use that I’m aware of, except possibly as an illustration of impractical formulas.

A magic square filled with consecutive primes

In 1988 Martin Gardner offered a $100 prize for the first person to produce a magic square filled with consecutive primes. Later that year, Harry Nelson found 22 solutions using a Cray computer.

1,480,028,201 1,480,028,129 1,480,028,183
1,480,028,153 1,480,028,171 1,480,028,189
1,480,028,159 1,480,028,213 1,480,028,141

Gardner said that the square above is “almost certainly the one with the lowest constant possible for such a square.”

It’s easy to verify that the numbers above are consecutive primes. Here’s a little Python code for the job. The function nextprime(x, i) gives the ith prime after x. We call the function with x equal to one less than the smallest entry in the square, letting i run from 1 to 9, and it prints out all nine entries.

    from sympy import nextprime
    for i in range(1, 10):
        print( nextprime(1480028128, i) )
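
To confirm the square is magic as well, we can check that every row, column, and diagonal has the same sum, which works out to 3 × 1,480,028,171 = 4,440,084,513. A quick sketch:

    square = [ [1480028201, 1480028129, 1480028183],
               [1480028153, 1480028171, 1480028189],
               [1480028159, 1480028213, 1480028141] ]

    sums  = [ sum(row) for row in square ]          # rows
    sums += [ sum(col) for col in zip(*square) ]    # columns
    sums += [ sum( square[i][i]   for i in range(3) ),   # main diagonal
              sum( square[i][2-i] for i in range(3) ) ]  # anti-diagonal
    print( set(sums) )  # one element if and only if the square is magic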

If you’re looking for more of a challenge, verify whether Gardner’s assertion was correct that the square above uses the smallest possible set of consecutive primes.

By the way, assuming Harry Nelson’s Cray was the Y-MP model that came out in 1988, here are its specs according to Wikipedia:

The Y-MP could be equipped with two, four or eight vector processors, with two functional units each and a clock cycle time of 6 ns (167 MHz). Peak performance was thus 333 megaflops per processor. Main memory comprised 128, 256 or 512 MB of SRAM.

Related: Magic squares from a knight’s tour and a king’s tour.

Golden strings and the rabbit constant

Golden strings are analogous to Fibonacci numbers, except one uses concatenation rather than addition.

Start with s_1 = “1” and s_2 = “10”. Then define s_n = s_{n-1} + s_{n-2} where “+” means string concatenation.

The first few golden strings are

  • “1”
  • “10”
  • “101”
  • “10110”
  • “10110101”

The length of s_n is F_{n+1}, the (n+1)st Fibonacci number. Also, s_n contains F_n 1’s and F_{n-1} 0’s. (Source: The Glorious Golden Ratio)

If we interpret s_n as the fractional part of a binary number, the sequence converges to the rabbit constant R = 0.7098034428612913…

It turns out that R is related to the golden ratio φ by

R = \sum_{i=1}^\infty 2^{-\lfloor i \phi \rfloor}

where ⌊i φ⌋ is the largest integer no greater than iφ.
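
As a numerical check of this identity, we can sum the series with mpmath and compare the result against the digits of R above. A sketch:

    from mpmath import mp, mpf, sqrt, floor

    mp.dps = 30
    phi = (1 + sqrt(5))/2

    # Partial sum of the series; 120 terms is more than enough for
    # 30 digits, since the ith term is no larger than 2**(-i)
    R = sum( mpf(2)**(-floor(i*phi)) for i in range(1, 121) )
    print(R)  # 0.7098034428612913...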

Here’s a little Python code to print out the first few golden strings and an approximation to the rabbit constant.

    from mpmath import mp, fraction

    a = "1"
    b = "10"
    for i in range(10):
        b, a = b+a, b
        print(b)

    n = len(b)
    mp.dps = n
    denom = 2**n
    num = int(b, 2)

    rabbit = fraction(num, denom)
    print(rabbit)

Note that the code sets the number of decimal places, mp.dps, to the length of the string b. That’s because it takes up to n decimal places to exactly represent a rational number with denominator 2^n.

SymPy book

There’s a new book on SymPy, a Python library for symbolic math.

The book is Instant SymPy Starter by Ronan Lamy. As far as I know, this is the only book just on SymPy. It’s only about 50 pages, which is nice. It’s not a replacement for the online documentation but just a quick start guide.

The online SymPy documentation is good, but I think it would be easier to start with this book. And although I’ve been using SymPy off and on for a while, I learned a few things from the book.

Need a 12-digit prime?

You may have seen the joke “Enter any 12-digit prime number to continue.” I’ve seen it floating around as the punchline in several contexts.

So what do you do if you need a 12-digit prime? Here’s how to find the smallest one using Python.

    >>> from sympy import nextprime
    >>> nextprime(10**11)
    100000000003L

The function nextprime gives the smallest prime larger than its argument. (Note that the smallest 12-digit number is 10^11, not 10^12. Great opportunity for an off-by-one mistake.)

Optionally you can provide an additional argument to nextprime to get primes further down the list. For example, this gives the second prime larger than 10^11.

    >>> nextprime(10**11, 2)
    100000000019L

What if you wanted the largest 12-digit prime rather than the smallest?

    >>> from sympy import prevprime
    >>> prevprime(10**12)
    999999999989L

Finally, suppose you want to know how many 12-digit primes there are. SymPy has a function primepi that returns the number of primes less than its argument. Unfortunately, it fails for large arguments. It works for arguments as big as 2**27 but throws a memory error for 2**28.

The number of primes less than n is approximately n / log n, so there are about 32 billion primes between 10^11 and 10^12. According to Wolfram Alpha, the exact number of 12-digit primes is 33,489,857,205. So if you try 12-digit numbers at random, your chances are about 1 in 30 of getting a prime. If you’re clever enough to just pick odd numbers, your chances go up to 1 in 15.
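
Here's the arithmetic behind that estimate:

    from math import log

    # n / log n estimates of the prime counts below 10^12 and 10^11
    estimate = 1e12/log(1e12) - 1e11/log(1e11)
    print(estimate)  # roughly 3.2e10, i.e. about 32 billion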

Recognizing numbers

I was playing around with SymPy, a symbolic math package for Python, and ran across nsimplify. It takes a floating point number and tries to simplify it: as a fraction with a small denominator, square root of a small integer, an expression involving famous constants, etc.

For example, suppose some calculation returned 4.242640687119286 and you suspect there’s something special about that number. Here’s how you might test where it came from.

    >>> from sympy import *
    >>> nsimplify(4.242640687119286)
    3*sqrt(2)

Maybe you do a calculation numerically, find a simple expression for the result, and that suggests an analytical solution.

I think a more common application of nsimplify might be to help you remember half-forgotten formulas. For example, maybe you’re rusty on your trig identities, but you remember that cos(π/6) is something special.

    >>> nsimplify(cos(pi/6))
    sqrt(3)/2

Or to take a more advanced example, suppose that you vaguely remember that the gamma function takes on recognizable values at half integer values, but you don’t quite remember how. Maybe something involving π or e. You can suggest that nsimplify include expressions with π and e in its search.

    >>> nsimplify(gamma(3.5), constants=[pi, E])
    15*sqrt(pi)/8

You can also give nsimplify a tolerance, asking it to find a simple representation within a neighborhood of the number. For example, here’s a way to find approximations to π.

    >>> nsimplify(pi, tolerance=1e-5)
    355/113

With a wider tolerance, it will return a simpler approximation.

    >>> nsimplify(pi, tolerance=1e-2)
    22/7

Finally, here’s a higher-precision approximation to π that isn’t exactly simple:

    >>> nsimplify(pi, tolerance=1e-7)
    exp(141/895 + sqrt(780631)/895)

Rolling dice for normal samples: Python version

A handful of dice can make a decent normal random number generator, good enough for classroom demonstrations. I wrote about this a while ago.

My original post included Mathematica code for calculating how close to normal the distribution of the sum of the dice is. Here I’d like to redo the code in Python to show how to do the same calculations using SymPy. [Update: I’ll also give a solution that does not use SymPy and that scales much better.]

If you roll five dice and add up the spots, the probability of getting a sum of k is the coefficient of x^k in the expansion of

\frac{(x + x^2 + x^3 + x^4 + x^5 + x^6)^5}{6^5}

Here’s code to find the probabilities by expanding the polynomial and taking coefficients.

    from sympy import Symbol

    sides = 6
    dice = 5
    rolls = range( dice*sides + 1 )

    # Tell SymPy that we want to use x as a symbol, not a number
    x = Symbol('x')

    # p(x) = (x + x^2 + ... + x^m)^n
    # where m = number of sides per die
    # and n = number of dice
    p = sum( x**i for i in range(1, sides + 1) )**dice

    # Extract the coefficients of p(x) and divide by sides**dice
    pmf = [sides**(-dice) * p.expand().coeff(x, i) for i in rolls]

If you’d like to compare the CDF of the dice sum to a normal CDF you could add this.

    from numpy import array, sqrt
    from scipy.stats import norm

    cdf = array(pmf).cumsum()

    # Normal CDF for comparison
    mean = 0.5*(sides + 1)*dice
    variance = dice*(sides**2 - 1)/12.0
    temp = [norm.cdf(i, mean, sqrt(variance)) for i in rolls]
    norm_cdf = array(temp)

    diff = abs(cdf - norm_cdf)
    # Print the maximum error and where it occurs
    print( diff.max(), diff.argmax() )

Question: Now suppose you want a better approximation to a normal distribution. Would it be better to increase the number of dice or the number of sides per die? For example, would you be better off with 10 six-sided dice or 5 twelve-sided dice? Think about it before reading the solution.

Update: The SymPy code does not scale well. When I tried the code with 50 six-sided dice, it ran out of memory. Based on Andre’s comment, I rewrote the code using polypow. SymPy offers much more symbolic calculation functionality than NumPy, but in this case NumPy contains all we need. It is much faster and it doesn’t run out of memory.

    from numpy.polynomial.polynomial import polypow
    from numpy import ones

    sides = 6
    dice = 100

    # Create an array of polynomial coefficients for
    # x + x^2 + ... + x^sides
    p = ones(sides + 1)
    p[0] = 0

    # Extract the coefficients of p(x)**dice and divide by sides**dice
    pmf = sides**(-dice) * polypow(p, dice)
    cdf = pmf.cumsum()

That solution works for up to 398 dice. What’s up with that? With 399 dice, the largest polynomial coefficient overflows. If we divide the coefficients by the number of sides before raising the polynomial to the power dice, the code becomes a little simpler and scales further.

    p = ones(sides + 1)
    p[0] = 0
    p /= sides
    pmf = polypow(p, dice)
    cdf = pmf.cumsum()

I tried this last approach on 10,000 dice with no problem.
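
If you’d like to experiment with the question raised earlier, comparing, say, 10 six-sided dice to 5 twelve-sided dice, here is one way to package the approach above; the function name max_cdf_error is mine:

    from numpy.polynomial.polynomial import polypow
    from numpy import ones, arange, sqrt
    from scipy.stats import norm

    def max_cdf_error(sides, dice):
        # pmf of the sum of `dice` fair dice, each with `sides` sides
        p = ones(sides + 1)
        p[0] = 0
        p /= sides
        pmf = polypow(p, dice)
        cdf = pmf.cumsum()

        # Normal CDF with matching mean and variance
        mean = 0.5*(sides + 1)*dice
        variance = dice*(sides**2 - 1)/12.0
        norm_cdf = norm.cdf(arange(len(pmf)), mean, sqrt(variance))

        return abs(cdf - norm_cdf).max()

    print( max_cdf_error(6, 10), max_cdf_error(12, 5) )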

Suffix primes

MathUpdate tweeted this afternoon that

Any number made by removing the first n digits of 646216567629137 is still prime.

and links to sequence A012885 in the Online Encyclopedia of Integer Sequences (OEIS). The OEIS heading for the sequence is

Suffixes of 357686312646216567629137 (all primes)

which implies you can start with an even larger number, cutting off the first digit each time and producing a sequence of primes.

The following Python code verifies that this is indeed the case.

    from sympy.ntheory import isprime

    x = "357686312646216567629137"

    while x:
        print( isprime(int(x)) )
        x = x[1:]

Update: lucio wrote a program to show that the prime given here is the longest one with the suffix property.

    def extend_prime(n, result):
        for i in range(10):
            nn = int(str(i) + str(n))
            if nn == n: continue
            if isprime(nn):
                result.append(nn)
                extend_prime(nn, result)
        return result        

    print( "Max Prefix Prime:", max(extend_prime("", [])) )

One minor suggestion: by using range(1, 10) rather than range(10) above, i.e. eliminating 0, the line if nn == n: continue could be eliminated.

Instead of calling max, you could call len to find that there are 4260 suffix primes.
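
For example, with isprime imported and extend_prime defined as above:

    print( len(extend_prime("", [])) )  # 4260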

Here’s a list of all suffix primes, created by running the code above and sorting the output.

Other special primes