Recognizing numbers

I was playing around with SymPy, a symbolic math package for Python, and ran across nsimplify. It takes a floating point number and tries to simplify it: as a fraction with a small denominator, square root of a small integer, an expression involving famous constants, etc.

For example, suppose some calculation returned 4.242640687119286 and you suspect there’s something special about that number. Here’s how you might test where it came from.

    >>> from sympy import *
    >>> nsimplify(4.242640687119286)
    3*sqrt(2)

Maybe you do a calculation numerically, find a simple expression for the result, and that suggests an analytical solution.

I think a more common application of nsimplify might be to help you remember half-forgotten formulas. For example, maybe you’re rusty on your trig identities, but you remember that cos(π/6) is something special.

    >>> nsimplify(cos(pi/6))
    sqrt(3)/2

Or to take a more advanced example, suppose that you vaguely remember that the gamma function takes on recognizable values at half integer values, but you don’t quite remember how. Maybe something involving π or e. You can suggest that nsimplify include expressions with π and e in its search.

    >>> nsimplify(gamma(3.5), constants=[pi, E])
    15*sqrt(pi)/8

You can also give nsimplify a tolerance, asking it to find a simple representation within a neighborhood of the number. For example, here’s a way to find approximations to π.

    >>> nsimplify(pi, tolerance=1e-5)
    355/113

With a wider tolerance, it will return a simpler approximation.

    >>> nsimplify(pi, tolerance=1e-2)
    22/7

Finally, here’s higher precision approximation to π that isn’t exactly simple:

    >>> nsimplify(pi, tolerance=1e-7)
    exp(141/895 + sqrt(780631)/895)

Rolling dice for normal samples: Python version

A handful of dice can make a decent normal random number generator, good enough for classroom demonstrations. I wrote about this a while ago.

My original post included Mathematica code for calculating how close to normal the distribution of the sum of the dice is. Here I’d like to redo the code in Python to show how to do the same calculations using SymPy. [Update: I’ll also give a solution that does not use SymPy and that scales much better.]

If you roll five dice and add up the spots, the probability of getting a sum of k is the coefficient of xk in the expansion of

(x + x2 + x3 + x4 + x5 + x6)5 / 65.

Here’s code to find the probabilities by expanding the polynomial and taking coefficients.

    from sympy import Symbol

    sides = 6
    dice = 5
    rolls = range( dice*sides + 1 )

    # Tell SymPy that we want to use x as a symbol, not a number
    x = Symbol('x')

    # p(x) = (x + x^2 + ... + x^m)^n
    # where m = number of sides per die
    # and n = number of dice
    p = sum( x**i for i in range(1, sides + 1) )**dice

    # Extract the coefficients of p(x) and divide by sides**dice
    pmf = [sides**(-dice) * p.expand().coeff(x, i) for i in rolls]

If you’d like to compare the CDF of the dice sum to a normal CDF you could add this.

    from scipy import array, sqrt
    from scipy.stats import norm

    cdf = array(pmf).cumsum()

    # Normal CDF for comparison
    mean = 0.5*(sides + 1)*dice
    variance = dice*(sides**2 -1)/12.0
    temp = [norm.cdf(i, mean, sqrt(variance)) for i in roles]
    norm_cdf = array(temp)

    diff = abs(cdf - norm_cdf)
    # Print the maximum error and where it occurs
    print diff.max(), diff.argmax()

Question: Now suppose you want a better approximation to a normal distribution. Would it be better to increase the number of dice or the number of sides per dice? For example, would you be better off with 10 six-sided dice or 5 twelve-sided dice? Think about it before reading the solution.

Update: The SymPy code does not scale well. When I tried the code with 50 six-sided dice, it ran out of memory. Based on Andre’s comment, I rewrote the code using polypow. SymPy offers much more symbolic calculation functionality than NumPy, but in this case NumPy contains all we need. It is much faster and it doesn’t run out of memory.

    from numpy.polynomial.polynomial import polypow
    from numpy import ones

    sides = 6
    dice = 100

    # Create an array of polynomial coefficients for
    # x + x^2 + ... + x^sides
    p = ones(sides + 1)
    p[0] = 0

    # Extract the coefficients of p(x)**dice and divide by sides**dice
    pmf = sides**(-dice) * polypow(p, dice)
    cdf = pmf.cumsum()

That solution works for up to 398 dice. What’s up with that? With 399 dice, the largest polynomial coefficient overflows. If we divide by the number of dice before raising the polynomial to the power dice, the code becomes a little simpler and scales further.

    p = ones(sides + 1)
    p[0] = 0
    p /= sides
    pmf = polypow(p, dice)
    cdf = pmf.cumsum()

I tried this last approach on 10,000 dice with no problem.

Suffix primes

MathUpdate tweeted this afternoon that

Any number made by removing the first n digits of 646216567629137 is still prime.

and links to sequence A012885 in the Online Encyclopedia of Integer Sequences (OEIS). The OEIS heading for the sequence is

Suffixes of 357686312646216567629137 (all primes)

which implies you can start with an even larger number, cutting off the first digit each time and producing a sequence of primes.

The following Python code verifies that this is indeed the case.

    from sympy.ntheory import isprime

    x = "357686312646216567629137"

    while x:
        print isprime(int(x))
        x = x[1:]

Update: lucio wrote a program to show that the prime given here is the longest one with the suffix property.

    def extend_prime(n, result):
        for i in range(10):
            nn = int(str(i) + str(n))
            if nn == n: continue
            if isprime(nn):
                result.append(nn)
                extend_prime(nn, result)
        return result        

    print "Max Prefix Prime:", max(extend_prime("", []))

One minor suggestion: by using range(1, 10) rather than range(10) above, i.e. eliminating 0, the line if nn == n: continue could be eliminated.

Instead of calling max, you could call len to find that there are 4260 suffix primes.

Here’s a list of all suffix primes created by the code above and sorting the output.

Other special primes

Narcissus prime in Python

I’ve been looking back on some of my blog posts that included Mathematica code to see whether I could rewrite them using Python. For example, I rewrote my code for finding sonnet primes in Python a few days ago. Next I wanted to try testing the Narcissus prime.

Futility closet describes the Narcissus prime as follows:

Repeat the string 1808010808 1560 times, and tack on a 1 the end. The resulting 15601-digit number is prime, and because it’s a palindrome made up of the digits 1, 8, and 0, it remains prime when read backward, upside down, or in a mirror.

My Mathematica code for verifying this claim is posted here. Here’s Python code to do the same thing:

    from sympy.ntheory import isprime
    isprime(int("1808010808"*1560 + "1"))

This does indeed return True. However, the Mathematica code ran for about 2 minutes and the SymPy code took 17.5 hours, about 500 times longer.

Update (December 29, 2019): Aaron Meurer reports in the comments that the latest version of SymPy is much faster at solving this problem.

Sonnet primes in Python

A while back I wrote about sonnet primes, primes of the form ababcdcdefefgg where the letters a through g represent digits and a is not zero. The name comes from the rhyme scheme of an English (Shakespearean) sonnet.

In the original post I gave Mathematica code to find all sonnet primes. This post shows how to do it in Python.

from sympy.ntheory import isprime
from itertools import permutations

def number(t):
    # turn a tuple into a number
    return 10100000000000*t[0] + 1010000000000*t[1] 
           +   1010000000*t[2] +     101000000*t[3] 
           +       101000*t[4] +         10100*t[5] 
           +           11*t[6]

sonnet_numbers = (number(t) for t in 
    permutations(range(10), 7) if t[0] != 0)

sonnet_primes = filter(isprime, sonnet_numbers)

Moving from Mathematica to Python

Everything I do regularly in Mathematica can be done in Python. Even though Mathematica has a mind-boggling amount of functionality, I only use a tiny proportion of it. I skimmed through some of my Mathematica files to see what functions I use and then looked for Python counterparts. I found I use less of Mathematica than I imagined.

The core mathematical functions I need are in SciPy. The plotting features are in matplotlib. The SymPy library appears to have the symbolic functionality I need, though I’m as not sure about this one.

As I’ve blogged about before, I’d like to consolidate my tools. I started using Emacs again because I was frustrated with using a different editor for every kind of file. One of the things I find promising about Python is that I may be able to do more in Python and reduce the number of programming languages I use regularly.

Update (2017):

I wrote this post years ago when I was just starting to move to the Python stack. Since that time I have used Python as my default programming environment, though I still use Mathematica as well. The number and quality of Python libraries for applied mathematics has increased greatly over that time.

Python has numerous advantages over Mathematica. It is open source, and so it is more transparent. When something goes wrong, you can dig in and debug it. It is of course free, so you don’t have to buy software licenses, saving not only money but administrative hassle. And perhaps more importantly, other people that you want to share code with don’t have to buy licenses; you might find a Mathematica license a good investment for your company, but you can’t expect everyone you work with to necessarily come to the same conclusion.

The disadvantage to Python relative to Mathematica is that it is less consistent and less integrated. The Python stack for applied math—SciPy, NumPy, Pandas, Matplotlib, etc.—is better integrated than it used to be, but it still remains a collection of separate libraries.