Python resources

Each Wednesday I post a list of some of the resources on this site. This week: Python notes.

See also blog posts tagged Python, SciPy, and SymPy and the Twitter account SciPyTip.

Last week: Special functions

Next week: Probability resources

Counting primitive bit strings

A string of bits is called primitive if it is not the repetition of several copies of a smaller string of bits. For example, the 101101 is not primitive because it can be broken down into two copies of the string 101. In Python notation, you could produce 101101 by "101"*2. The string 11001101, on the other hand, is primitive. (It contains substrings that are not primitive, but the string as a whole cannot be factored into multiple copies of a single string.)

For a given n, let’s count how many primitive bit strings there are of length n. Call this f(n). There are 2n bit strings of length n, and f(n) of these are primitive. For example, there are f(12) primitive bit strings of length 12. The strings that are not primitive are made of copies of primitive strings: two copies of a primitive string of length 6, three copies of a primitive string of length 4, etc. This says

 2^{12} = f(12) + f(6) + f(4) + f(3) + f(2) + f(1)

and in general

2^n = \sum_{d \mid n} f(d)

Here the sum is over all positive integers d that divide n.

Unfortunately this formula is backward. It gives is a formula for something well known, 2n, as a sum of things we’re trying to calculate. The Möbius inversion formula is just what we need to turn this formula around so that the new thing is on the left and sums of old things are on the right. It tells us that

f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) 2^d

where μ is the Möbius function.

We could compute f(n) with Python as follows:

from sympy.ntheory import mobius, divisors

def num_primitive(n):
    return sum( [mobius(n/d)*2**d for d in divisors(n)] )

The latest version of SymPy, version 0.7.6, comes with a function mobius for computing the Möbius function. If you’re using an earlier version of SymPy, you can roll your own mobius function:

from sympy.ntheory import factorint

def mobius(n):
    exponents = factorint(n).values()
    lenexp = len(exponents)
    m = 0 if lenexp == 0 else max(exponents)
    return 0 if m > 1 else (-1)**lenexp

The version of mobius that comes with SymPy 0.7.6 may be more efficient. It could, for example, stop the factorization process early if it discovers a square factor.

Ellipsoid surface area

How much difference does the earth’s equatorial bulge make in its surface area?

To first approximation, the earth is a sphere. The next step in sophistication is to model the earth as an ellipsoid.

The surface area of an ellipsoid with semi-axes abc is

A = 2\pi \left( c^2 + \frac{ab}{\sin\phi} \left( E(\phi, k) \sin^2\phi + F(\phi, k) \cos^2 \phi\right)\right)




m = k^2 = \frac{a^2(b^2 - c^2)}{b^2(a^2 - c^2)}

The functions E and F are incomplete elliptic integrals

 F(\phi, k) = \int_0^\phi \frac{d\theta}{\sqrt{1 - k^2 \sin^2\theta}}


E(\phi, k) = \int_0^\phi \sqrt{1 - k^2 \sin^2\theta}\,d\theta

implemented in SciPy as ellipeinc and ellipkinc. Note that the SciPy functions take m as their second argument rather its square root k.

For the earth, a = b and so m = 1.

The following Python code computes the ratio of earth’s surface area as an ellipsoid to its area as a sphere.

from scipy import pi, sin, cos, arccos
from scipy.special import ellipkinc, ellipeinc

# values in meters based on GRS 80
equatorial_radius = 6378137
polar_radius = 6356752.314140347

a = b = equatorial_radius
c = polar_radius

phi = arccos(c/a)
# in general, m = (a**2 * (b**2 - c**2)) / (b**2 * (a**2 - c**2))
m = 1

temp = ellipeinc(phi, m)*sin(phi)**2 + ellipkinc(phi, m)*cos(phi)**2
ellipsoid_area = 2*pi*(c**2 + a*b*temp/sin(phi))

# sphere with radius equal to average of polar and equatorial
r = 0.5*(a+c)
sphere_area = 4*pi*r**2


This shows that the ellipsoid model leads to 0.112% more surface area relative to a sphere.

Source: See equation 19.33.2 here.

Update: It was suggested in the comments that it would be better to compare the ellipsoid area to that of a sphere of the same volume. So instead of using the average of the polar and equatorial radii, one would take the geometric mean of the polar radius and two copies of the equatorial radius. Using that radius, the ellipsoid has 0.0002% more area than the sphere.

Benchmarking C++, Python, R, etc.

The other day Travis Oliphant pointed out an interesting paper: A Comparison of Programming Languages in Economics. The paper benchmarks several programming languages on a computational problem in economics.

All the usual disclaimers about benchmarks apply, your mileage may vary, etc. See the paper for details.

Here I give my summary of their summary of their results. The authors ran separate benchmarks on Mac and Windows. The results were qualitatively the same, so I just report the Windows results here.

Times in the table below are relative to the fastest C++ run.

Language Time
C++ 1.00
Java 2.10
Julia 2.70
CPython 155.31
Python with Numba 1.57
R 505.09
R using compiler package 243.38


The most striking result is that the authors were able to run their Python code 100x faster, achieving performance comparable to C++, by using Numba.

Number theory determinant and SymPy

Let σ(n) be the sum of the positive divisors of n and let gcd(a, b) be the greatest common divisor of a and b.

Form an n by n matrix M whose (i, j) entry is σ(gcd(i, j)). Then the determinant of M is n!.

The following code shows that the theorem is true for a few values of n and shows how to do some common number theory calculations in SymPy.

from sympy import gcd, divisors, Matrix, factorial

def f(i, j):
    return sum( divisors( gcd(i, j) ) )

def test(n):
    r = range(1, n+1)
    M = Matrix( [ [f(i, j) for j in r] for i in r] )
    return M.det() - factorial(n)

for n in range(1, 11):
    print test(n)

As expected, the test function returns zeros.

If we replace the function σ above by τ where τ(n) is the number of positive divisors of n, the corresponding determinant is 1. To test this, replace sum by len in the definition of f and replace factorial(n) by 1.

In case you’re curious, both results are special cases of the following more general theorem. I don’t know whose theorem it is. I found it here.

For any arithmetic function f(m), let g(m) be defined for all positive integers m by

g(m) = \sum_{d \,\mid \,m} \mu(d) f\left(\frac{m}{d}\right)

Let M be the square matrix of order n with ij element f(gcd(i, j)). Then

\det M = \prod_i^n g(j)

Here μ is the Möbius function. The two special cases above correspond to g(m) = m and g(m) = 1.

A prime-generating formula and SymPy

Mills’ constant is a number θ such that the integer part of θ raised to a power of 3 is always a prime. We’ll see if we can verify this computationally with SymPy.

from sympy import floor, isprime
from sympy.mpmath import mp, mpf

# set working precision to 200 decimal places
mp.dps = 200

# Mills' constant
theta = mpf("1.30637788386308069046861449260260571291678458515671364436805375996643405376682659882150140370119739570729")

for i in range(1, 7):
    n = floor(theta**3**i)
    print i, n, isprime(n)

Note that the line of code defining theta wraps Mill’s constant as a string and passes it as an argument to mpf. If we just wrote theta = 1.30637788386308… then theta would be truncated to an ordinary floating point number and most of the digits would be lost. Not that I did that in the first version of the code. :)

This code says that the first five outputs are prime but that the sixth is not. That’s because we are not using Mills’ constant to enough precision. The integer part of θ^3^6 is 85 digits long, and we don’t have enough precision to compute the last digit correctly.

If we use the value of Mills’ constant given here to 640 decimal places, and increase mp.dps to 1000, we can correctly compute the sixth term, but not the seventh.

Mill’s formula is just a curiosity with no practical use that I’m aware of, except possibly as an illustration of impractical formulas.

Relating Airy and Bessel functions

The Airy functions Ai(x) and Bi(x) are independent solutions to the differential equation

y'' - xy = 0

For negative x they act something like sin(x) and cos(x). For positive x they act something like exp(x) and exp(-x). This isn’t surprising if you look at the differential equation. If you replace x with a negative constant, you get sines and cosines, and if you replace it with a positive constant, you get positive and negative exponentials.

The Airy functions can be related to Bessel functions as follows:

\mathrm{Ai}(x) = \left\{ 	\begin{array}{ll} 		\frac{1}{3}\sqrt{\phantom{-}x} \left(I_{-1/3}(\hat{x}) - I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\<br /><br /><br /><br /><br />
                \\<br /><br /><br /><br /><br />
		\frac{1}{3}\sqrt{-x} \left(J_{-1/3}(\hat{x}) + J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 	\end{array} \right.


\mathrm{Bi}(x) = \left\{ 	\begin{array}{ll} 		\sqrt{\phantom{-}x/3} \left(I_{-1/3}(\hat{x}) + I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\<br />                 \\<br /> 		\sqrt{-x/3} \left(J_{-1/3}(\hat{x}) - J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 	\end{array} \right.

Here J is a “Bessel function of the first kind” and I is a “modified Bessel function of the first kind.” Also

\hat{x} = \frac{2}{3} \left(\sqrt{|x|}\right)^3

To verify the equations above, and to show how to compute these functions in Python, here’s some code.

The SciPy function airy computes both functions, and their first derivatives, at once. I assume that’s because it doesn’t take much longer to compute all four functions than to compute one. The code for Ai2 and Bi2 below uses np.where instead of if ... else so that it can operate on NumPy vectors all at once. You can plot Ai and Ai2 and see that the two curves lie on top of each other. The same holds for Bi and Bi2.


from scipy.special import airy, jv, iv
from numpy import sqrt, where

def Ai(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return ai

def Bi(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return bi

def Ai2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        third*sqrt( x)*(iv(-third, hatx) - iv(third, hatx)),
        third*sqrt(-x)*(jv(-third, hatx) + jv(third, hatx)))

def Bi2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        sqrt( x/3.0)*(iv(-third, hatx) + iv(third, hatx)),
        sqrt(-x/3.0)*(jv(-third, hatx) - jv(third, hatx)))


There is a problem with Ai2 and Bi2: they return nan at 0. A more careful implementation would avoid this problem, but that’s not necessary since these functions are only for illustration. In practice, you’d simply use airy and it does the right thing at 0.

Related links:

SymPy book

There’s a new book on SymPy, a Python library for symbolic math.

The book is Instant SymPy Starter by Ronan Lamy. As far as I know, this is the only book just on SymPy. It’s only about 50 pages, which is nice. It’s not a replacement for the online documentation but just a quick start guide.

The online SymPy documentation is good, but I think it would be easier to start with this book. And although I’ve been using SymPy off and on for a while, I learned a few things from the book.


Need a 12-digit prime?

You may have seen the joke “Enter any 12-digit prime number to continue.” I’ve seen it floating around as the punchline in several contexts.

So what do you do if you need a 12-digit prime? Here’s how to find the smallest one using Python.

>>> from sympy import nextprime
>>> nextprime(10**11)

The function nextprime gives the smallest prime larger than its argument. (Note that the smallest 12-digit number is 1011, not 1012. Great opportunity for an off-by-one mistake.)

Optionally you can provide an addition argument to nextprime to get primes further down the list. For example, this gives the second prime larger than 1011.

>>> nextprime(10**11, 2)

What if you wanted the largest 12-digit prime rather than the smallest?

>>> from sympy import prevprime
>>> prevprime(10**12)

Finally, suppose you want to know how many 12-digit primes there are. SymPy has a function primepi that returns the number of primes less than its argument. Unfortunately, it fails for large arguments. It works for arguments as big as 2**27 but throws a memory error for 2**28.

The number of primes less than n is approximately n / log n, so there are about 32 billion primes between 1011 and 1012. According to Wolfram Alpha, the exact number of 12-digit primes is 33,489,857,205. So if you try 12-digit numbers at random, your chances are about 1 in 30 of getting a prime. If you’re clever enough to just pick odd numbers, your chances go up to 1 in 15.

Synchronizing cicadas with Python

Suppose you want to know when your great-grandmother was born. You can’t find the year recorded anywhere. But you did discover an undated letter from her father that mentions her birth and one curious detail:  the 13-year and 17-year cicadas were swarming.

You do a little research and find that the 13-year cicadas are supposed to come out next year, and that the 17-year cicadas came out last year. When was your great-grandmother born?

Since 13 and 17 are relatively prime, the 13-year and 17-year cicadas will synchronize their schedules every 13 × 17 = 221 years. Suppose your great-grandmother was born n years ago. The remainder when n is divided by 13 must be 12, and the remainder when n is divided by 17 must be 1. We have to solve the pair of congruences n = 12 mod 13 and n = 1 mod 17. The Chinese Remainder Theorem says that this pair of congruences has a solution, and the proof of the theorem suggests an algorithm for computing the solution.

The Python library SymPy has a function for solving linear congruences.

>>> from sympy.ntheory.modular import solve_congruence
>>> solve_congruence( (12, 13), (1, 17) )
(103, 221)

This says we’re 103 years into the joint cycle of the 13-year and 17-year cicadas. So your great-grandmother was born 103 years ago. (The solution 324 = 103 + 221 is also mathematically possible, but not biologically possible.)

You can use the same SymPy function to solve systems of more congruences. For example, when is the next year in which there will be summer Olympic games and the 13-year and and 17-year cicadas will swarm? Here are a couple ways to approach this. First, you could find the last time this happened, then find when it will happen next. You’d need to solve n = 1 mod 4 (since we had summer Olympics last year) and  n = 12 mod 13 and n = 1 mod 17.

>>> solve_congruence( (1, 4), (12, 13), (1, 17) )
(545, 884)

So the cicadas and the summer Olympics were in sync 545 years ago. (Well, they would have been if the modern Olympics had existed in the middle ages.) This says they’ll be in sync again in 885 – 545 = 339 years.

Here’s a more direct approach. We want to know when the summer Olympics will be 3 years ahead of where they are now in the cycle, when the 13-year cicadas will be 1 year ahead, and the 17-year cicadas will be 16 years ahead.

>>> solve_congruence( (3, 4), (1, 13), (16, 17) )
(339, 884)

By the way, you can use negative integers with the congruences, so you could have used (-1, 17) to say the 17-year cicadas will be 1 year back instead of 16 years ahead in their cycle.

Rolling dice for normal samples: Python version

A handful of dice can make a decent normal random number generator, good enough for classroom demonstrations. I wrote about this a while ago.

My original post included Mathematica code for calculating how close to normal the distribution of the sum of the dice is. Here I’d like to redo the code in Python to show how to do the same calculations using SymPy. [Update: I’ll also give a solution that does not use SymPy and that scales much better.]

If you roll five dice and add up the spots, the probability of getting a sum of k is the coefficient of xk in the expansion of

(x + x2 + x3 + x4 + x5 + x6)5 / 65.

Here’s code to find the probabilities by expanding the polynomial and taking coefficients.

from sympy import Symbol

sides = 6
dice = 5
rolls = range( dice*sides + 1 )

# Tell SymPy that we want to use x as a symbol, not a number
x = Symbol('x')

# p(x) = (x + x^2 + ... + x^m)^n
# where m = number of sides per die
# and n = number of dice
p = sum([x**i for i in range(1, sides + 1)])**dice

# Extract the coefficients of p(x) and divide by sides**dice
pmf = [sides**(-dice) * p.expand().coeff(x, i) for i in rolls]

If you’d like to compare the CDF of the dice sum to a normal CDF you could add this.

from scipy import array, sqrt
from scipy.stats import norm

cdf = array(pmf).cumsum()

# Normal CDF for comparison
mean = 0.5*(sides + 1)*dice
variance = dice*(sides**2 -1)/12.0
temp = [norm.cdf(i, mean, sqrt(variance)) for i in roles]
norm_cdf = array(temp)

diff = abs(cdf - norm_cdf)
# Print the maximum error and where it occurs
print diff.max(), diff.argmax()

Question: Now suppose you want a better approximation to a normal distribution. Would it be better to increase the number of dice or the number of sides per dice? For example, would you be better off with 10 six-sided dice or 5 twelve-sided dice? Think about it before reading the solution.

Update: The SymPy code does not scale well. When I tried the code with 50 six-sided dice, it ran out of memory. Based on Andre’s comment, I rewrote the code using polypow. SymPy offers much more symbolic calculation functionality than NumPy, but in this case NumPy contains all we need. It is much faster and it doesn’t run out of memory.

from numpy.polynomial.polynomial import polypow
from numpy import ones

sides = 6
dice = 100

# Create an array of polynomial coefficients for
# x + x^2 + ... + x^sides
p = ones(sides + 1)
p[0] = 0

# Extract the coefficients of p(x)**dice and divide by sides**dice
pmf = sides**(-dice) * polypow(p, dice)
cdf = pmf.cumsum()

That solution works for up to 398 dice. What’s up with that? With 399 dice, the largest polynomial coefficient overflows. If we divide by the number of dice before raising the polynomial to the power dice, the code becomes a little simpler and scales further.

p = ones(sides + 1)
p[0] = 0
p /= sides
pmf = polypow(p, dice)
cdf = pmf.cumsum()

I tried this last approach on 10,000 dice with no problem.

Moments of mixtures

I needed to compute the higher moments of a mixture distribution for a project I’m working on. I’m writing up the code here in case anyone else finds this useful. (And in case I’ll find it useful in the future.) I’ll include the central moments first. From there it’s easy to compute skewness and kurtosis.

Suppose X is a mixture of n random variables Xi with weights wi, non-negative numbers adding to 1. Then the jth central moment of X is given by

E[(X - \mu)^j] = \sum_{i=1}^n \sum_{k=0}^j {j \choose k} (\mu_i - \mu)^{j-k} w_i E[(X_i- \mu_i)^k]

where μi is the mean of Xi.

In my particular application, I’m interested in a mixture of normals and so the code below computes the moments for a mixture of normals. It could easily be modified for other distributions.

from scipy.misc import factorialk, comb

def mixture_central_moment(mixture, moment):

    '''Compute the higher moments of a mixture of normal rvs.
    mixture is a list of (mu, sigma, weight) triples.
    moment is the central moment to compute.'''

    mix_mean = sum( [w*m for (m, s, w) in mixture] )

    mixture_moment = 0.0
    for triple in mixture:
        mu, sigma, weight = triple
        for k in range(moment+1):
            prod = comb(moment, k) * (mu-mix_mean)**(moment-k)
            prod *= weight*normal_central_moment(sigma, k)
            mixture_moment += prod

    return mixture_moment

def normal_central_moment(sigma, moment):

    '''Central moments of a normal distribution'''

    if moment % 2 == 1:
        return 0.0
        # If Z is a std normal and n is even, E(Z^n) == (n-1)!!
        # So E (sigma Z)^n = sigma^n (n-1)!!
        return sigma**moment * factorialk(moment-1, 2)

Once we have code for central moments, it’s simple to add code for computing skewness and kurtosis.

def mixture_skew(mixture):

    variance = mixture_central_moment(mixture, 2)
    third = mixture_central_moment(mixture, 3)
    return third / variance**(1.5)

def mixture_kurtosis(mixture):

    variance = mixture_central_moment(mixture, 2)
    fourth = mixture_central_moment(mixture, 4)
    return fourth / variance**2 - 3.0

Here’s an example of how the code might be used.

# Test on a mixture of 30% Normal(-2, 1) and 70% Normal(1, 3)
mixture = [(-2, 1, 0.3), (1, 3, 0.7)]

print "Skewness = ", mixture_skew(mixture)
print "Kurtosis = ", mixture_kurtosis(mixture)

Related post: General formula for normal moments

New introduction to SciPy

The Python stack for scientific computing is more modular than say R or Mathematica. Python is a general-purpose programming language that has libraries for scientific computing. R and Mathematica are statistical and mathematical programming languages that have general-purpose features. The Python approach has its advantages — I’d rather do math in a general language than do general programming in a mathematical language — but it takes longer to learn. The components of the Python stack work well together, but someone new to Python has to discover what components they’ll need.

Several books have come out recently to help someone learn Python and the components for numerical computing. The latest is Learning SciPy for Numerical and Scientific Computing by Francisco J. Blanco-Silva.

This book covers the things you’d expect, including SciPy, NumPy, and Matplotlib. The only exception may be IPython. But no book can cover everything. And since IPython is an environment more than a library, it makes some sense to leave it out.

In addition to the usual topics, the book includes several important topics that are not as commonly covered. For example, it devotes a good amount of space to special functions and integration in a chapter on numerical analysis. I share the author’s assessment that this is “one of the most interesting chapters in the book.”

There are three chapters on more specific applications: signal processing, data mining, and computational geometry. These chapters give an introduction to their topics as well as how to carry out computations in SciPy.

The final chapter of the book is on integration with other programming languages.

Learning SciPy for Numerical and Scientific Computing covers the material you need to get going. It’s easy to read, and still fairly small: 150 pages total, about 130 pages of content. This is the right size for such a book in my opinion. There’s plenty of online documentation for more details, but it helps to have a good overview such as this book before diving into reference material.

Use typewriter font for code inside prose

There’s a useful tradition of using a typewriter font, or more generally some monospaced font, for bits of code sprinkled in prose. The practice is analogous to using italic to mark, for example, a French mot dropped into an English paragraph. In HTML, the code tag marks content as software code, which a browser typically will render in a typewriter font.

Here’s a sentence from a new article on Python at Netflix that could benefit a few code tags.

These features (and more) have led to increasingly pervasive use of Python in everything from small tools using boto to talk to AWS, to storing information with python-memcached and pycassa, managing processes with Envoy, polling restful APIs to large applications with requests, providing web interfaces with CherryPy and Bottle, and crunching data with scipy.

Here’s the same sentence with some code tags.

These features (and more) have led to increasingly pervasive use of Python in everything from small tools using boto to talk to AWS, to storing information with python-memcached and pycassa, managing processes with Envoy, polling restful APIs to large applications with requests, providing web interfaces with CherryPy and Bottle, and crunching data with scipy.

It’s especially helpful to let the reader know that packages like requests are indeed packages. It helps to clarify, for example, whether Wes McKinney has been stress testing pandas or pandas. That way we know whether to inform animal protection authorities or to download a new version of a library.

Wallpaper and phase portraits

Suppose you want to create a background image that tiles well. You’d like it to be periodic horizontally and vertically so that there are no obvious jumps when the image repeats.

Functions like sine and cosine are period along the real line. But if you want to make a two-dimensional image by extending the sine function to the complex plane, the result is not periodic along the imaginary axis but exponential.

There are functions that are periodic horizontally and vertically. If you restrict your attention to functions that are analytic except at poles, these doubly-periodic functions are elliptic functions, a class of functions with remarkable properties. See this post if you’re interested in the details. Here we’re just after pretty wallpaper. I’ll give Python code for creating the wallpaper.

Here I’ll take a particular elliptic function sn(x). This is one of the Jacobi elliptic functions, somewhat analogous to the sine function, and use its phase portrait. Phase portraits use hue to encode the phase of a complex number, the θ value when a complex number is written in polar coordinates. The brightness of the color indicates the magnitude, the r value in polar coordinates.

Here’s the plot of sn(z, 0.2). (The sn function takes a parameter m that I arbitrarily chose as 0.2.) The plot shows two periods, horizontally and vertically. I included two periods so you could more easily see how it repeats. If you wanted to use this image as wallpaper, you could use 1/4 of the image, one period in each direction, to get by with a smaller image.

phase plot of sn(z, 0.2) - 0.2

Here’s the Python code that was used to create the image.

from mpmath import cplot, ellipfun, ellipk
sn = ellipfun('sn')
m = 0.2
x = 4*ellipk(m) # horizontal period
y = 2*ellipk(1-m) # vertical period
cplot(lambda z: sn(z, m) - 0.2, [0, 2*x], [0, 2*y], points = 100000)

I subtracted 0.2 from sn just to shift the color a little. Adding a positive number shifts the color toward red. Subtracting a positive number shifts the color toward blue. You could also multiply by some constant to increase or decrease the brightness.

You could also play around with other elliptic functions, described in the mpmath documentation here. And you can find more on cplot here. For example, you could supply your own function for how phase is mapped to color. The saturated colors used by default are good for mathematical visualization, but more subtle colors could be better for aesthetics.

If you make some interesting images, leave a comment with a link to your image and a description of how you made it.