Python / Emacs setup

When I got a new computer a few days ago, I installed the latest version of Emacs, 24.2, and broke my Python environment. I decided to re-evaluate my environment and start over. I asked a question on the Python Google+ group and here’s a summary of the Emacs packages recommended.

  • python-mode
  • rainbow-mode, rainbow-delimiters-mode
  • flymake-mode (hooked up to flymake-pyflakes)
  • linum-on
  • jedi
  • pycheckers + pyflakes
  • rope
  • electric-pair
  • show-paren
  • auto-complete
  • yassnippets

What recommendations do you have for packages? Links to relevant articles?

It might be better to leave your comments on Google+ rather than here so the answers will be in one place.

Narcissus prime in Python

I’ve been looking back on some of my blog posts that included Mathematica code to see whether I could rewrite them using Python. For example, I rewrote my code for finding sonnet primes in Python a few days ago. Next I wanted to try testing the Narcissus prime.

Futility closet describes the Narcissus prime as follows:

Repeat the string 1808010808 1560 times, and tack on a 1 the end. The resulting 15601-digit number is prime, and because it’s a palindrome made up of the digits 1, 8, and 0, it remains prime when read backward, upside down, or in a mirror.

My Mathematica code for verifying this claim is posted here. Here’s Python code to do the same thing:

    from sympy.ntheory import isprime
    isprime(int("1808010808"*1560 + "1"))

This does indeed return True. However, the Mathematica code ran for about 2 minutes and the SymPy code took 17.5 hours, about 500 times longer.

Sonnet primes in Python

A while back I wrote about sonnet primes, primes of the form ababcdcdefefgg where the letters a through g represent digits and a is not zero. The name comes from the rhyme scheme of an English (Shakespearean) sonnet.

In the original post I gave Mathematica code to find all sonnet primes. This post shows how to do it in Python.

from sympy.ntheory import isprime
from itertools import permutations

def number(t):
    # turn a tuple into a number
    return 10100000000000*t[0] + 1010000000000*t[1] 
           +   1010000000*t[2] +     101000000*t[3] 
           +       101000*t[4] +         10100*t[5] 
           +           11*t[6]

sonnet_numbers = (number(t) for t in 
    permutations(range(10), 7) if t[0] != 0)

sonnet_primes = filter(isprime, sonnet_numbers)

Finding 2013 in pi

My youngest daughter asked me this morning whether you can find the number 2013 in the digits of pi. I said it must be possible, then wrote the following Python code to find where 2013 first appears.

from mpmath import mp
mp.dps = 100000
digits = str(mp.pi)[2:]

I use the multi-precision math package mpmpath to get pi to 100,000 significant figures. I save this as a string and cut off the “3.” at the beginning to have just the string of digits after the decimal point.

The code returns 6275, so “2013” appears in the 6275th position in the string of digits. However, we usually count decimal places starting from 1 but count positions in a string starting from 0, so 2013 starts in the 6276th decimal place of pi.

So π = 3.14159…2013… where the first “…” represents 6,270 digits not shown.

* * *

Now we jump off into deeper mathematical water.

For some purposes, the digits of pi are random. The digits are obviously not random — there are algorithms for calculating them — and yet they behave randomly, and random is as random does.

If the digits of pi were random, then we could almost certainly find any sequence we want if we look long enough. Can we find any finite sequence of digits in the decimal expansion of pi? I would assume so, but I don’t know whether that has been proven.

You might expect that not only can you find 2013 in pi, but that if you split the digits of pi into blocks of 4, then 2013 and every other particular block would occur with the same frequency in the limit. That is, one would expect that the expansion of pi is uniform in base 10,000.

More generally, you might conjecture that pi is a normal number, i.e. that the digits of pi are uniformly distributed in every base. This has not been proven. In fact, no one has proved that any particular number is normal [reference]. However, we do know that almost all numbers are normal. That is, the set of non-normal numbers has Lebesgue measure zero.

Basics of Sweave and Pweave

Sweave is a tool for embedding R code in a LaTeX file. Pweave is an analogous tool for Python. By putting your code in your document rather than the results of running your code somewhere else, results are automatically recomputed when inputs change. This is especially useful with graphs: rather than including an image into your document, you include the code to create the image.

To use either Sweave or Pweave, you create a LaTeX file and include source code inside. A code block begins with <<>>= and ends with @ on a line by itself. By default, code blocks appear in the LaTeX output. You can start a code block with <<echo=FALSE>>= to execute code without echoing its source. In Pweave you can also use <% and %> to mark a code block that executes but does not echo. You might want to do this at the top of a file, for example, for import statements.

Sweave echos code like the R command line, with > for the command prompt. Pweave does not display the Python >>> command line prompt by default, though it will if you use the option term=TRUE in the start of your code block.

In Sweave, you can use Sexpr to inline a little bit of R code. For example, $x = Sexpr{sqrt(2)}$ will produce x = 1.414…. You can also use Sexpr to reference variables defined in previous code blocks. The Pweave analog uses <%= and %>. The previous example would be $x = <%= sqrt(2) %>$.

You can include a figure in Sweave or Pweave by beginning a code block with <<fig=TRUE, echo=FALSE>>= or with echo=TRUE if you want to display the code that produces the figure. With Sweave you don’t need to do anything else with your file. With Pweave you need to add usepackage{graphicx} at the top.

To process an Sweave file foo.Rnw, run Sweave("foo.Rnw") from the R command prompt. To process a Pweave file foo.Pnw, run Pweave -f tex foo.Pnw from the shell. Either way you get a LaTeX file that you can then compile to a PDF.

Here are sample Sweave and Pweave files. First Sweave:


Invisible code that sets the value of the variable $a$.

a <- 3.14

Visible code that sets $b$ and squares it.

<<bear, echo=TRUE>>=
b <- 3.15

Calling R inline: $\sqrt{2} = Sexpr{sqrt(2)}$

Recalling the variable $a$ set above: $a = Sexpr{a}$.

Here's a figure:

<<fig=TRUE, echo=FALSE>>=
x <- seq(0, 6*pi, length=200)
plot(x, sin(x))


And now Pweave:


import matplotlib.pyplot as plt
from numpy import pi, linspace, sqrt, sin

Invisible code that sets the value of the variable $a$.

a = 3.14

Visible code that sets $b$ and squares it.

b = 3.15
print b*b

Calling Python inline: $\sqrt{2} = <%= sqrt(2) %>$

Recalling the variable $a$ set above: $a = <%= a %>$.

Here's a figure:

<<fig=TRUE, echo=FALSE>>=
x = linspace(0, 6*pi, 200)
plt.plot(x, sin(x))


Related links:

Three new Python books

This post reviews three Python books that have come out recently:

SciPy and NumPy (ISBN 1449305466) by Eli Bressert is the smallest book I’ve seen from O’Reilly, aside from books in their pocket guide series. The SciPy and NumPy libraries are huge, and it can be hard to know where to start. This book gives a good, brisk overview.  In addition to SciPy and NumPy, the it also gives a brief introduction to SciKit, in particular scikit-learn for machine learning and scikit-image for image processing.

(Eli told me that he is working on supplementary material for the book. Everyone who bought the book electronically will automatically receive the new material when it is available.)

Python for Kids (ISBN 1593274076) by Jason R. Briggs is an introduction to programming aimed at kids. It starts with with an introduction to Python and moves to developing a simple game. It seems to me that kids would find the book interesting. It’s about seven times longer than the SciPy and NumPy book. It moves at a slow pace, has many illustrations, and has a casual tone.

NumPy Cookbook by Ival Idris contains around 70 small recipes, about three pages each. Many of these are about NumPy itself, but the book covers much more than its title would imply. Out of 10 chapters, four are strictly about NumPy. The first chapter of the book is about IPython. Another chapter is about “connecting NumPy with the rest of the world,” i.e. interfacing with Java, R, Matlab, and cloud services. Two chapters are devoted to profiling, debugging, and optimizing performance. There is a chapter on quality assurance (static analysis, unit testing, and documentation). And the final chapter is about Scikits and Pandas.

Winston Churchill, Bessie Braddock, and Python

Last night I was talking with someone about the pros and cons of various programming languages and frameworks for data analysis. One of the pros of Python is its elegance. The primary con is that it can be slow.

The conversation reminded me of an apocryphal exchange between Winston Churchill and Bessie Braddock.

Braddock: Winston, you are drunk.

Churchill: Yes I am. And you, Bessie, are ugly. But I shall be sober in the morning, and you will still be ugly.

Python can be slow, though there are ways to improve its performance. But ugly code is just ugly, and there’s nothing you can do about it.

Digits in powers of 2

Does the base 10 expansion of 2^n always contain the digit 7 if n is large enough?

As of 1994, this was an open question (page 196 here). I don’t know whether this has since been resolved.

The following Python code suggests that the conjecture may be true for n ≥ 72.

def digits(n):
    s = set()
    while n > 0:
        n /= 10
    return s

for i in range(71, 10000):
    p = 2**i
    if 7 not in digits(p):
        print i, p

Update: It appears that 2^n contains every digit for n > 168. See this comment.

Related post: Open question turned into exercise

Higher moments of normal distribution

Sometimes a little bit of Python beats a Google search.

Last week I needed to look up the moments of a normal distribution. The first two moments are common knowledge, the next two are easy to find, but I wasn’t able to find the higher moments.

Here is a little Sage code that produces a table of moments for the normal distribution. (Sage is a Python-based mathematical computing environment.) The code computes the expected value of Xn by taking the nth derivative of the moment generating function and setting its argument to zero.

var('m, s, t')
mgf(t) = exp(m*t + t^2*s^2/2)
for i in range(1, 11):
derivative(mgf, t, i).subs(t=0)

Here’s the output.

m^2 + s^2
m^3 + 3*m*s^2
m^4 + 6*m^2*s^2 + 3*s^4
m^5 + 10*m^3*s^2 + 15*m*s^4
m^6 + 15*m^4*s^2 + 45*m^2*s^4 + 15*s^6
m^7 + 21*m^5*s^2 + 105*m^3*s^4 + 105*m*s^6
m^8 + 28*m^6*s^2 + 210*m^4*s^4 + 420*m^2*s^6 + 105*s^8
m^9 + 36*m^7*s^2 + 378*m^5*s^4 + 1260*m^3*s^6 + 945*m*s^8
m^10 + 45*m^8*s^2 + 630*m^6*s^4 + 3150*m^4*s^6 + 4725*m^2*s^8 + 945*s^10

Update: Here is a formula for the higher moments of the normal.

Related post: Sage Beginner’s Guide

Python for data analysis

I recommend using Python for data analysis, and I recommend Wes McKinney’s book Python for Data Analysis.

I prefer Python to R for mathematical computing because mathematical computing doesn’t exist in a vacuum; there’s always other stuff to do. I find doing mathematical programming in a general-purpose language is easier than doing general-purpose programming in a mathematical language. Also, general-purpose languages like Python have larger user bases, are better designed, have better tool support, etc.

Python per se doesn’t have everything you need for mathematical computing. You need to combine several tools and libraries, typically at least SciPy, matplotlib, and IPython. Because there are different pieces involved, it’s hard to find one source to explain using them all together. Also, even with the three additional components mentioned before, there is a need for additional software for working with structured data.

Wes McKinney developed the pandas library to give Python “rich data structures and functions designed to make working with structured data fast, easy, and expressive.” And now he has addressed the need for unified exposition by writing a single book that describes how to use the Python mathematical computing stack. Importantly, the book covers two recent developments that make Python more competitive with other environments for data analysis: enhancements to IPython and Wes’ own pandas project.

Python for Data Analysis is available for pre-order. I don’t know when the book will be available but Amazon lists the publication date as October 29. My review copy was a PDF, but at least one paper copy has been spotted in the wild:

Wes McKinney holding his book at O’Reilly’s Strata Conference. Photo posted on Twitter yesterday.

Ramanujan’s factorial approximation

Ramanujan came up with an approximation for factorial that resembles Stirling’s famous approximation but is much more accurate.

n! sim sqrt{pi} left(frac{n}{e}right)^n sqrt[6]{8n^3 + 4n^2 + n + frac{1}{30}}

As with Stirling’s approximation, the relative error in Ramanujan’s approximation decreases as n gets larger. Typically these approximations are not useful for small values of n. For n = 5, Stirling’s approximation gives 118.02 while the exact value is 120. But Ramanujan’s approximation gives 120.00015.

Here’s an implementation of the approximation in Python.

def ramanujan(x):
    fact = sqrt(pi)*(x/e)**x
    fact *= (((8*x + 4)*x + 1)*x + 1/30.)**(1./6.)
    return fact

For non-integer values of x, the function returns an approximation for Γ(x+1), an extension of factorial to real values. Here’s a plot of the accuracy of Ramanujan’s approximation.

plot of precision of Ramanujan's approximation

For x = 50, Ramanujan’s approximation is good to nearly 10 significant figures, whereas Stirling’s approximation is good to about 7.

Here’s a little trickier implementation.

def ramanujan2(x):
    fact = sqrt(pi)*(x/e)**x
    fact *= (((8*x + 4)*x + 1)*x + 1/30.)**(1./6.)
    if isinstance(x, int):
        fact = int(fact)
    return fact

This code gives the same value as before if x is not an integer. But it gives exact values of factorial for x = 0, 1, 2, … 10. If x is 11 or greater, the result is not exact, but the relative error is less than 10^-7 and decreases as x increases. Ramanujan’s approximation always errs on the high side, so rounding the result down improves the accuracy and makes it exact for small integer inputs.

The downside of this trick is that now, for example, ramanujan2(5) and ramanujan2(5.0) give different results. In some contexts, the improved accuracy may not be worth the inconsistency.

Reference: On Gosper’s formula for the Gamma function

Related post: A Ramanujan series for calculating pi

Python book recommendations

People sometimes ask me to recommend a book for learning to program or a book on Python. If you want both in one book, i.e. to learn Python as a first programming language, take a look at Allen Downey’s new book Think Python.

If you’re an experience programmer and just want a book on the specifics of Python, I’d recommend David Beazley’s Python Essential Reference.

These two books are at opposite ends of the scale. Downey’s book is a very gentle introduction to programming. He takes the time to explain things that authors often don’t realize need to be explained. Beazley’s book is all about Python per se, not programming. The title says “reference,” and it does make a good reference, but you could sit down and read through the first half of the book. The second half of the book really is more of a reference.

Computing log gamma differences

Statistical computing often involves working with ratios of factorials. These factorials are often too big to fit in a floating point number, and so we work with logarithms. So if we need to compute log(a! / b!), we call software that calculates log(a!) and log(b!) directly without computing a! or  b! first. More on that here. But sometimes this doesn’t work.

Suppose a = 10^40 and b = a – 10^10. Our first problem is that we cannot even compute b directly. Since ab is 30 orders of magnitude smaller than a, we’d need about 100 bits of precision to begin to tell a and b apart, about twice as much as a standard floating point number. (Why 100? That’s the log base 2 of 10^30. And how much precision does a floating point number have? See here.)

Our next problem is that even if we could accurately compute b, the log gamma function is going to be approximately the same for a and b, and so the difference will be highly inaccurate.

So what do we do? We could use some sort of extended precision package, but that is not necessary. There’s an elegant solution using ordinary precision.

Let f(x) = log(Γ(x)). The mean value theorem says that

f(a+1) – f(b+1) = (ab) f‘(c+1)

for some c between a+1 and b+1. We don’t need to compute b, only ab, which we know is 10^10. The derivative of the log of the gamma function is called the digamma function, written ψ(x). So

log(a! / b!) = 10^10 ψ(c+1).

But what is c? The mean value theorem only says that the equation above is true for some c. What c do we use? It doesn’t matter. Since a and b are so relatively close, the digamma function will take on essentially the same value for any value between a+1 and b+1. Therefore

log(a! / b!) ≈ 10^10 ψ(a).

We could compute this in two lines of Python:

from scipy.special import digamma
print 1e10*digamma(1e40)

Related posts:

Shell != REPL

A shell is not the same as a REPL (Read Evaluate Print Loop). They look similar, but they have deep differences.

Shells are designed for one-line commands, and they’re a little awkward when used as programming languages.

Scripting languages are designed for files of commands, and they’re a little awkward to use from a REPL.

IPython is an interesting hybrid. You could think of it as a Python REPL with shell-like features added. Eshell is another interesting compromise, a shell implemented in Emacs Lisp that also works as a Lisp REPL. These hybrids are evidence that as much as people like their programming languages, they appreciate additions to a pure language REPL.

Root-finding with noisy functions

Suppose you have function g(x) and you want to find x so that g(x) = 0. However, you don’t have direct access to g(x). Instead you can evaluate f(x) which is g(x) plus random noise. Typically f(x) would be an approximation of g(x) based on Monte Carlo simulation. I have spent countless hours solving problems like this.

One difficulty is that the value of f(x) will change every time you compute it. You might hunt down x and be sure you have it braketed between two values, only to find out you were wrong when you compute f(x) more accurately, say with more Monte Carlo steps.

Assume g(x), i.e. f(x) without noise, is monotone increasing. Assume also that f(x) is expensive to compute. Say each evaluation takes 20 minutes. The amplitude of the noise may vary with x.

Here’s a simple approach to trying to find where f(x) is zero. Use a bisection method, but stop when it looks like you’ve come as close as you can due to noise. If the monotonicity assumption is violated, use that as a signal that you’re within the resolution of the noise.

# Assume a < b, fa = f(a) < 0, fb = f(b) > 0.
def noisy_bisect(f, a, b, fa, fb, tolerance):
    if b-a < tolerance:
        return (a, b)
    mid = 0.5*(a+b)
    fmid = f(mid)
    if fmid < fa or fmid > fb:
        # Monotonicity violated. Reached resolution of noise.
        return (a, b)
    if fmid < 0:
        a, fa = mid, fmid
        b, fb = mid, fmid
    return noisy_bisect(f, a, b, fa, fb, tolerance)

The values of f(a) and f(b) are saved to avoid recalculation since they’re expensive to obtain.

You could experiment with this function as follows.

from scipy.stats import norm
from time import sleep

# Gaussian distribution w/ mean 0 and standard deviation 0.01
noise = norm(0, 0.01)

def f(x):
    sleep(10) # wait 10 seconds
    return x**2 - 1 + noise.rvs()

a = 0
b = 3
fa = f(a)
fb = f(b)
print noisy_bisect(f, a, b, fa, fb, 0.0001)

Without noise, the code would iterate until the zero is bracketed in an interval of width 0.0001. However, since the noise in on the order of 0.01, the method will typically stop when the bracket is a on the order of 0.01 wide, maybe a little smaller.

We could make the root-finding software a little faster by using a secant method rather than a bisection method. Instead of taking the midpoint, the secant method draws a line between (a, f(a)) and (b, f(b)) and uses the point where the line crosses the x-axis as the next guess.

A more sophisticated approach would be to increase the accuracy of f(x) over time. If f(x) is evaluated using N steps of a Monte Carlo simulation, the standard deviation of the noise is roughly proportional to 1/√N. So, for example, cutting the noise in half requires multiplying N by 4. You could get a rough bracket on the zero using a small value of N, then increase N as the bracket gets smaller. You could play with this by using a function like the following.

def f(x, N):
    sleep(N) # wait N seconds
    noise = norm(0, 0.1/sqrt(N))
    return x**2 - 1 + noise.rvs()

You can make the noise as small as you’d like, but at a price. And the price goes up faster than the noise goes down, so it pays to not reduce the noise more than necessary.