Shell != REPL

A shell is not the same as a REPL (read-eval-print loop). They look similar, but they have deep differences.

Shells are designed for one-line commands, and they’re a little awkward when used as programming languages.

Scripting languages are designed for files of commands, and they’re a little awkward to use from a REPL.

IPython is an interesting hybrid. You could think of it as a Python REPL with shell-like features added. Eshell is another interesting compromise, a shell implemented in Emacs Lisp that also works as a Lisp REPL. These hybrids are evidence that as much as people like their programming languages, they appreciate additions to a pure language REPL.

Root-finding with noisy functions

Suppose you have a function g(x) and you want to find x so that g(x) = 0. However, you don’t have direct access to g(x). Instead you can evaluate f(x), which is g(x) plus random noise. Typically f(x) would be an approximation of g(x) based on Monte Carlo simulation. I have spent countless hours solving problems like this.

One difficulty is that the value of f(x) will change every time you compute it. You might hunt down x and be sure you have it bracketed between two values, only to find out you were wrong when you compute f(x) more accurately, say with more Monte Carlo steps.

Assume g(x), i.e. f(x) without noise, is monotone increasing. Assume also that f(x) is expensive to compute. Say each evaluation takes 20 minutes. The amplitude of the noise may vary with x.

Here’s a simple approach to trying to find where f(x) is zero. Use a bisection method, but stop when it looks like you’ve come as close as you can due to noise. If the monotonicity assumption is violated, use that as a signal that you’re within the resolution of the noise.

# Assume a < b, fa = f(a) < 0, fb = f(b) > 0.
def noisy_bisect(f, a, b, fa, fb, tolerance):
    if b-a < tolerance:
        return (a, b)
    mid = 0.5*(a+b)
    fmid = f(mid)
    if fmid < fa or fmid > fb:
        # Monotonicity violated. Reached resolution of noise.
        return (a, b)
    if fmid < 0:
        a, fa = mid, fmid
    else:
        b, fb = mid, fmid
    return noisy_bisect(f, a, b, fa, fb, tolerance)

The values of f(a) and f(b) are saved to avoid recalculation since they’re expensive to obtain.

You could experiment with this function as follows.

from scipy.stats import norm
from time import sleep

# Gaussian distribution w/ mean 0 and standard deviation 0.01
noise = norm(0, 0.01)

def f(x):
    sleep(10) # wait 10 seconds
    return x**2 - 1 + noise.rvs()

a = 0
b = 3
fa = f(a)
fb = f(b)
print(noisy_bisect(f, a, b, fa, fb, 0.0001))

Without noise, the code would iterate until the zero is bracketed in an interval of width 0.0001. However, since the noise is on the order of 0.01, the method will typically stop when the bracket is on the order of 0.01 wide, maybe a little smaller.

We could make the root-finding software a little faster by using a secant method rather than a bisection method. Instead of taking the midpoint, the secant method draws a line between (a, f(a)) and (b, f(b)) and uses the point where the line crosses the x-axis as the next guess.
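Here is what that could look like, a sketch only: it keeps the same stopping rules as noisy_bisect above, and the name noisy_secant is mine, not a library routine.

# Sketch of a secant variant: same stopping rules as noisy_bisect,
# but the next guess is where the line through (a, fa) and (b, fb)
# crosses the x-axis rather than the midpoint.
def noisy_secant(f, a, b, fa, fb, tolerance):
    if b - a < tolerance:
        return (a, b)
    # intersection of the secant line with the x-axis
    mid = a - fa*(b - a)/(fb - fa)
    fmid = f(mid)
    if fmid < fa or fmid > fb:
        # Monotonicity violated. Reached resolution of the noise.
        return (a, b)
    if fmid < 0:
        a, fa = mid, fmid
    else:
        b, fb = mid, fmid
    return noisy_secant(f, a, b, fa, fb, tolerance)

Unlike a bisection step, a secant step isn’t guaranteed to shrink the bracket from both sides, so in practice you might alternate secant and bisection steps.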

A more sophisticated approach would be to increase the accuracy of f(x) over time. If f(x) is evaluated using N steps of a Monte Carlo simulation, the standard deviation of the noise is roughly proportional to 1/√N. So, for example, cutting the noise in half requires multiplying N by 4. You could get a rough bracket on the zero using a small value of N, then increase N as the bracket gets smaller. You could play with this by using a function like the following.

from math import sqrt

def f(x, N):
    sleep(N) # wait N seconds, as a stand-in for N Monte Carlo steps
    noise = norm(0, 0.1/sqrt(N))
    return x**2 - 1 + noise.rvs()

You can make the noise as small as you’d like, but at a price. And the price goes up faster than the noise goes down, so it pays to not reduce the noise more than necessary.
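Putting the pieces together, here is one possible driver, a sketch under my own assumptions (the quadrupling schedule and the max_N cutoff are mine): it runs noisy_bisect with a cheap, noisy f, and whenever the bracket stops shrinking, it quadruples N to halve the noise on the next pass.

def adaptive_bisect(a, b, tolerance, max_N=256):
    N = 1
    while N <= max_N:
        fa, fb = f(a, N), f(b, N)
        new_a, new_b = noisy_bisect(lambda x: f(x, N), a, b, fa, fb, tolerance)
        if new_b - new_a < tolerance:
            return (new_a, new_b)
        if (new_a, new_b) == (a, b):
            N *= 4  # no progress: halve the noise, at four times the cost
        a, b = new_a, new_b
    return (a, b)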

Calculating pi with AGM and mpmath

This post gives an algorithm based on the arithmetic-geometric mean that rapidly converges to pi. I’ll use it to illustrate multiple precision arithmetic using Python’s mpmath module.

Given two non-negative numbers a and b, their arithmetic mean is (a + b)/2 and their geometric mean is √(ab). Suppose you start with two non-negative numbers and take their arithmetic mean and geometric mean. Then take the arithmetic and geometric mean of the results. Keep doing this repeatedly, and the result converges to what is called the arithmetic-geometric mean (AGM).

There is a formula for calculating pi based on the AGM that goes back to Gauss.

\pi = \frac{4 M^2\left(1, \frac{1}{\sqrt{2}}\right)}{1 - \sum_{n=1}^\infty 2^{n+1} (a_n^2 - b_n^2)}

Here M(a, b) is the AGM of a and b, and a_n and b_n are the values after n steps of the iteration.

The process defining the AGM converges quickly, and so the formula is practical for computing pi. I found it in a paper from 1988, and at that time the formula had been used to compute pi to over 29 million decimal places. In the code below, we compute pi to 997 decimal places in only 10 iterations.

from mpmath import mp, sqrt, mpf, pi, log10, fabs

# carry out calculations to 1000 decimal places
decimal_places = 1000
mp.dps = decimal_places
epsilon = 1/mpf(10**decimal_places)

# set a = 1 and b = 1/sqrt(2) as multi-precision numbers
a = mpf(1)
b = 1/sqrt(mpf(2))

diff = a - b
series = mpf(0)

n = 0
while diff > epsilon:
    n += 1
    arith = (a + b)/2
    geom = sqrt(a*b)
    a, b = arith, geom
    series += 2**(n+1) * (a*a - b*b)
    diff = a - b

# a and b have converged to the AGM
my_pi = 4*a*a/(1 - series)

error = fabs(pi - my_pi)
decimal_places = int(-log10(error))

print "Number of steps used: %d" % n
print "Number of correct decimal places: %d" % decimal_places

The code can be used to calculate pi out further. The number of correct decimal places roughly doubles with each iteration. For example, computing pi to 10,000 places takes only 3 more iterations.
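To try this, only the precision settings at the top of the script need to change, for example:

# compute to 10,000 decimal places instead of 1,000
decimal_places = 10000
mp.dps = decimal_places
epsilon = 1/mpf(10**decimal_places)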

Related posts:

Algorithm for world record pi calculations
Ramanujan series for pi

Using SciPy with IronPython

Three years ago I wrote a post about my disappointment using SciPy with IronPython. A lot has changed since then, so I thought I’d write a short follow-up post.

To install NumPy and SciPy for use with IronPython, follow the instructions here. After installation, NumPy works as expected.

There is one small gotcha with SciPy. To use SciPy with IronPython, start ipy with the command line argument -X:Frames. Then you can use SciPy as you would from CPython. For example:

c:> ipy -X:Frames
>>> import scipy as sp
>>> sp.pi
3.141592653589793

Without the -X:Frames option you’ll get an error when you try to import scipy.

AttributeError: 'module' object has no attribute '_getframe'

According to this page,

The issue is that SciPy makes use of the CPython API for inspecting the current stack frame, which IronPython doesn’t enable by default because of a small runtime performance hit. You can turn on this functionality by passing the command line argument “-X:Frames” on the command line.

Machine Learning in Action

A couple months ago I briefly reviewed Machine Learning for Hackers by Drew Conway and John Myles White. Today I’m looking at Machine Learning in Action by Peter Harrington and comparing the two books.

Both books are about the same size and cover many of the same topics. One difference between the two books is the choice of programming language: ML for Hackers uses R for its examples, while ML in Action uses Python.

ML in Action doesn’t lean heavily on Python libraries. It mostly implements its algorithms from scratch, with a little help from NumPy for linear algebra, but it does not use ML libraries such as scikit-learn. It sometimes uses Matplotlib for plotting and uses Tkinter for building a simple GUI in one chapter. The final chapter introduces Hadoop and Amazon Web Services.

ML for Hackers is a little more of a general introduction to machine learning. ML in Action contains a brief introduction to machine learning in general, but quickly moves on to specific algorithms. ML for Hackers spends a good number of pages discussing data cleaning. ML in Action starts with clean data in order to spend more time on algorithms.

ML in Action takes 8 of the top 10 algorithms in machine learning (as selected by this paper) and organizes the book around them. (The two algorithms out of the top 10 that didn’t make it into ML in Action were PageRank, because it has been covered well elsewhere, and EM, because its explanation requires too much mathematics.) The algorithms come first in ML in Action, illustrations second. ML for Hackers puts more emphasis on its examples and reads a bit more like a story. ML in Action reads a little more like a reference book.


Solutions to knight's random walk

My previous post asked this question:

Start a knight at a corner square of an otherwise-empty chessboard. Move the knight at random by choosing uniformly from the legal knight-moves at each step. What is the mean number of moves until the knight returns to the starting square?

There is a mathematical solution that is a little arcane, but short and exact. You could also approach the problem using simulation, which is more accessible but not exact.

The mathematical solution is to view the problem as a random walk on a graph. The vertices of the graph are the squares of a chess board and the edges connect legal knight moves. The general solution for the mean time to first return is simply 2N/k, where N is the number of edges in the graph and k is the number of edges meeting at the starting point. Amazingly, the solution hardly depends on the structure of the graph at all; it only requires that the graph is connected. In our case N = 168 and k = 2, so the expected number of moves is 2 × 168 / 2 = 168.
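You can check the value of N with a few lines of Python: add up the number of legal knight moves from each of the 64 squares, then divide by two, since each edge is counted once from each endpoint.

# All eight knight moves
moves = [(1, 2), (2, 1), (-1, 2), (-2, 1),
         (1, -2), (2, -1), (-1, -2), (-2, -1)]

# Sum of vertex degrees, halved: each edge is seen from both ends
N = sum(1 for x in range(8) for y in range(8)
        for dx, dy in moves
        if 0 <= x + dx < 8 and 0 <= y + dy < 8) // 2
print(N)  # prints 168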

For a full explanation of the math, see this online book, chapter 3, page 9. Start there and work your way backward until you understand the solution.

And now for simulation. The problem says to pick a legal knight’s move at random. The most direct approach would be to find the legal moves at a given point first, then choose one of those at random. The code below achieves the same end with a different approach. It first chooses a random move, and if that move is illegal (i.e. off the board) it throws that move away and tries again.  This will select a legal move with the right probability, though perhaps that’s not obvious. It’s what’s known as an accept-reject random generator.

from random import randint

# Move a knight from (x, y) to a random new position
def new_position(x, y):

    while True:
        dx, dy = 1, 2

        # it takes three bits to determine a random knight move:
        # (1, 2) vs (2, 1), and the sign of each
        r = randint(0, 7)
        if r % 2:
            dx, dy = dy, dx
        if (r >> 1) % 2:
            dx = -dx
        if (r >> 2) % 2:
            dy = -dy

        newx, newy = x + dx, y + dy
        # If the new position is on the board, take it.
        # Otherwise try again.
        if (newx >= 0 and newx < 8 and newy >= 0 and newy < 8):
            return (newx, newy)

# Count the number of steps in one random tour
def random_tour():
    x, y = x0, y0 = 0, 0
    count = 0
    while True:
        x, y = new_position(x, y)
        count += 1
        if x == x0 and y == y0:
            return count

# Average the length of many random tours
total = 0
num_reps = 100000
for i in range(num_reps):
    total += random_tour()
print(total / num_reps)

A theorem is better than a simulation, but a simulation is a lot better than nothing. This problem illustrates how sometimes we think we need to simulate when we don’t. On the other hand, when you have a simulation and a theorem, you have more confidence in your solution because each validates the other.

Python as a Lisp dialect

From Peter Norvig:

Basically, Python can be seen as a dialect of Lisp with “traditional” syntax … Python supports all of Lisp’s essential features except macros, and you don’t miss macros all that much because it does have eval, and operator overloading, and regular expression parsing, so some — but not all — of the use cases for macros are covered.

Source: Python for Lisp Programmers

Math languages vs. application languages

Last Friday I posted on @SciPyTip a summary of why I like SciPy, the scientific programming library for Python.

I’d rather do math in a general-purpose language than try to do general-purpose programming in a math language.

Mathematical software is never purely mathematical. The math has to connect to something, and that’s where most of the programming effort may go.

If you’re responsible for the math and for the larger system it’s embedded in, you have three options.

  1. Write the math in an application programming language.
  2. Do application programming in a mathematical language.
  3. Use different languages for the math and the larger system.

My preferred option is #1. My second choice is #3. My last choice, by far, is #2.

Application languages are typically better for math than mathematical languages are for applications. Application languages also have a larger user base, and hence better documentation and tools.

Mixing two languages works well in a team that has someone specialized in the math language, someone specialized in the application language, and someone fluent in plumbing the two languages together. If you have to be all three people, you might wonder whether you could do everything in one language.

Related posts:

Plain Python
Ruby, Python, and Science
Software exoskeletons

SciPy integration misunderstanding

Today I needed to compute an integral similar to this:

\int_{1000}^\infty \frac{dx}{100\, x^3}

I used the following SciPy code to compute the integral:

import scipy as sp
from scipy.integrate import quad

def f(x):
    return 0.01*x**-3

integral, error = quad(f, 1000, sp.inf, epsrel = 1e-6)
print(integral, error)

My intention was to compute the integral to 6 significant figures. (epsrel is a shortened form of epsilon relative, i.e. relative error.) To my surprise, the estimated error was larger than the value of the integral. Specifically, the integral was computed as 5.15 × 10⁻⁹ and the error estimate was 9.07 × 10⁻⁹.

What went wrong? The integration routine quad lets you specify either a desired bound on your absolute error (epsabs) or a desired bound on your relative error (epsrel). I assumed that since I specified the relative error, the integration would stop when the relative error requirement was met. But that’s not how it works.

The quad function has default values for both epsabs and epsrel.

def quad(... epsabs=1.49e-8, epsrel=1.49e-8, ...):

I thought that since I did not specify an absolute error bound, the bound was not effective, or equivalently, that the absolute error target was 0. But no! It was as if I’d set the absolute error bound to 1.49 × 10⁻⁸. Because my integral is small (the exact value is 5 × 10⁻⁹), the absolute error requirement is satisfied before the relative error requirement, and so the integration stops too soon.

The solution is to specify an absolute error target of zero. This condition cannot be satisfied, and so the relative error target will determine when the integration stops.

integral, error = quad(f, 1000, sp.inf, epsrel = 1e-6, epsabs = 0)

This correctly computes the integral as 5 × 10⁻⁹ and estimates the integration error as 4 × 10⁻¹⁸.

It makes some sense for quad to specify non-zero default values for both absolute and relative error, though I imagine most users expect small relative error rather than small absolute error, so perhaps the latter could be set to 0 by default.

Approximating Earth as a sphere

Isaac Newton suggested in 1687 that the earth is not a perfectly round sphere but rather an ellipsoid, and he was right. But since our planet is roughly a sphere, it’s often useful to approximate it by a sphere. So if you’re going to do that, what radius do you use? More generally, what radius do you use when approximating any ellipsoid by a sphere?

This post will discuss the more general problem of finding the radius when approximating any ellipsoid by a sphere. We will give the answer for Earth in particular, and we’ll show how to carry out the calculations. Most of the calculations are easy, but some involve elliptic integrals, and we show how to compute these in Python.
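The details are in the rest of the post, but here is a small, concrete taste, a sketch using the standard WGS84 ellipsoid parameters (my addition, not given in the excerpt above) and one common convention for a single mean radius.

# WGS84 reference ellipsoid
a = 6378137.0            # equatorial radius in meters
f = 1/298.257223563      # flattening
b = a*(1 - f)            # polar radius, about 6,356,752 meters

# One common convention: the IUGG mean radius (2a + b)/3
R = (2*a + b)/3
print(R)                 # about 6,371,008.8 meters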

Running Python and R inside Emacs

Emacs org-mode lets you manage blocks of source code inside a text file. You can execute these blocks and have the output display in your text file. Or you could export the file, say to HTML or PDF, and show the code and/or the results of executing the code.

Here I’ll show some of the most basic possibilities. For much more information, see orgmode.org. And for the use of org-mode in research, see A Multi-Language Computing Environment for Literate Programming and Reproducible Research.
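For example, here is a minimal Python source block, assuming Python has been enabled for Babel (via org-babel-do-load-languages); an R block works the same way with R in place of python.

#+begin_src python :results output
print("Hello from org-mode")
#+end_src

Put the cursor inside the block and press C-c C-c to execute it; the output shows up in a #+RESULTS: block just below the source block.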


Example of not inverting a matrix: optimization

People are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary to solve linear systems of the form Ax = b, but in practice you almost never do this by inverting A. This post will give an example of avoiding matrix inversion: I will explain how the Newton-Conjugate Gradient method works, as implemented in SciPy by the function fmin_ncg.
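As a small illustration of the general principle (not the Newton-Conjugate Gradient method itself), here is the usual way to solve Ax = b in NumPy without ever forming the inverse; the matrix here is just a made-up example.

import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Preferred: solve the system directly; NumPy factors A internally
x = np.linalg.solve(A, b)

# Works, but slower and less accurate: form the inverse explicitly
x_bad = np.linalg.inv(A).dot(b)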


The Python ecosystem

The hard part about getting started with Python is not the language but the ecosystem. It’s easy to find good references on the Python language, but it’s harder to learn what packages are available, how to install them, etc. That was my experience, and Mir Nazim started with a similar observation in his article Python Ecosystem: An Introduction.

Maybe it’s always harder to learn a language’s ecosystem than the language itself. But I think this was the case for me with Python more than with other languages I’ve used. I wish I’d found Nazim’s article or something like it when I was learning Python.

Related links:

Getting started with SciPy
Sage beginner’s guide
Python is a voluntary language
SciPyTip on Twitter