Shell != REPL

A shell is not the same as a REPL (Read Evaluate Print Loop). They look similar, but they have deep differences.

Shells are designed for one-line commands, and they’re a little awkward when used as programming languages.

Scripting languages are designed for files of commands, and they’re a little awkward to use from a REPL.

IPython is an interesting hybrid. You could think of it as a Python REPL with shell-like features added. Eshell is another interesting compromise, a shell implemented in Emacs Lisp that also works as a Lisp REPL. These hybrids are evidence that as much as people like their programming languages, they appreciate additions to a pure language REPL.

Root-finding with noisy functions

Suppose you have function g(x) and you want to find x so that g(x) = 0. However, you don’t have direct access to g(x). Instead you can evaluate f(x) which is g(x) plus random noise. Typically f(x) would be an approximation of g(x) based on Monte Carlo simulation. I have spent countless hours solving problems like this.

One difficulty is that the value of f(x) will change every time you compute it. You might hunt down x and be sure you have it braketed between two values, only to find out you were wrong when you compute f(x) more accurately, say with more Monte Carlo steps.

Assume g(x), i.e. f(x) without noise, is monotone increasing. Assume also that f(x) is expensive to compute. Say each evaluation takes 20 minutes. The amplitude of the noise may vary with x.

Here’s a simple approach to trying to find where f(x) is zero. Use a bisection method, but stop when it looks like you’ve come as close as you can due to noise. If the monotonicity assumption is violated, use that as a signal that you’re within the resolution of the noise.

# Assume a < b, fa = f(a) < 0, fb = f(b) > 0.
def noisy_bisect(f, a, b, fa, fb, tolerance):
    if b-a < tolerance:
        return (a, b)
    mid = 0.5*(a+b)
    fmid = f(mid)
    if fmid < fa or fmid > fb:
        # Monotonicity violated. Reached resolution of noise.
        return (a, b)
    if fmid < 0:
        a, fa = mid, fmid
        b, fb = mid, fmid
    return noisy_bisect(f, a, b, fa, fb, tolerance)

The values of f(a) and f(b) are saved to avoid recalculation since they’re expensive to obtain.

You could experiment with this function as follows.

from scipy.stats import norm
from time import sleep

# Gaussian distribution w/ mean 0 and standard deviation 0.01
noise = norm(0, 0.01)

def f(x):
    sleep(10) # wait 10 seconds
    return x**2 - 1 + noise.rvs()

a = 0
b = 3
fa = f(a)
fb = f(b)
print noisy_bisect(f, a, b, fa, fb, 0.0001)

Without noise, the code would iterate until the zero is bracketed in an interval of width 0.0001. However, since the noise in on the order of 0.01, the method will typically stop when the bracket is a on the order of 0.01 wide, maybe a little smaller.

We could make the root-finding software a little faster by using a secant method rather than a bisection method. Instead of taking the midpoint, the secant method draws a line between (a, f(a)) and (b, f(b)) and uses the point where the line crosses the x-axis as the next guess.

A more sophisticated approach would be to increase the accuracy of f(x) over time. If f(x) is evaluated using N steps of a Monte Carlo simulation, the standard deviation of the noise is roughly proportional to 1/√N. So, for example, cutting the noise in half requires multiplying N by 4. You could get a rough bracket on the zero using a small value of N, then increase N as the bracket gets smaller. You could play with this by using a function like the following.

def f(x, N):
    sleep(N) # wait N seconds
    noise = norm(0, 0.1/sqrt(N))
    return x**2 - 1 + noise.rvs()

You can make the noise as small as you’d like, but at a price. And the price goes up faster than the noise goes down, so it pays to not reduce the noise more than necessary.

Calculating pi with AGM and mpmath

This post gives an algorithm based on the arithmetic-geometric mean that rapidly converges to pi. I’ll use it to illustrate multiple precision arithmetic using Python’s mpmath module.

Given two non-negative numbers a and b, their arithmetic mean is (a + b)/2 and their geometric mean is √(ab). Suppose you start with two non-negative numbers and take their arithmetic mean and geometric mean. Then take the arithmetic and geometric mean of the results. Keep doing this repeatedly, and the result converges to what is called the arithmetic-geometric mean (AGM).

There is a formula for calculating pi based on the AGM that goes back to Gauss.

pi = frac{ 4M^2left(1, frac{1}{sqrt{2}}right)}{1 - sum_{n=1}^infty 2^{n+1} (a_n^2 - b_n^2)}

Here M(a, b) is the AGM of a and b, and an and bn are the values after n steps of the iteration.

The process defining the AGM converges quickly, and so the formula is practical for computing pi. I found it in a paper from 1988, and at that time the formula had been used to compute pi to over 29 million decimal places. In the code below, we compute pi to 997 decimal places in only 10 iterations.

from mpmath import mp, sqrt, mpf, pi, log10, fabs

# carry out calculations to 1000 decimal places
decimal_places = 1000
mp.dps = decimal_places
epsilon = 1/mpf(10**decimal_places)

# set a = 1 and b = 1/sqrt(2) as multi-precision numbers
a = mpf(1)
b = 1/sqrt(mpf(2))

diff = a - b
series = mpf(0)

n = 0
while diff > epsilon:
    n += 1
    arith = (a + b)/2
    geom = sqrt(a*b)
    a, b = arith, geom
    series += 2**(n+1) * (a*a - b*b)
    diff = a - b

# a and b have converged to the AGM
my_pi = 4*a*a/(1 - series)

error = fabs(pi - my_pi)
decimal_places = int(-log10(error))

print "Number of steps used: %d" % n
print "Number of correct decimal places: %d" % decimal_places

The code can be used to calculate pi out further. The number of correct decimal places roughly doubles with each iteration. For example, computing pi to 10,000 places takes only 3 more iterations.

Related posts:

Using SciPy with IronPython

Three years ago I wrote a post about my disappointment using SciPy with IronPython. A lot has changed since then, so I thought I’d write a short follow-up post.

To install NumPy and SciPy for use with IronPython, follow the instructions here. [Update: no longer available.] After installation, NumPy works as expected.

There is one small gotcha with SciPy. To use SciPy with IronPython, start ipy with the command line argument -X:Frames. Then you can use SciPy as you would from CPython. For example.

c:> ipy -X:Frames
>>> import scipy as sp
>>> sp.pi

Without the -X:Frames option you’ll get an error when you try to import scipy.

AttributeError: 'module' object has no attribute '_getframe'

According to this page,

The issue is that SciPy makes use of the CPython API for inspecting the current stack frame which IronPython doesn’t enable by default because of a small runtime performance hit. You can turn on this functionality by passing the command line argument “-X:Frames” to on the command line.

Machine Learning in Action

A couple months ago I briefly reviewed Machine Learning for Hackers by Drew Conway and John Myles White. Today I’m looking at Machine Learning in Action by Peter Harrington and comparing the two books.

Both books are about the same size and cover many of the same topics. One difference between the two books is choice of programming language: ML for Hackers uses R for its examples, ML in Action uses Python.

ML in Action doesn’t lean heavily on Python libraries. It mostly implements its algorithms from scratch, with a little help from NumPy for linear algebra, but it does not use ML libraries such as scikit-learn. It sometimes uses Matplotlib for plotting and uses Tkinter for building a simple GUI in one chapter. The final chapter introduces Hadoop and Amazon Web Services.

ML for Hackers is a little more of a general introduction to machine learning. ML in Action contains a brief introduction to machine learning in general, but quickly moves on to specific algorithms. ML for Hackers spends a good number of pages discussing data cleaning. ML in Action starts with clean data in order to spend more time on algorithms.

ML in Action takes 8 of the top 10 algorithms in machine learning (as selected by this paper) and organizes around these algorithms. (The two algorithms out of the top 1o that didn’t make it into ML in Action were PageRank, because it has been covered well elsewhere, and EM, because its explanation requires too much mathematics.) The algorithms come first in ML in Action, illustrations second. ML for Hackers puts more emphasis on its examples and reads a bit more like a story. ML in Action reads a little more like a reference book.

Solutions to knight's random walk

My previous post asked this question:

Start a knight at a corner square of an otherwise-empty chessboard. Move the knight at random by choosing uniformly from the legal knight-moves at each step. What is the mean number of moves until the knight returns to the starting square?

There is a mathematical solution that is a little arcane, but short and exact. You could also approach the problem using simulation, which is more accessible but not exact.

The mathematical solution is to view the problem as a random walk on a graph. The vertices of the graph are the squares of a chess board and the edges connect legal knight moves. The general solution for the time to first return is simply 2N/k where N is the number of edges in the graph, and k is the number of edges meeting at the starting point. Amazingly, the solution hardly depends on the structure of the graph at all. It only requires that the graph is connected. In our case N = 168 and k = 2.

For a full explanation of the math, see this online book, chapter 3, page 9. Start there and work your way backward until you understand the solution.

And now for simulation. The problem says to pick a legal knight’s move at random. The most direct approach would be to find the legal moves at a given point first, then choose one of those at random. The code below achieves the same end with a different approach. It first chooses a random move, and if that move is illegal (i.e. off the board) it throws that move away and tries again.  This will select a legal move with the right probability, though perhaps that’s not obvious. It’s what’s known as an accept-reject random generator.

from random import randint

# Move a knight from (x, y) to a random new position
def new_position(x, y):

    while True:
        dx, dy = 1, 2

        # it takes three bits to determine a random knight move:
        # (1, 2) vs (2, 1), and the sign of each
        r = randint(0, 7)
        if r % 2:
            dx, dy = dy, dx
        if (r >> 1) % 2:
            dx = -dx
        if (r >> 2) % 2:
            dy = -dy

        newx, newy = x + dx, y + dy
        # If the new position is on the board, take it.
        # Otherwise try again.
        if (newx >= 0 and newx < 8 and newy >= 0 and newy < 8):
            return (newx, newy)

# Count the number of steps in one random tour
def random_tour():
    x, y = x0, y0 = 0, 0
    count = 0
    while True:
        x, y = new_position(x, y)
        count += 1
        if x == x0 and y == y0:
            return count

# Average the length of many random tours
sum = 0
num_reps = 100000
for i in xrange(num_reps):
    sum += random_tour()
print sum / float(num_reps)

A theorem is better than a simulation, but a simulation is a lot better than nothing. This problem illustrates how sometimes we think we need to simulate when we don’t. On the other hand, when you have a simulation and a theorem, you have more confidence in your solution because each validates the other.

Python as a Lisp dialect

From Peter Norvig:

Basically, Python can be seen as a dialect of Lisp with “traditional” syntax … Python supports all of Lisp’s essential features except macros, and you don’t miss macros all that much because it does have eval, and operator overloading, and regular expression parsing, so some — but not all — of the use cases for macros are covered.

Source: Python for Lisp Programmers

Math languages vs. application languages

Last Friday I posted on @SciPyTip a summary of why I like SciPy, the scientific programming library for Python.

I’d rather do math in a general-purpose language than try to do general-purpose programming in a math language.

Mathematical software is never purely mathematical. The math has to connect to something, and that’s where most of the programming effort may go.

If you’re responsible for the math and for the larger system it’s embedded in, you have three options.

  1. Write the math in an application programming language.
  2. Do application programming in a mathematical language.
  3. Use different languages for the math and the larger system.

My preferred option is #1. My second choice is #3. My last choice, by far, is #2.

Application languages are typically better for math than mathematical languages are for applications. Application languages also have a larger user base, and hence better documentation and tools.

Mixing two languages works well in a team that has someone specialized in the math language, someone specialized in the application language, and someone fluent in plumbing the two languages together. If you have be all three people, you might wonder whether you could do everything in one language.

Related posts:

SciPy integration misunderstanding

Today I needed to compute an integral similar to this:

int_{1000}^infty frac{dx}{100, x^3}

I used the following SciPy code to compute the integral:

from scipy.integrate import quad

def f(x):
    return 0.01*x**-3

integral, error = quad(f, 1000, sp.inf, epsrel = 1e-6)
print integral, error

My intention was to compute the integral to 6 significant figures. (epsrel is a shortened form of epsilon relative, i.e. relative error.) To my surprise, the estimated error was larger than the value of the integral. Specifically, the integral was computed as 5.15 × 10-9 and the error estimate was 9.07 × 10-9.

What went wrong? The integration routine quad lets you specify either a desired bound on your absolute error (epsabs) or a desired bound on your relative error (epsrel). I assumed that since I specified the relative error, the integration would stop when the relative error requirement was met. But that’s not how it works.

The quad function has default values for both epsabs and epsrel.

def quad(... epsabs=1.49e-8, epsrel=1.49e-8, ...):

I thought that since I did not specify an absolute error bound, the bound was not effective, or equivalently, that the absolute error target was 0. But no! It was as if I’d set the absolute error bound to 1.49 × 10-8. Because my integral is small (the exact value is 5 × 10-9) the absolute error requirement is satisfied before the relative error requirement and so the integration stops too soon.

The solution is to specify an absolute error target of zero. This condition cannot be satisfied, and so the relative error target will determine when the integration stops.

integral, error = quad(f, 1000, sp.inf, epsrel = 1e-6, epsabs = 0)

This correctly computes the integral as 5 × 10-9 and estimates the integration error as 4 ×10-18.

It makes some sense for quad to specify non-zero default values for both absolute and relative error, though I imagine most users expect small relative error rather than small absolute error, so perhaps the latter could be set to 0 by default.

Approximating Earth as a sphere

Isaac Newton suggested in 1687 that the earth is not a perfectly round sphere but rather an ellipsoid, and he was right. But since our planet is roughly a sphere, it’s often useful to approximate it by a sphere. So if you’re going to do that, what radius do you use? More generally, what radius do you use when approximating any ellipsoid by a sphere?

This post will discuss the more general problem of finding the radius when approximating any ellipsoid by a sphere. We will give the answer for Earth in particular, and we’ll show how to carry out the calculations. Most of the calculations are easy, but some involve elliptic integrals and we show how to compute these in Python.

First of all, what is an ellipsoid? It is a surface whose (x, y, z) coordinates satisfy

frac{x^2}{a^2} + frac{y^2}{b^2} + frac{z^2}{c^2} = 1

Earth is an oblate spheroid, which means a = b > c. Specifically, a = b = 6,378,137 meters, and c = 6,356,752 meters.

If you wanted to approximate an ellipsoid by a sphere, you could use

r = (a + b + c)/3.

Why? Because the knee-jerk reaction whenever you need to reduce a set of numbers to one number is to average them.

We could do a little better, depending on what property of the ellipsoid we’d like to preserve in our approximation. For example, we might want to create a sphere with the same volume as the ellipsoid. In that case we’d use the geometric mean

r = (abc)1/3.

This is because the volume of an ellipsoid is 4πabc/3 and the volume of a sphere is 4πr3/3.

For the particular case of the earth, we’d use

(a2c)1/3 = 6371000.7

For some applications we might want a sphere with the same surface area as the ellipsoid rather than the same volume.

The surface area of an ellipsoid is considerably more complicated than the volume. For the special case of an oblate spheroid, like earth, the area is given by

2pi a^2 left( 1 + frac{1 - e^2}{e} tanh^{-1}e right)


e^2 = 1 - frac{c^2}{a^2}

The surface area of a sphere is 4 πr2 and so the following code computes r.

from math import sqrt, atanh
e = sqrt(1 - (c/a)**2)
r = a*sqrt(1 + (1-e**2)*atanh(e)/e) / sqrt(2)

This gives r = 6371007.1 for the earth, about 6.4 meters more than the number we got matching volume rather than area.

For a general ellipsoid, the surface area is given by

2pi c^2 + frac{2pi a b}{sin varphi} left( E(varphi, k) sin^2varphi + F(varphi, k) cos^2 varphiright)


cos varphi = frac{c}{a}


k^2 = frac{a^2(b^2 - c^2)}{b^2(a^2 - c^2)}

Here F is the “incomplete elliptic integral of the first kind” and E is the “incomplete elliptic integral of the second kind.” The names are historical artifacts, but the “elliptic” part of name comes from the fact that these functions were discovered in the context of arc lengths with ellipses, so it shouldn’t be too surprising to see them here.

In SciPy, F(φ, k) is given by ellipkinc and E(φ, k) is given by ellipeinc. Both function names start with ellip because they are elliptic functions, and end in inc because they are “incomplete.” In the middle, ellipeinc has an “e” because it computes the mathematical function E(φ, k).

But why does ellipkinc have a “k” in the middle? The “complete” elliptic integral of the first kind is K(k) = F(π/2, k). The “k” in the function name is a reminder that we’re computing the incomplete version of the K function.

Here’s the code for computing the surface area of a general ellipsoid:

from math import sin, cos, acos, sqrt, pi
from scipy.special import ellipkinc, ellipeinc

def area(a, b, c):
    phi = acos(c/a)
    k = a*sqrt(b**2 - c**2)/(b*sqrt(a**2 - c**2))
    E = ellipeinc(phi, k)
    F = ellipkinc(phi, k)
    elliptic = E*sin(phi)**2 + F*cos(phi)**2
    return 2.0*pi*c**2 + 2*pi*a*b*elliptic/sin(phi)

The differences between the various approximation radii are small for Earth. See my next post on elliptical galaxies where the differences are much larger.

Related posts:

Running Python and R inside Emacs

Emacs org-mode lets you manage blocks of source code inside a text file. You can execute these blocks and have the output display in your text file. Or you could export the file, say to HTML or PDF, and show the code and/or the results of executing the code.

Here I’ll show some of the most basic possibilities. For much more information, see And for the use of org-mode in research, see A Multi-Language Computing Environment for Literate Programming and Reproducible Research.

Source code blocks go between lines of the form


On the #+begin_src line, specify the programming language. Here I’ll demonstrate Python and R, but org-mode currently supports C++, Java, Perl, etc. for a total of 35 languages.

Suppose we want to compute √42 using R.

#+begin_src R

If we put the cursor somewhere in the code block and type C-c C-c, org-mode will add these lines:

: 6.48074069840786

Now suppose we do the same with Python:

#+begin_src python
from math import sqrt

This time we get disappointing results:

: None

What happened? The org-mode manual explains:

… code should be written as if it were the body of such a function. In particular, note that Python does not automatically return a value from a function unless a return statement is present, and so a ‘return’ statement will usually be required in Python.

If we change sqrt(42) to return sqrt(42) then we get the same result that we got when using R.

By default, evaluating a block of code returns a single result. If you want to see the output as if you were interactively using Python from the REPL, you can add :results output :session following the language name.

#+begin_src python :results output :session
print "There are %d hours in a week." % (7*24)

This produces the lines

: There are 168 hours in a week.
: 1024

Without the :session tag, the second line would not appear because there was no print statement.

I had to do a couple things before I could get the examples above to work. First, I had to upgrade org-mode. The version of org-mode that shipped with Emacs 23.3 was quite out of date. Second, the only language you can run by default is Emacs Lisp. You have to turn on support for other languages in your .emacs file. Here’s the code to turn on support for Python and R.

    'org-babel-load-languages '((python . t) (R . t)))

Update: My next post shows how to call code in written in one language from code written in another language.

Related posts:

Example of not inverting a matrix: optimization

People are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary solve linear systems of the form Ax = b, but in practice you almost never do this by inverting A. This post will give an example of avoiding matrix inversion. I will explain how the Newton-Conjugate Gradient method works, implemented in SciPy by the function fmin_ncg.

If a matrix A is large and sparse, it may be possible to solve Ax = b but impossible to even store the matrix A-1 because there isn’t enough memory to hold it. Sometimes it’s sufficient to be able to form matrix-vector products Ax. Notice that this doesn’t mean you have to store the matrix A; you have to produce the product Ax as if you had stored the matrix A and multiplied it by x.

Very often there are physical reasons why the matrix A is sparse, i.e. most of its entries are zero and there is an exploitable pattern to the non-zero entries. There may be plenty of memory to store the non-zero elements of A, even though there would not be enough memory to store the entire matrix. Also, it may be possible to compute Ax much faster than it would be if you were to march along the full matrix, multiplying and adding a lot of zeros.

Iterative methods of solving Ax = b, such as the conjugate gradient method, create a sequence of approximations that converge (in theory) to the exact solution. These methods require forming products Ax and updating x as a result. These methods might be very useful for a couple reasons.

  1. You only have to form products of a sparse matrix and a vector.
  2. If don’t need a very accurate solution, you may be able to stop very early.

In Newton’s optimization method, you have to solve a linear system in order to find a search direction. In practice this system is often large and sparse. The ultimate goal of Newton’s method is to minimize a function, not to find perfect search directions. So you can save time by finding only approximately solutions to the problem of finding search directions. Maybe an exact solution would in theory take 100,000 iterations, but you can stop after only 10 iterations! This is the idea behind the Newton-Conjugate Gradient optimization method.

The function scipy.optimize.fmin_ncg can take as an argument a function fhess that computes the Hessian matrix H of the objective function. But more importantly, it lets you provide instead a function fhess_p that computes the product of the H with a vector. You don’t have to supply the actual Hessian matrix because the fmin_ncg method doesn’t need it. It only needs a way to compute matrix-vector products Hx to find approximate Newton search directions.

For more information, see the SciPy documentation for fmin_ncg.

How to compute jinc(x)

The function jinc(x) that I wrote about yesterday is almost trivial to implement, but not quite. I’ll explain why it’s not quite as easy as it looks and how one might implement it in C and Python.

The function jinc(x) is defined as J1(x) / x, so if you have code to compute J1 then it ought to be a no-brainer. For example, why not use the following C code?

#include <math.h>
double jinc(double x) {
    return j1(x) / x;

The problem is that if you pass in 0, the code will divide by 0 and return a NaN. The function jinc(x) is defined to be 1/2 at x = 0 because that’s the limit of J1(x)(x) / x as x goes to 0. So we try again:

#include <math.h>
double jinc(double x) {
    return (x == 0.0) ? 0.5 : j1(x) / x;

Does that work? Technically, it could still fail — we’ll come back to that at the end — but we’ll assume for now that it’s OK.

We could write the analogous Python code, and it would be adequate as long as we’re only calling the function with scalars and not NumPy arrays.

from scipy.special import j1
def jinc(x):
    if x == 0.0:
        return 0.5
    return j1(x) / x

Now suppose you want to plot this function. You create an array of points, say

x = np.linspace(-1, 1, 25)

and plot jinc(x). You’ll get a warning: “ValueError: The truth value of an array with one element is ambiguous. Use a.any() or a.all().” Incidentally, if we called linspace with an even integer in the last argument, our array of points would avoid zero and the naive implementation of jinc would work.

When Python tries to apply jinc to an array, it doesn’t know how to interpret the test x == 0. The warning suggests “Do you mean if any component of x is 0? Or if all components of x are 0?” Neither option is what we want. We want to apply jinc as written to each element of x. We could do this by calling the vectorize function.

jinc = np.vectorize(jinc)

This replaces our original jinc function with one that handles NumPy arrays correctly.

There is an extremely unlikely scenario in which the code above could fail. The value of J1(x) is approximately x/2 for small values of x. If the floating point value x is so small that 0.5*x returns 0, our function will return 0, even though it should return 0.5. The C code above works for values of x as small as DBL_MIN and even values much smaller. (DBL_MIN is not the smallest value of a double, only the smallest normalized double.) But if you set

x = DBL_MIN / pow(2.0, 52);

then jinc(x) will return 0. If you want to be absolutely safe, you could change the implementation to

#include <math.h>
double jinc(double x) {
    return (fabs(x) < 1e-8) ? 0.5 : j1(x) / x;

Why test for whether the absolute value is less than 10-8 rather than a much smaller number? For small x, the error in approximating jinc(x) with 1/2 is on the order of x2/16. So for x as large as 10-8, the approximation error is below the resolution of a double. As a bonus, the function jinc(x) will be more efficient for |x| < 10-8 since it avoids a call to j1.

Related posts:

The Python ecosystem

The hard part about getting started with Python is not the language but the ecosystem. It’s easy to find good references on the Python language, but it’s harder to learn what packages are available, how to install them, etc. That was my experience, and Miz Nazim started with a similar observation in his article Python Ecosystem: An Introduction.

Maybe its always harder to learn a language’s ecosystem than the language itself. But I think this was the case for me with Python more than with other languages I’ve used. I wish I’d found Nazim’s article or something like it when I was learning Python.

Related links: