From the category archives:

Python

Code to slice open a Menger sponge

by John on August 30, 2011

Last month the New York Times ran a story about a sculpture based on cutting open a “Menger sponge,” a shape formed by recursively cutting holes through a cube. All the holes are rectangular, but when you cut the sponge open at an angle, you see six-pointed stars.

Here are some better photos, including both a physical model and a computer animation. Thanks to Mike Croucher for the link.

I’ve written some Python code to take slices of a Menger sponge. Here’s a sample output.

[click to continue...]

{ 3 comments }

A couple Python-like features in C++11

by John on August 17, 2011

The new C++ standard includes a couple Python-like features that I ran across recently. There are other Python-like features in the new standard, but here I’ll discuss range-based for-loops and raw strings.

[click to continue...]

{ 9 comments }

Sage Beginner’s Guide

by John on July 12, 2011

I like books. Given a choice, I’d much rather read a book than online documentation. Typically a book speaks with one voice and has been more carefully edited. Unfortunately, it can be hard to find books on specialized software. That’s why I was glad to hear there’s a book on Sage, a project that integrates many Python libraries for mathematical computing into a single context.

Craig Finch’s book Sage Beginner’s Guide provides an easy-to-read overview of Sage. The book is filled with examples. In fact, every topic is introduced by an example. Explanations follow the examples in sections entitled “What just happened?”. Follow-up exercises are provided to solidify the material after the example and explanation.

Someone could begin using Sage without knowing Python. They could think of Sage as an open source Mathematica-like application. But one of the strengths of Sage is that its underlying language is Python. And the Sage Beginner’s Guide has two chapters devoted to the Python language, one basic and one advanced.

Finch’s book is primarily focused on Sage as a whole, not the Python components it integrates. However, it does point out the component libraries that provide certain functionality when either the constituent library conflicts with Sage or can be used to refine Sage functionality.

Sage can be challenging to install. It is not yet directly supported on Windows; the recommended way to use Sage on Windows is to download a Linux virtual machine with Sage installed. I was able to install Sage on Ubuntu but not yet on OS X. However, you can try out Sage without installing it by using Sage online notebooks.

I don’t have as much experience with Sage as with some of its components. As far as I can tell, it’s easy to take your experience from component libraries — say NumPy — and bring it over to Sage. It would be harder to take functionality you discovered while using Sage and use it outside of Sage since, though that is to be expected since part of the value Sage adds is smoothing over the peculiarities of each component library.

{ 7 comments }

How to fit an elephant

by John on June 21, 2011

John von Neumann famously said

With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.

By this he meant that one should not be impressed when a complex model fits a data set well. With enough parameters, you can fit any data set.

It turns out you can literally fit an elephant with four parameters if you allow the parameters to be complex numbers.

I mentioned von Neumann’s quote on StatFact last week and Piotr Zolnierczuk replied with reference to a paper explaining how to fit an elephant:

“Drawing an elephant with four complex parameters” by Jurgen Mayer, Khaled Khairy, and Jonathon Howard,  Am. J. Phys. 78, 648 (2010), DOI:10.1119/1.3254017.

Piotr also sent me the following Python code he’d written to implement the method in the paper. This code produced the image above.

"""
Author: Piotr A. Zolnierczuk (zolnierczukp at ornl dot gov)

Based on a paper by:
Drawing an elephant with four complex parameters
Jurgen Mayer, Khaled Khairy, and Jonathon Howard,
Am. J. Phys. 78, 648 (2010), DOI:10.1119/1.3254017
"""
import numpy as np
import pylab

# elephant parameters
p1, p2, p3, p4 = (50 - 30j, 18 +  8j, 12 - 10j, -14 - 60j )
p5 = 40 + 20j # eyepiece

def fourier(t, C):
    f = np.zeros(t.shape)
    A, B = C.real, C.imag
    for k in range(len(C)):
        f = f + A[k]*np.cos(k*t) + B[k]*np.sin(k*t)
    return f

def elephant(t, p1, p2, p3, p4, p5):
    npar = 6
    Cx = np.zeros((npar,), dtype='complex')
    Cy = np.zeros((npar,), dtype='complex')

    Cx[1] = p1.real*1j
    Cx[2] = p2.real*1j
    Cx[3] = p3.real
    Cx[5] = p4.real

    Cy[1] = p4.imag + p1.imag*1j
    Cy[2] = p2.imag*1j
    Cy[3] = p3.imag*1j

    x = np.append(fourier(t,Cx), [-p5.imag])
    y = np.append(fourier(t,Cy), [p5.imag])

    return x,y

x, y = elephant(np.linspace(0,2*np.pi,1000), p1, p2, p3, p4, p5)
pylab.plot(y,-x,'.')
pylab.show()

Related posts:

Advantages of crude models
Occam’s razor and Bayes theorem

{ 8 comments }

Stand-alone scientific code

by John on June 7, 2011

Sometimes you need one or two scientific functions not included in your programming environment. For a number of possible reasons, you do not want to depend on an external library. For example, maybe you don’t want to take the time to evaluate libraries. Or maybe you want to give someone else a small amount of self-contained code. Here is a collection of code for these situations.

Stand-alone code for numerical computing

This page contains C++, Python, and C# code for special functions and random number generation with no external dependencies. Do whatever you want with it, no strings attached. Use at your own risk. I recently added software for gamma and log gamma functions, as well as a few random number generators. (Why separate functions for the gamma function and its logarithm? See explanation here.)

I don’t recommend using this code as a way to avoid learning a good library. If you’re writing Python, for example, I’d recommend using SciPy. But there are times when the advantages of being self-contained outweigh the advantages of using high-quality libraries.

Related posts:

Mathematical functions that seem unnecessary
SciPyTip: Daily tips on using scientific computing in Python
C# math gotchas

{ 8 comments }

Eric Floehr is the owner of ForecastWatch, a company that evaluates the accuracy of weather forecasts. In this interview Eric explains what his business does, how he got started, and some of the technology he uses.

[click to continue...]

{ 10 comments }

A knight’s tour magic square

by John on April 6, 2011

This magic square was created by Leonhard Euler (1707-1783). Each row and each column sum to 260. Each half-row and half-column sum to 130. The square is also a knight’s tour: a knight could visit each square on a chessboard exactly once by following the numbers in sequence.

Here is Python code to verify that the square has the properties listed above.

Update: It seems the attribution to Euler is a persistent error. Euler did publish the first paper on knight’s tours, but the knight’s tour square above was published by William Beverley in 1848. Thanks to George Jelliss for the correction. See the comments below.

Update 2: Notes from George Jelliss on magic king and queen tours.

{ 23 comments }

Python for high performance computing

by John on March 21, 2011

William Scullin’s talk from PyCon 2011: Python for high performance computing.

At least in our shop [Argonne National Laboratory] we have three accepted languages for scientific computing. In this order they are C/C++, Fortran in all its dialects, and Python. You’ll notice the absolute and total lack of Ruby, Perl, Java.

If you’re interested in Python and HPC, check out SciPyTip.

{ 8 comments }

Digital workflow

by John on March 13, 2011

William Turkel has a nice four-part series of blog posts entitled A Workflow for Digital Research Using Off-the-Shelf Tools. His four points are

  1. Start with a backup and versioning strategy.
  2. Make everything digital.
  3. Research 24/7 (using RSS feeds).
  4. Make local copies of everything.

Also by William Turkel, The Programming Historian, “an open-access introduction to programming in Python, aimed at working historians (and other humanists) with little previous experience.”

Related post:

Create offline, analyze online

{ 5 comments }

Scientific Python on Twitter

by John on February 21, 2011

Next week I’m starting a new daily tip Twitter account: @SciPyTip.

This account will post on things related to scientific computing in Python, including the SciPy library, related software, and scientific computing in general.

Full list of daily tip accounts

{ 2 comments }

See Microsoft Research’s announcement of the the Sho project.

Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.

Maybe this is why Microsoft contracted Enthought this summer to port NumPy and SciPy to .NET.

{ 9 comments }

Ruby, Python, and Science

by John on November 28, 2010

David Jacobs has written a long blog post Ruby is beautiful (but I’m moving to Python). Here’s my summary.

Ruby is much better than Java, but the Ruby community is too focused on web development and the language has no scientific library. Python has a lot of the same advantages as Ruby, is used for more than web programming, and has SciPy.

Update: There is now a fledgling SciRuby project.

Further reading:

Plain Python
Getting started with SciPy
Replacing Mathematica with Python
SciPy and NumPy for .NET

{ 9 comments }

Best rational approximation

by John on October 20, 2010

Suppose you have a number x between 0 and 1. You want to find a rational approximation for x, but you only want to consider fractions with denominators below a given limit.

For example, suppose x = 1/e = 0.367879…  Rational approximations with powers of 10 in the denominator are trivial to find: 3/10, 36/100, 367/1000, etc. But say you’re willing to have a denominator as large as 10. Could you do better than 3/10? Yes, 3/8 = 0.375 is a better approximation. What about denominators no larger than 100? Then 32/87 = 0.36781… is the best choice, much better than 36/100.

How do you find the best approximations? You could do a brute force search. For example, if the maximum denominator size is N, you could try all fractions with denominators less than or equal to N. But there’s a much more efficient algorithm. The algorithm is related to the Farey sequence named after John Farey, though I don’t know whether he invented the algorithm.

The idea is to start with two fractions, a/b = 0/1 and c/d = 1/1. We update either a/b or c/d at each step so that a/b will be the best lower bound of x with denominator no bigger than b, and c/d will be the best upper bound with denominator no bigger than d. At each step we do a sort of binary search by introducing the mediant of the upper and lower bounds. The mediant of a/b and c/d is the fraction (a+c)/(b+d) which always lies between a/b and c/d.

Here is an implementation of the algorithm in Python. The code takes a number x between 0 and 1 and a maximum denominator size N. It returns the numerator and denominator of the best rational approximation to x using denominators no larger than N.

def farey(x, N):
    a, b = 0, 1
    c, d = 1, 1
    while (b <= N and d <= N):
        mediant = float(a+c)/(b+d)
        if x == mediant:
            if b + d <= N:
                return a+c, b+d
            elif d > b:
                return c, d
            else:
                return a, b
        elif x > mediant:
            a, b = a+c, b+d
        else:
            c, d = a+c, b+d

    if (b > N):
        return c, d
    else:
        return a, b

In Python 3.0, the float statement could be removed since the division operator does floating point division of integers.

Read more about rational approximation in Breastfeeding, the golden ratio, and rational approximation.

Here’s an example of a situation in which you might need rational approximations. Suppose you’re designing an experiment which will randomize subjects between two treatments A and B. You want to randomize in blocks of size no larger than N and you want the probability of assigning treatment A to be p. You could find the best rational approximation a/b to p with denominator b no larger than N and use the denominator as the block size. Each block would be a permutation of a A’s and b-a B’s.

Update 1: Here is a form that implements the algorithm above.

Update 2: Eugene Wallingford wrote a blog post about implementing the algorithm in Klein, a very restricted language used for teaching compiler writing.

{ 17 comments }

Bug in SciPy’s erf function

by John on September 2, 2010

Last night I produced the plot below and was very surprised at the jagged spike. I knew the curve should be smooth and strictly increasing.

My first thought was that there must be a numerical accuracy problem in my code, but it turns out there’s a bug in SciPy version 0.8.0b1. I started to report it, but I saw there were similar bug reports and one such report was marked as closed, so presumably the fix will appear in the next release.

The problem is that SciPy’s erf function is inaccurate for arguments with imaginary part near 5.8. For example, Mathematica computes erf(1.0 + 5.7i) as -4.5717×1012 + 1.04767×1012 i. SciPy computes the same value as -4.4370×1012 + 1.3652×1012 i. The imaginary component is off by about 30%.

Here is the code that produced the plot.

from scipy.special import erf
from numpy import linspace, exp
import matplotlib.pyplot as plt

def g(y):
    z = (1 + 1j*y) /  sqrt(2)
    temp = exp(z*z)*(1 - erf(z))
    u, v = temp.real, temp.imag
    return -v / u

x = linspace(0, 10, 101)
plt.plot(x, g(x))

{ 4 comments }

What does this code do?

by John on July 21, 2010

At the SciPy 2010 conference, a speaker showed several short code samples and asked us what each sample did. The samples were clearly written, but we had no comments to provide context. This was the last sample.

def what( x, n ):
    if n < 0:
        n = -n
        x = 1.0 / x
    z = 1.0
    while n > 0:
        if n % 2 == 1:
            z *= x
        x *= x
        n /= 2
    return z

The quiz was at the end of the day and I was tired. I couldn’t tell what the code does. Then I found out to my chagrin that the sample above implements an algorithm I know well. I’ve written the same code and I’ve even blogged about here.

This exercise changed my opinion of “self-documenting” code. Without some contextual clue, it is hard to understand the purpose of even a small piece of code.

Meaningful variable and function names would have helped, but a tiny comment might have helped even more. Not some redundant comment like explaining that the line x = 1.0 / x takes a reciprocal, but a comment explaining the problem the code is trying to solve.

For another example, what do you think this code does?

uint what()
{
    m_z = 36969 * (m_z & 65535) + (m_z >> 16);
    m_w = 18000 * (m_w & 65535) + (m_w >> 16);
    return (m_z << 16) + (m_w & 65535);
}

It’s clear enough what the code does at a low level — it’s just a few operations — but it’s not at all clear what it’s for.

Try to figure out what the code samples do before reading further. But if you give up, the first example is described here and the second example comes from here.

In an ordinary face-to-face conversation, more information is conveyed non-verbally than verbally. We may think that our literal words are most important, but so much is conveyed by voice inflection, facial expression, posture, etc. Something similar is going on with source code. When we read a piece of source code, we typically bring a huge amount of implicit knowledge with us.

Suppose a coworker Sam asks you to look at his code. The fact that the question came up at work provides a large amount of context; this isn’t just a random code fragment on the web. More specifically, you know what kinds of projects Sam works on. You know why Sam wants you to look at the code. He may be showing you something he’s proud of or he may be asking for help finding a bug. You know a lot about his code before you see it.

Now suppose you’re a contractor. Sam was hit by a bus and you’ve been asked to work on his projects until he gets out of the hospital. You may complain to his office mate that Sam’s code is an awful mess, but she can’t understand what you’re talking about. She thinks his code is perfectly clear.

Now suppose you’re a contractor on the opposite side of the world from Sam. You have even less context than if you were in his office talking to his office mate. After a great deal of agony, you send your contribution back to Sam’s company. You comment your code beautifully, but Sam’s colleagues complain that your code is poorly written and that you didn’t solve the right problem.

Institutional memory is more valuable than source code comments. It costs a great deal to replace a programmer, even one who leaves behind well-commented code.

Related posts:

Do you really want to be indispensable?
Preserving (the memory of) documents
The buck stops with the programmer

{ 30 comments }