From the category archives:

Python

Random improvisation subjects

by John on February 23, 2010

Destination ImagiNation is a non-profit organization that encourages student creativity. This is my family’s first year to participate in DI and it has been a lot of fun. One of the things that impresses me most about DI is that they have strict rules limiting adult input.

This weekend I was an appraiser at a DI competition for an improvisation challenge. Teams could prepare for the overall format of the challenge, but some elements of the challenge were randomly selected on the day of the competition. This year the improvisations centered around endangered things. Teams were given a list of 10 endangered things ahead of time, but they wouldn’t know which thing would be theirs until just before they had to perform. Some of the things on the list were endangered animals, such as the giant panda. There were also other things in danger of disappearing, such as the VHS tape. The students also had to use a randomly chosen stock character and had to include a character with a randomly chosen “unimpressive superpower.”

There were 13 teams in the elementary division. What would you expect from 13 teams randomly selecting 10 endangered things? Obviously some endangered thing has to be chosen at least twice. Would you expect every item on the list to be chosen at least once? How often do you expect the most common item would be chosen?

In our case, three teams were assigned “glaciers” and five were assigned “the landline telephone.” The other items were assigned once or not at all. (No one was assigned “the Yiddish language”. Too bad. I really wanted to see what the students would do with that one.)

Is there reason to suspect that the assignments were not random? How likely is it that in a competition of 13 teams that five or more teams would be given the same subject? How likely is it that every subject would be used at least once? See an explanation here. Make a guess before looking at my answer.

Here’s some Python code you could use to simulate the selection of endangered things.

from random import random

num_reps     = 100000 # number of simulation repetitions
num_subjects = 10     # number of endangered things
num_teams    = 13     # number of teams competing

def maxperday():
    tally = [0] * num_subjects
    for i in range(num_teams):
        subject = int(random()*num_subjects)
        tally[subject] += 1
    return max(tally)

total = 0
for rep in range(num_reps):
    if maxperday() >= 5:
        total += 1
print float(total)/num_reps

{ 4 comments }

Using py2exe with SciPy

by John on February 12, 2010

py2exe is a program that takes Python code and produces a Windows executable that can run on computers that do not have Python installed. My focus here is in using py2exe on Python code that depends on SciPy. [click to continue...]

{ 1 comment }

Twitter daily tip news

by John on February 8, 2010

I have five Twitter accounts that send out one tip per day, including a new one I just added last week.

Regular expressions

@RegexTip started over today. It’s a cycle of tips for learning regular expressions. It sticks to the regular expression features common to Python, Perl, C#, and many other programming languages. This account posts Monday through Friday.

Keyboard shortcuts

@SansMouse gives one tip a day on using Windows without a mouse. By practicing one keyboard shortcut a day, you can get into the habit of using your mouse less and your keyboard more. This cycle of tips started over January 29 with the most common and most widely useful shortcuts. I’m also sprinkling in a few extra tips that are less well known. This account also posts Monday through Friday.

Math

I have three mathematical accounts. These post seven days a week.

@AlgebraFact, just started February 2. It will be a mixture of linear algebra, number theory, group theory, etc.

@ProbFact gives one fact per day from probability. Usually these facts are theorems, but sometimes they include a note on history or applications.

@AnalysisFact gives facts from real and complex analysis. The topics range from elementary to advanced.

What if I don’t use Twitter?

You can visit the page for a Twitter account just like any other web page. And every Twitter account has an RSS feed link allowing you to subscribe just as you would subscribe to a blog.

How do you write these?

I write up content for these accounts in bulk. I may sit down on a Saturday and come up with several weeks worth of tips. Then I use HootSuite to schedule the tips weeks in advance. Sometimes I’ll post something spontaneously, such as link to something relevant, but most of the work is done in advance. I use my personal Twitter account for live interaction.

Related links:

Using Windows without a mouse

Regular expressions in

Chart of probability distribution relationships

{ 2 comments }

A few days ago I wrote a post on finding parameters so that a probability distribution satisfies two percentile conditions. Since then I’ve written Python code to carry out the calculations described in that article and the accompanying technical report.

The article is Finding probability distribution parameters from percentiles posted on CodeProject. The article comes with Python source code and some commentary. The article shows how SciPy and the functools module make it possible for the code to be very succinct.

Related posts:

Probability distribution parameters in SciPy
Numerical computing in IronPython with Ironclad
Getting started with SciPy

{ 0 comments }

Parameterizations are the bane of statistical software. One of the most common errors is to assume that one software package uses the same parameterization as another package. For example, some packages specify the exponential distribution in terms of the mean but others use the rate. [click to continue...]

{ 4 comments }

New Python podcast: A little bit of Python

by John on February 1, 2010

There’s a new Python podcast: A little bit of Python with Michael Foord, Brett Cannon, Jesse Noller, Steve Holden, and Andrew Kuchling.

So far I’ve found the first episode most interesting. It discusses the “moratorium”, the plan to give Python library authors time catch up with Python 3 before extending the core language further. This sounds like a very smart move.

Related posts:

Good enough for Google and NASA
Plain Python

{ 1 comment }

PyIMSL now free for non-commercial use

by John on November 16, 2009

Visual Numerics announced today that their PyIMSL Studio is now free for non-commercial use. PyIMSL contains Python wrappers for the IMSL scientific computing library and integrates with NumPy, matplotlib, etc.

PyIMSL Studio architecture diagram

Related links:

Getting started with SciPy
IEEE floating point arithmetic in Python
Probability distributions in SciPy
Numerical computing in IronPython

{ 0 comments }

Good enough for Google and NASA

by John on August 17, 2009

Leo Laporte’s comment on Python in the latest FLOSS Weekly podcast:

If it’s good enough for Google and NASA, it’s good enough for me, baby.

The podcast is an interesting interview with Michael Foord on IronPython. Leo Laporte’s comment comes near the end of the show, around 51:20.

Related posts:

IronPython is a one-way gate (see Michael Foord’s comments)
Numerical computing in IronPython with Ironclad

{ 0 comments }

CodeProject just published my article
Getting Started with SciPy (Scientific Python)

{ 1 comment }

IEEE floating point arithmetic in Python

by John on July 21, 2009

Sometimes a number is not a number. Numeric data types represent real numbers in a computer fairly well most of the time, but sometimes the abstraction leaks. The sum of two numeric types is always a numeric type, but the result might be a special bit pattern that says overflow occurred. Similarly, the ratio of two numeric types is a numeric type, but that type might be a special type that says the result is not a number.

The IEEE 754 standard dictates how floating point numbers work. I’ve talked about IEEE exceptions in C++ before. This post is the Python counterpart. Python’s floating point types are implemented in terms of C’s double type  and so the C++ notes describe what’s going on at a low level. However, Python creates a higher level abstraction for floating point numbers. (Python also has arbitrary precision integers, which we will discuss at the end of this post.)

There are two kinds of exceptional floating point values: infinities and NaNs. Infinite values are represented by inf and can be positive or negative. A NaN, not a number, is represented by nan. Let x = 10200. Then x2 will overflow because 10400 is too big to fit inside a C double. (To understand just why, see Anatomy of a floating point number.) In the following code, y will contain a positive infinity.

x = 1e200; y = x*x

If you’re running Python 3.0 and you print y, you’ll see inf. If you’re running an earlier version of Python, the result may depend on your operating system. On Windows, you’ll see 1.#INF but on Linux you’ll see inf. Now keep the previous value of y and run the following code.

z = y; z /= y

Since z = y/y, you might think z should be 1. But since y was infinite, it doesn’t work that way. There’s no meaningful way to assign a numeric value to the ratio of infinite values and so z contains a NaN. (You’d have to know “how they got there” so you could take limits.) So if you print z you’d see nan or 1.#IND depending on your version of Python and your operating system.

The way you test for inf and nan values depends on your version of Python. In Python 3.0, you can use the functions math.isinf and math.isnan respectively. Earlier versions of Python do not have these functions. However, the SciPy library has corresponding functions scipy.isinf and scipy.isnan.

What if you want to deliberately create an inf or a nan? In Python 3.0, you can use float('inf') or float('nan'). In earlier versions of Python you can use scipy.inf and scipy.nan if you have SciPy installed.

IronPython does not yet support Python 3.0, nor does it support SciPy directly. However, you can use SciPy with IronPython by using Ironclad from Resolver Systems. If you don’t need a general numerical library but just want functions like isinf and isnan you can create your own.


def isnan(x): return type(x) is float and x != x
def isinf(x): inf = 1e5000; return x == inf or x == -inf

The isnan function above looks odd. Why would x != x ever be true? According to the IEEE standard, NaNs don’t equal anything, even each other. (See comments on the function IsFinite here for more explanation.) The isinf function is really a dirty hack but it works.

To wrap things up, we should talk a little about integers in Python. Although Python floating point numbers are essentially C floating point numbers, Python integers are not C integers. Python integers have arbitrary precision, and so we can sometimes avoid problems with overflow by working with integers. For example, if we had defined x as 10**200 in the example above, x would be an integer and so would y = x*x and y would not overflow; a Python integer can hold 10400 with no problem. We’re OK as long as we keep producing integer results, but we could run into trouble if we do anything that produces a non-integer result. For example,

x = 10**200; y = (x + 0.5)*x

would cause y to be inf, and

x = 10**200; y = x*x + 0.5

would throw an OverflowError exception.

Related posts:

Floating point numbers are a leaky abstraction
Anatomy of a floating point number
Overflow and loss of precision

{ 5 comments }

Probability distributions in SciPy

by John on July 20, 2009

Here are some notes on how to work with probability distributions using the SciPy numerical library for Python.

Functions related to probability distributions are located in scipy.stats. The general pattern is

scipy.stats.<distribution family>.<function>

There are 81 supported continuous distribution families and 12 discrete distribution families. Some distributions have obvious names: gamma, cauchy, t, f, etc. The only possible surprise is that all distributions begin with a lower-case letter, even those corresponding to a proper name (e.g. Cauchy). Other distribution names are less obvious: expon for the exponential, chi2 for chi-squared distribution, etc.

Each distribution supports several functions. The density and cumulative distribution functions are pdf and cdf respectively. (Discrete distributions use pmf rather than pdf.) One surprise here is that the inverse CDF function is called ppf for “percentage point function.” I’d never heard that terminology and would have expected something like “quantile.”

Example: scipy.stats.beta.cdf(0.1, 2, 3) evaluates the CDF of a beta(2, 3) random variable at 0.1.

Random values are generated using rvs which takes an optional size argument. The size is set to 1 by default.

Example: scipy.stats.norm.rvs(2, 3) generates a random sample from a normal (Gaussian) random variable with mean 2 and standard deviation 3. The function call scipy.stats.norm.rvs(2, 3, size = 10) returns an array of 10 samples from the same distribution.

The command line help() facility does not document the distribution parameterizations, but the external documentation does. Most distributions are parameterized in terms of location and scale. This means, for example, that the exponential distribution is parameterized in terms of its mean, not its rate. Somewhat surprisingly, the exponential distribution has a location parameter. This means, for example, that scipy.stats.expon.pdf(x, 7) evaluates at x the PDF of an exponential distribution with location 7. This is not what I expected. I assumed there would be no location parameter and that the second argument, 7, would be the mean (scale). Instead, the location was set to 7 and the scale was left at its default value 1. Writing scipy.stats.expon.pdf(x, scale=7) would have given the expected result because the default location value is 0.

SciPy also provides constructors for objects representing random variables.

Example: x = scipy.stats.norm(3, 1); x.cdf(2.7) returns the same value as scipy.stats.norm.cdf(2.7, 3, 1).

Constructing objects representing random variables encapsulates the differences between distributions in the constructors. For example, some distributions take more parameters than others and so their object constructors require more arguments. But once a distribution object is created, its PDF, for example, can be called with a single argument. This makes it easier to write code that takes a general distribution object as an argument.

Related posts:

Numerical computing in IronPython with IronClad
Stand-alone error function erf(x)

{ 7 comments }

Plain Python

by John on May 8, 2009

Perl is cool, much more so than Python. But I prefer writing Python.

Perl is fun to read about. It has an endless stream of features to discover. Python by comparison is kinda dull. But the aspects of a language that make it fun to read about do not necessarily make it pleasant to use.

I wrote Perl off and on for several years before trying Python. People would tell me I should try Python and every six months or so I’d skim through a Python book. My impression was that Python was prosaic. It didn’t seem to offer any advantage over Perl, so I stuck with Perl. (Not that I was ever very good at Perl.)

Then I read an article by Bruce Eckel saying that he liked Python because he could remember the syntax. He said that despite teaching Java and writing books on Java, he could never remember the syntax for opening a file in Java, for example. But he could remember the corresponding Python syntax. I would never have picked up on that by skimming books. You’ve got to actually use a language a while to know how memorable the syntax is. But  I had used Perl enough to know that I could not remember its syntax without frequent use. Memorable syntax increases productivity. You don’t have to break your train of thought as often to reach for a reference book.

I stand by my initial impression that Python is plain, but I now think that’s a strength. It just gets out of my way and lets me get my work done. I’m sure  Perl gurus can be extremely productive in Perl. I tried being a Perl guru, and I never made it. I wouldn’t say I’m a Python guru, but I also don’t feel the need to be a guru to use Python.

Python code is not cool in a line-by-line sense, not in the way that an awesomely powerful Perl one-liner is cool. Python is cool in more subtle ways.

Related posts:

Languages that are easy to pick back up
API symmetry
Periodic table of Perl operators
Three-hour-a-week language

{ 16 comments }

Someone sent me email regarding my online calculator for computing the distance between to locations given their longitude and latitude values. He wants to do sort of the opposite. Starting with the longitude and latitude of one location, he wants to find the longitude and latitude of locations moving north/south or east/west of that location. I like to answer reader questions when I can, so here goes. I’ll give a theoretical derivation followed by some Python code.

Longitude and latitude and are usually measured in degrees, but theoretical calculations are cleaner in radians.  Someone using the Python code below could think in terms of degrees; radians will only be used inside function implementations. We’ll use the fact that on a circle of radius r, an arc of angle θ radians has length rθ. We’ll assume the earth is a perfect sphere. See this post for a discussion of how close the earth is to being a sphere.

Moving North/South

I’ll start with moving north/south since that’s simpler. Let R be the radius of the earth. An arc of angle φ radians on the surface of the earth has length M = Rφ, so an arc M miles long corresponds to an angle of φ = M/R radians. Moving due north or due south does not change longitude.

Moving East/West

Moving east/west is a little more complicated. At the equator, the calculation is just like the calculation above, except that longitude changes rather than latitude. But the distance corresponding to one degree of longitude changes with latitude. For example, one degree of longitude along the Arctic Circle doesn’t take you nearly as far as it does at the equator.

Suppose you’re at latitude φ degrees north of the equator. The circumference of a circle at constant latitude φ, a circle parallel to the equator, is cos φ times smaller than the circumference of the equator. So at latitude φ an angle of θ radians describes an arc of length M = R θ cos φ. A distance M miles east or west corresponds to a change in longitude of θ = M/(R cos φ). Moving due east or due west does not change latitude.

Python code

The derivation above works with angles in radians. Python’s cosine function also works in radians. But longitude and latitude are usually expressed in degrees, so function inputs and outputs are in degrees.

import math

# Distances are measured in miles.
# Longitudes and latitudes are measured in degrees.
# Earth is assumed to be perfectly spherical.

earth_radius = 3960.0
degrees_to_radians = math.pi/180.0
radians_to_degrees = 180.0/math.pi

def change_in_latitude(miles):
    "Given a distance north, return the change in latitude."
    return (miles/earth_radius)*radians_to_degrees

def change_in_longitude(latitude, miles):
    "Given a latitude and a distance west, return the change in longitude."
    # Find the radius of a circle around the earth at given latitude.
    r = earth_radius*math.cos(latitude*degrees_to_radians)
    return (miles/r)*radians_to_degrees

Related posts:

What is the shape of the earth?
Spherical trigonometry
Finding distances using latitude and longitude

{ 8 comments }

MIT replaces Scheme with Python

by John on March 26, 2009

According to an article on The Snowtide Blog, MIT has decided to teach beginning CS classes in Python rather than Scheme. (Thanks to procoders.net for the link.) The article paraphrases remarks by Gerdald Sussman at the announcement.The main rationale was that the world is more complicated now than when the course was designed and SICP was written. Now programmers spend more time researching libraries than writing everything from scratch.

Here’s something I expected Sussman to say though apparently he did not: you can use Python as a functional language if you like, you just don’t have to. I don’t know why more people don’t emphasize this. Python is not a functional language, but Python does make it easy to declare functions on the fly, pass functions as arguments, etc. You’re free to write Scheme-like programs in Python if you want to, but you’re also free to use other styles when they are more appropriate.

{ 7 comments }

In a previous post, I discuss my difficulties calling some Python modules from IronPython. In particular I wanted to call SciPy from IronPython and couldn’t. The discussion following that post brought up Ironclad as a possible solution. I wanted to learn more about Ironclad, and so I invited William Reade to write a guest post about the project. I want to thank William for responding to my request with a  very helpful article. — John


Hi! My name’s William Reade, and I’ve spent the last year or so working on Ironclad, an open-source project which helps IronPython to inter-operate better with CPython. Michael Foord recently introduced  me to our host John, who kindly offered me the opportunity to write a bit about  my work and, er, how well it works. So, here I am.

To give you a little bit of context, I’ve been working at Resolver Systems for several years now; our main product, Resolver  One, is a spreadsheet with very tight IronPython integration. We like to describe  it as a “Pythonic spreadsheet”, and that’s clearly a concept that people like.  However, when people think of a “Pythonic spreadsheet”, they apparently expect it  to work with popular Python libraries — such as NumPy and SciPy — and we found that IronPython’s incompatibility put us at a serious disadvantage. And, for some reason, nobody seemed very keen to  solve the problem for us, so we had to do it ourselves.

The purpose of Ironclad is to allow you to use Python C extensions (of which there are many) from inside IronPython without recompiling anything. The secret purpose  has always been to get NumPy working in Resolver One, and in release 1.4 we finally  achieved this goal. Although the integration is still alpha level, you can import  and use NumPy inside the spreadsheet grid and user code: you can see a screencast  about the integration here.

However, while Resolver One is a great tool, you aren’t required to use it to get the benefits: Ironclad has been developed completely separately, has no external  dependencies, and is available under an open source license. If you consider  yourself adequately teased, keep reading for a discussion of what Ironclad actually  does, what it enables you to do, and where it’s headed.

[click to continue...]

{ 2 comments }