Using py2exe with SciPy

py2exe is a program that takes Python code and produces a Windows executable that can run on computers that do not have Python installed. My focus here is in using py2exe on Python code that depends on SciPy.

py2exe is itself a Python program, and its latest version is built for Python 2.6. The code I want it to compile is written with Python 2.5 because the latest version of SciPy depends on Python 2.5. How do I tell py2exe that my code uses an earlier version of Python?

It took me a while to realize this, but py2exe has two version numbers: one for the version of py2exe and one for the version of Python. The key was to download and install py2exe-0.6.9.win32-py2.5.exe. This is the latest version of py2exe (version 0.6.9) for an earlier version of Python (version 2.5). The py2exe software runs in the same version of Python as the code it is compiling.

I tested py2exe on a script as follows:

from scipy.special import gamma
print "Gamma(1/2) = ", gamma(0.5)

The instructions given in the py2exe tutorial don’t quite work because of the dependence on SciPy, but the site gives an example of how to modify the file to specify the dependence on SciPy. Here’s where things get a little murky. The instructions don’t say exactly how to modify the file, but they give strong hints. I know my code depends on scipy.special, but I don’t know what further dependencies I might have. Here’s the setup file I used.

from distutils.core import setup
import py2exe

excludes = []
includes = ["scipy.special"]

opts = {
    "py2exe": {

setup(console=[''], options=opts)

This worked. The output listed 94 additional SciPy dependencies that I might need to include, some of which were clearly not needed. I was pretty sure, for example, that my program did not need email.Utils. Apparently I didn’t need any of these because the code worked fine.

It’s not clear just what files need to be distributed along with the .exe file that py2exe produces. py2exe creates a dist directory with the .exe file as well as other files that you might need, primarily .dll and .pyd files. Many of these were obviously unnecessary. I knew, for example, that my little command line program did not depend on Tk graphics. I deleted these and the code worked fine as expected.

Related posts:

A little optimization and a challenge

Waldir Pimenta asked me whether it is possible to test the condition

max(a, b) / min(a, b)  < r

without computing max(a, b) or min(a, b). Here a> 0, b> 0, and r > 1. (If r ≤ 1, the condition is always false.) The inequality has to be evaluated in an inner loop of a real-time image processing application and so the time saved by avoiding computing the max and min might matter.

We can multiply the original inequality by min(a, b) since a and b are positive. The condition becomes

max(a, b) < r min(a, b) = min(ra, rb).

We then expand this inequality into four simple inequalities:

a < ra
b < ra
a < rb
b < rb

Since r > 1, we know that a < ra and b < rb and so we only need to check a < rb and b < ra. In C notation we would evaluate

 ( a < r*b && b < r*a )

rather than

 ( max(a,b) / min(a,b) < r )

Whether this saves any time depends on context, though it’s plausible that the former might be more efficient. Some portion of the time the condition a < r*b will evaluate to false and the second half of the condition will not be evaluated.

(C evaluates an expression like (p && q) from left to right. If p is false, q is not evaluated since it is known at that point that the expression (p && q) will evaluate to false regardless of the value of q.)

Challenge: How often will the condition a < r*b evaluate to false? What are your assumptions? Leave your answers below.

Update: Combining the ideas of Carlos Santos and Christian Oudard in the comments below, you could implement the inequality as

a < b ? (b < r*a) : (a < r*b)

There are too many other good ideas in the comments for me to keep editing the blog post so I’ll just ask that you scroll down. I’m surprised at all the ideas that came out of evaluating a simple expression.

Amateur software

I’m growing increasingly frustrated with amateur software. Before I explain why, let me first be clear on what I do not mean by amateur.

  • Amateur does not mean low quality. Some amateur software is outstanding, and some professional software is terrible.
  • Amateur does not mean open source. Some amateur projects are open source and some are not.

I’m using “amateur software” to mean software projects developed by volunteers. I imagine most amateur software is written by professional developers. These are folks paid to write software for a company by day who then work on something else they love by night.

Open source software is not necessarily amateur software. Linux, for example, is now professional software. Around 75% of Linux kernel development is carried out by people paid to work on Linux. Some of the best software is both open source and at least partially professional.

Volunteers do what they want to do by definition. The problem is that the reverse is also true: volunteers do not do what they do not want to do. And for software developers, writing documentation usually falls in the “do not want to do” column. So does making software easy to install. So does testing in multiple environments.

When a company has an interest in a piece of software, they can pay people to do the tasks the volunteers don’t want to do. In fact, if they’re smart, they will concentrate their efforts precisely on the tasks volunteers don’t want to do. In this way even one or two paid staff can make an enormous contribution to a largely volunteer project.

Some amateur projects are highly polished. These may be small projects lead by rare individuals who pay attention to details beyond pure software development. More often, these are large mature projects that have so many volunteers that they have a few who are willing to do tasks that most developers do not want to do.

Related posts:

Malaria on the prairie

My family loves the Little House on the Prairie books. We read them aloud to our three oldest children and we’re in the process of reading them with our fourth child. We just read the chapter describing when the entire Ingalls family came down with malaria, or “fever ‘n’ ague” as they called it.

The family had settled near a creek that was infested with mosquitoes. All the settlers around the creek bottoms came down with malaria, though at the time (circa 1870) they did not know the disease was transmitted by mosquitoes. One of the settlers, Mrs. Scott, believed that malaria was caused by eating the watermelons that grew in the creek bottoms. She had empirical evidence: everyone who had eaten the melons contracted malaria. Charles Ingalls thought that was ridiculous. After he recovered from his attack of malaria, he went down to the creek and brought back a huge watermelon and ate it. His reasoning was that “Everybody knows that fever ‘n’ ague comes from breathing the night air.”

It’s easy to laugh at Mrs. Scott and Mr. Ingalls. What ignorant, superstitious people. But they were no more ignorant than their contemporaries, and both had good reasons for their beliefs. Mrs. Scott had observational data on her side. Ingalls was relying on the accepted wisdom of his day. (After all, “malaria” means “bad air.”)

People used to believe all kinds of things that are absurd now, particularly in regard to medicine. But they were also right about many things that are hard to enumerate now because we take them for granted. Stories of conventional wisdom being correct are not interesting, unless there was some challenge to that wisdom. The easiest examples of folk wisdom to recall may be the instances in which science initially contradicted folk wisdom but later confirmed it. For example, we have come back to believing that breast milk is best for babies and that a moderate amount of sunshine is good for you.

Related posts:

Apple are evil?

Mike Croucher wrote a post the other day explaining why he’s going to buy an iPad. He said that one of the objections to the iPad he’d heard was

Apple are evil because they take away control of how we use their devices.

I teased Mike that I would never say “Apple are evil.” On this side of the Atlantic we’d say “Apple is evil.” But in the UK it is accepted usage to say “Apple are evil.”

“Apple” is a collective noun when used to refer to Apple Inc. British English treats collective nouns as plural, but American English treats them as singular. Although the British usage sounds odd to my American ears, it makes sense just as much sense as the American convention. You could argue for plural verbs because corporations are made of individual people, or you could argue for singular verbs because the corporations act as a single entity. See Grammar Girl’s tip on collective nouns for more background.

By the way, I don’t believe Apple is evil. They’re just a company, no more or less virtuous than most other companies.

Apple posts:

Grammar posts:

Twitter daily tip news

I have five Twitter accounts that send out one tip per day, including a new one I just added last week.

Regular expressions

@RegexTip started over today. It’s a cycle of tips for learning regular expressions. It sticks to the regular expression features common to Python, Perl, C#, and many other programming languages. This account posts Monday through Friday.

Keyboard shortcuts

@SansMouse gives one tip a day on using Windows without a mouse. By practicing one keyboard shortcut a day, you can get into the habit of using your mouse less and your keyboard more. This cycle of tips started over January 29 with the most common and most widely useful shortcuts. I’m also sprinkling in a few extra tips that are less well known. This account also posts Monday through Friday.


I have three mathematical accounts. These post seven days a week.

@AlgebraFact, just started February 2. It will be a mixture of linear algebra, number theory, group theory, etc.

@ProbFact gives one fact per day from probability. Usually these facts are theorems, but sometimes they include a note on history or applications.

@AnalysisFact gives facts from real and complex analysis. The topics range from elementary to advanced.

What if I don’t use Twitter?

You can visit the page for a Twitter account just like any other web page. And every Twitter account has an RSS feed link allowing you to subscribe just as you would subscribe to a blog.

How do you write these?

I write up content for these accounts in bulk. I may sit down on a Saturday and come up with several weeks worth of tips. Then I use HootSuite to schedule the tips weeks in advance. Sometimes I’ll post something spontaneously, such as link to something relevant, but most of the work is done in advance. I use my personal Twitter account for live interaction.

Related links:

Using Windows without a mouse

Regular expressions in

Chart of probability distribution relationships

You can’t force people to provide metadata

I ran across a long rant from Steve Yegge this evening about junior programmers. In a nutshell, Yegge says they like to play around with metadata rather than getting real work done.

Here’s an insightful observation Yegge makes along the way.

And Haskell, OCaml and their ilk … try to force people to model everything. Programmers hate that. These languages will never, ever enjoy any substantial commercial success, for the exact same reason the Semantic Web is a failure. You can’t force people to provide metadata for everything they do. They’ll hate you.

Related post: Probability of semantic markup being correct

Weekend miscellany


Online diff tool
HTTP flowchart
Python propaganda
How to safely store a password
Google Docs dropping IE6 support
Why it’s hard to move Facebook off PHP
Keyboard shortcuts for Windows, Mac, and Linux


Fundamental examples in math
Daily fact from algebra and number theory
62nd Carnival of Mathematics


Always wear your seatbelt
Space shuttle repaired with duct tape (lunar rover too)

How to peel a pummelo. YouTube video with nice soundtrack.

Carnival of Mathematics #62

What is the Carnival of Mathematics? Math bloggers submit articles they have written recently and each month a host writes a post linking to the submitted posts. The sister carnival, Math Teachers at Play, focuses on math education and on math up through high school level. For a more thorough description of the two carnivals and some FAQs, please see Mike Croucher’s article What is a Maths Carnival?

I’m taking a turn hosting this month. Tradition dictates that the host begin with some trivia about the number of the post. As this is the 62nd Carnival of Mathematics, here are a few facts about 62.

  • 62 is the only number whose cube (238328) consists of 3 digits each occurring 2 times.
  • The great rhombicosidodecahedron has 62 sides.
  • Louis Pasteur developed the first rabies vaccination at age 62.

And now onto the posts.

Math and science teacher Cory Poole sends in a video that he created along with his partners and students. The video features a 64-foot Sierpenski triangle about of 12,000 tortilla chips. Read more about the story of the video. Also, here are the bloopers from making the video.

St. Swithun’s day is a sort of British analog of America’s Groundhog Day. If it rains on St. Swithun’s day, it is supposed to rain for the next 40 days. Is there some truth to the legend? See Jon McLoone‘s article Mathematica Tests the St. Swithun’s Day Proverb posted at Wolfram Blog.

Edmund Harriss from Maxwell’s Demon presents Spirographs and the third dimension. Interesting visually and mathematically.

toral spirograph

Rachel Thomas presents Beautiful symmetry provides glimpse into quantum world on News from the world of maths. This article reports on a low-termperature experiment that implies that the exceptional Lie group E8 is at work.

Did you know that sine and cosine are equal for all x? Heather (Xi) submitted a pseudo-proof in A=B implies that 1=1, therefore? by her colleague TwoPi at 360. (If there is ever a 360th Carnival of Mathematics, Heather should host it.)

Update: The 360 blog has agreed to host the 360th Carnival of Mathematics, tentatively scheduled for December 1, 2034. (Mike, I hope it’s OK that I scheduled this date without consulting you. ;))

Marc West presents The curse of the duck, a post about mathematics and cricket. His blog is Mr Science Show: Where Science Meets Pop Culture.

Dan M presents Appolonian Gaskets and Ford Circles on his blog mathrecreation. This post relates number theory and geometry as Ford circles are related to the Farey sequence.

Ford circles

Rick Regan presents Counting Binary and Hexadecimal Palindromes posted at Exploring Binary. Rick counts base 10 palindromes as a warm-up before diving into new territory with binary and hexidecimal numbers.

Gregory Astley wrote a guest post Maxima Tutorial – plotting direction fields for 1st order ODEs for Mike Croucher‘s blog Walking Randomly. (Maxima is part of the SAGE mathematics system. Mike has written several posts about SAGE lately.)

Annarita Ruberto from Matem@ticaMente presents the article “How heavy the fish?” The original post was written in Italian, and here is Google’s translation of the page into English.

Matt McDonnell presents Mathematical Recreations: Tweetable Game Of Life, a guest blog for Loren Shure‘s blog Loren on the Art of MATLAB.

For some mental arithmetic shortcuts and an explanation of why they work, see Sol Lederman‘s post Trachtenberg speed multiplication: exploring why it works on his blog Wild About Math.

Politics impacts sphere of human activity, including math education. See Jason Dyer‘s post Anatomy of a Political Math-Ed Reaction on The Number Warrior.

Peter Rowlett from Travels in a Mathematical World presents Substitution ciphers: Ancient – Renaissance, video included below.

You can keep up with Carnival of Mathematics news on Twitter by following @CarnivalOfMath. You may also be interested in daily math facts on Twitter from @ProbFact (probability), @AnalysisFact (real and complex analysis), and @AlgebraFact (algebra and number theory).

Python code for computing distribution parameters from percentiles

A few days ago I wrote a post on finding parameters so that a probability distribution satisfies two percentile conditions. Since then I’ve written Python code to carry out the calculations described in that article and the accompanying technical report.

The article is Finding probability distribution parameters from percentiles posted on CodeProject. The article comes with Python source code and some commentary. The article shows how SciPy and the functools module make it possible for the code to be very succinct.

Click to find out more about consulting for statistical computing


Related posts:

Probability distribution parameterizations in SciPy

Parameterizations are the bane of statistical software. One of the most common errors is to assume that one software package uses the same parameterization as another package. For example, some packages specify the exponential distribution in terms of the mean but others use the rate.

Python’s SciPy library has a somewhat unusual approach to parameterization with some advantages. SciPy makes every continuous distribution a location-scale family, even those distributions that typically do not have a location or scale parameter. This eliminates, for example, the question of whether an exponential distribution is parameterized by its mean or its rate. There is no mean or rate parameter per se. But there is a scale parameter, which happens to also be the mean.

Some methods on distribution classes have unusual names. For example, the inverse CDF function, often called the quantile function, is ppf for “percentile point function.” The complementary CDF function, or CCDF, is called sf for “survival function.” (Survival function is not an unusual name, though my preference would have been ccdf since that would make the API more symmetric.)

Discrete distributions in SciPy do not have a scale parameter. Also instead of a pdf method the discrete distributions have a pmf method; continuous functions have a probability density function but discrete methods have a probability mass function.

One surprise with SciPy distributions is that the SciPy implementation of the lognormal distribution does not correspond to the definition I’m more familiar with unless the location is 0. In order to be consistent with other continuous distributions, SciPy shifts the PDF argument x whereas I believe it is more common to shift log(x). This isn’t just a difference in parameterization. It actually amounts to different distributions.

For more details, see these notes on distributions in SciPy. See also these notes on distributions in R and in Mathematica for comparison.

Related posts:

For daily tips on Python and scientific computing, follow @SciPyTip on Twitter.

Scipytip twitter icon

Little programs versus big programs

From You Are Not a Gadget:

Little programs are delightful to write in isolation, but the process of maintaining large-scale software is always miserable. … Technologists wish every program behaved like a brand-new, playful little program, and will use any available psychological strategy to avoid thinking about computers realistically.

Related posts:

New Python podcast: A little bit of Python

There’s a new Python podcast: A little bit of Python with Michael Foord, Brett Cannon, Jesse Noller, Steve Holden, and Andrew Kuchling.

So far I’ve found the first episode most interesting. It discusses the “moratorium”, the plan to give Python library authors time catch up with Python 3 before extending the core language further. This sounds like a very smart move.

Related posts:

Updating “PowerShell Day 1” for PowerShell version 2.0

Last year I wrote a little 10-page booklet called PowerShell Day 1. It covers many of the things I wish I had known when I started using PowerShell.

  • How do I configure PowerShell?
  • How do I make PowerShell launch faster?
  • How do I get documentation?
  • Why did PowerShell make some of the design decisions they did?
  • Once I’ve written some useful functions and scripts, where do I put them?
  • Where can I find more PowerShell resources?

Now I’ve started updating the booklet to reflect changes in PowerShell version 2.0. I haven’t had a lot of experience with version 2.0 and would appreciate your help updating the booklet. I have put a link to an alpha version of the update for version 2.0 on the download page.