Houston secrets

Posted on 29 July 2010 by John

The Engines of our Ingenuity episode Secret City discusses little-known city features. I was particularly interested in the Houston locations the show mentioned. I’ve been through the downtown tunnels John Leinhard mentions, but I was not familiar with the 1940 Air Terminal Museum or the Gable Street Power Station.

Here are three Houston-area attractions that I find many Houstonians don’t know about.

Houston Area Live Steamers at Zube Park, trains that are just big enough to ride. The HALS folks are proud of their work and very friendly.
Edith Moore Nature Sanctuary, a hidden patch of woods in the middle of Houston.
Byzantine Fresco Chapel which contains the only intact Byzantine frescoes in the western hemisphere.

Later than you expected, but sooner than you think

Posted on 29 July 2010 by John

The title of this post is the last line of a 60-Second Science podcast. [Update: looks like the link went away.] The podcast announces a recent study that says we tend to over-estimate our abilities before we start something new, but under-estimate our abilities once we get started. Sounds true to me.

Rosenbrock’s banana function

Posted on 28 July 2010 by John

Rosenbrock’s banana function is a famous test case for optimization software. It’s called the banana function because of its curved contours.

The definition of the function is

f(x, y) = (1 - x)^2 + 100(y - x^2)^2

The function has a global minimum at (1, 1). If an optimization method starts at the point (−1.2, 1), it has to find its way to the other side of a flat, curved valley to find the optimal point.

Rosenbrock’s banana function is just one of the canonical test functions in the paper “Testing Unconstrained Optimization Software” by Moré, Garbow, and Hillstrom in ACM Transactions on Mathematical Software Vol. 7, No. 1, March 1981, pp 17-41. The starting point (−1.2, 1) mentioned above comes from the starting point required in that paper.

I mentioned a while back that I was trying out Python alternatives for tasks I have done in Mathematica. I tried making contour plots with Python using matplotlib. This was challenging to figure out. The plots did not look very good, though I imagine I could have made satisfactory plots if I had explored the options. Instead, I fired up Mathematica and produced the plot above easily using the following code.

Banana[x_, y_] := (1 - x)^2 + 100 (y - x*x)^2
ContourPlot[Banana[x, y], {x, -1.5, 1.5}, {y, 0.7, 2.0}]

Miscellaneous Emacs adventures

Posted on 28 July 2010 by John

I recently found out there’s an Emacs command M-x woman that’s a pun on “w/o man”, i.e. a way to read online help without using the usual man command.

* * *

I tried to edit a 1.2 GB text file with Emacs the other day. I got an error saying that Emacs has a 500 MB file size limit on 32-bit systems. This was on a 32-bit Windows XP machine, so the warning was reasonable. I appreciated that it promptly said it could not open the file.

I tried again on my 64-bit Windows 7 machine and got the same message; I believe I’m running a 32-bit version of Emacs on my 64-bit PC.

I also tried opening the file with Emacs on a 64-bit Red Hat Enterprise Linux box with 16 GB of memory. This time I did not get a file size message. Instead, Emacs crashed.

The file opened quickly on Windows, even my 32-bit XP box, using Notepad++.

* * *

When I set up Emacs on Windows, I created an “Open with Emacs” context menu following the instructions here. That has worked well except for two problems.

Every file is opened in a new instance of Emacs.
It stopped working on one of my computers for reasons unknown.

I’ve been meaning to solve the first problem, and when the second happened I decided it was time. According to the Emacs documentation,

The recommended way to associate files is to associate them with
emacsclientw.exe. In order for this to work when Emacs is not yet started, you will also need to set the environment variable ALTERNATE_EDITOR to runemacs.exe. To open files in a running instance of Emacs, you will need to add the following to your init file: (server-start)

I had to tinker with that a little to make it work. I know I had to add the path to my Emacs bin directory to my PATH environment variable but I don’t remember what else I did.

The key thing to note is that you should associate files with emacsclientw.exe rather than with runemacs.exe. I used the instructions in this article to remove the association with the latter. Thanks to Mark for pointing out the article.

(The Windows “Open with” feature remains a mystery. The article above explains how to remove a program from the “Open with” list, and the directions worked for removing runemacs.exe. But they did not work for removing other programs. I’m also unclear how you add something to the “Open with” list. If you right-click on a file and select “Open with -> Choose program …” the program you select may appear in the “Open with” list next time. Or not. This did add emacsclientw.exe to the list.)

Related post:

Remapping Caps Lock

Posted on 27 July 2010 by John

I remap the Caps Lock key to be a control key on every computer that I use regularly. Here’s why.

Caps Lock is a nearly worthless key taking up valuable real estate.
I’m more likely to use Caps Lock accidentally than intentionally.
I use the Control key far more often than the Caps Lock key.
The Caps Lock key is in a consistent position on all keyboards but the left Control key isn’t.

Some people swap the Caps Lock and left Control keys. I prefer to just disable the left Control key. On desktop keyboards, I map the useless Scroll Lock key to act as a Caps Lock key for those rare instances when I actually want to type a long sequence of capital letters.

KeyTweak is a convenient program for remapping a Windows keyboard.

On Linux, you can use the xmodmap utility.The following commands disable the left Control key and turn the Caps Lock key into a control key.

xmodmap -e "remove Lock = Caps_Lock"
xmodmap -e "remove Control = Control_L"
xmodmap -e "keysym Caps_Lock = Control_L"
xmodmap -e "add Control = Control_L"

See Dave Richeson’s comment below for Mac instructions.

Related post: 41 dumb things to check

Sine approximation for small angles

Posted on 27 July 2010 by John

For small angles, sin(θ) is approximately θ. This post takes a close look at this familiar approximation.

I was confused when I first heard that sin(θ) ≈ θ for small θ. My thought was “Of course they’re approximately equal. All small numbers are approximately equal to each other.” I also was confused by the footnote “The angle θ must be measured in radians.” If the approximation is just saying that small angles have small sines, it doesn’t matter whether you measure in radians or degrees. I missed the point, as I imagine most students do. I run into people who have completed math or science degrees without understanding this tidbit from their freshman year.

The point is that the error in the approximation sin(θ) ≈ θ is small, even relative to the size of θ. If θ is small, the error in the approximation sin(θ) ≈ θ is really, really small. I use the phrase “really, really” in a precise sense; I’m not just talking like an eight-year-old child. If θ is small, θ² is really small and θ³ is really, really small.

For small values of θ, say θ < 1, the difference between sin(θ) and θ is very nearly θ³/6 and so the absolute error is really, really small. The relative error is about θ²/6. If the absolute error is really, really small, the relative error is really small.

The approximation sin(θ) ≈ θ comes from a Taylor series, but there’s more going on than that. For example, the approximation log(1 + x) ≈ x also comes from a Taylor series, but that approximation is not nearly as accurate.

The Taylor series for log(1 + x) is

x – x²/2 + x³/3 – …

This means that for small x, log(1 + x) is approximately x, and the error in the approximation is roughly x²/2. So if x is small, the absolute error in the approximation is really small and the relative error is small. Notice this is one less “really” than the analogous statement about sines.

The reason the sine approximation is one “really” better than the log approximation is that sine is an odd function, so its Taylor series has only odd terms. The error is approximately the first term left out. The series for sin(θ) is

θ − θ³ / 3! + θ⁵/5! – …

so sin(θ) – θ is on the order of θ³, not just θ².

We can say even more. Not only is the error in sin(θ) ≈ θ approximately θ³/6, it is in fact bounded by θ³/6 for small, positive θ. This comes from the alternating series theorem. When you make an approximation from an alternating Taylor series, the error is bounded by the first term you leave out. The argument has to be small enough that the terms of the series decrease in absolute value. For example, if you stick θ = 1 into the series for sine, each term of the series is smaller than the previous one. But if you stick in θ = 2 then the terms get larger before they get smaller. So the precise meaning of “small enough” is small enough that the terms of the alternating series decrease in absolute value.

The series for log(1 + x) also alternates, so the error estimate is also an upper bound on the error in that case. Not all series are alternating series. But when we do have an alternating series, we get a bonus as far as being able to control the error in approximations.

It’s also true that for small θ, cos(θ) is approximately 1. At first this may seem analogous to saying that for small θ, sin(θ) is approximately 0. While both are true, the former is much better.

The series for cosine is

1 − θ²/2! + θ⁴/4! – …

and so the error in approximating cos(θ) with 1 is approximately θ²/2. In fact, since the series alternates, the error is bound by θ²/2, if θ is small enough. And in this case the relative error is approximately equal to the absolute error. This says that for small θ, the approximation cos(θ) ≈ 1 is really, really good. By contrast, saying that sin(θ) ≈ 0 for small θ is a bad approximation: the absolute error is approximately θ and the relative error is 100%. Rounding small values of θ down to 0 produces a very good approximation in one context and a poor approximation in another context.

Woodpeckers

Posted on 26 July 2010 by John

From The Dip:

A woodpecker can peck twenty times on a thousand trees and get nowhere, but stay busy. Or he can peck twenty-thousand times on one tree and get dinner.

Windows XP and Ubuntu start-up music

Posted on 23 July 2010 by John

I just realized that the start-up music for Ubuntu is a variation on the start-up music for Windows XP. (You can hear the Ubuntu theme in this video at around 0:10 [Update: video has been removed]. The Windows XP theme is in this video at around 1:16.) If you don’t hear the similarity, concentrate on the rhythm rather than the melody.

The Ubuntu music style is African, like the word ubuntu. It was influenced by Windows, like the Ubuntu user interface, but it’s a new composition.

I’ve been running Ubuntu on a virtual machine. The sound quality was so bad that I never clearly heard Ubuntu start up. But I recently installed Ubuntu on a physical machine and heard the start-up music clearly for the first time.

What does this code do?

Posted on 21 July 2010 by John

At the SciPy 2010 conference, a speaker showed several short code samples and asked us what each sample did. The samples were clearly written, but we had no comments to provide context. This was the last sample.

    def what( x, n ):
        if n < 0:         
            n = -n         
            x = 1.0 / x     
        z = 1.0     
        while n > 0:
            if n % 2 == 1:
                z *= x
            x *= x
            n /= 2
        return z

The quiz was at the end of the day and I was tired. I couldn’t tell what the code does. Then I found out to my chagrin that the sample above implements an algorithm I know well. I’ve written the same code and I’ve even blogged about here.

This exercise changed my opinion of “self-documenting” code. Without some contextual clue, it is hard to understand the purpose of even a small piece of code.

Meaningful variable and function names would have helped, but a tiny comment might have helped even more. Not some redundant comment like explaining that the line x = 1.0 / x takes a reciprocal, but a comment explaining the problem the code is trying to solve.

For another example, what do you think this code does?

    uint what()
    {
        m_z = 36969 * (m_z & 65535) + (m_z >> 16);
        m_w = 18000 * (m_w & 65535) + (m_w >> 16);
        return (m_z << 16) + (m_w & 65535);
    }

It’s clear enough what the code does at a low level—it’s just a few operations—but it’s not at all clear what it’s for.

Try to figure out what the code samples do before reading further. But if you give up, the first example is described here and the second example comes from here.

In an ordinary face-to-face conversation, more information is conveyed non-verbally than verbally. We may think that our literal words are most important, but so much is conveyed by voice inflection, facial expression, posture, etc. Something similar is going on with source code. When we read a piece of source code, we typically bring a huge amount of implicit knowledge with us.

Suppose a coworker Sam asks you to look at his code. The fact that the question came up at work provides a large amount of context; this isn’t just a random code fragment on the web. More specifically, you know what kinds of projects Sam works on. You know why Sam wants you to look at the code. He may be showing you something he’s proud of or he may be asking for help finding a bug. You know a lot about his code before you see it.

Now suppose you’re a contractor. Sam was hit by a bus and you’ve been asked to work on his projects until he gets out of the hospital. You may complain to his office mate that Sam’s code is an awful mess, but she can’t understand what you’re talking about. She thinks his code is perfectly clear.

Now suppose you’re a contractor on the opposite side of the world from Sam. You have even less context than if you were in his office talking to his office mate. After a great deal of agony, you send your contribution back to Sam’s company. You comment your code beautifully, but Sam’s colleagues complain that your code is poorly written and that you didn’t solve the right problem.

Institutional memory is more valuable than source code comments. It costs a great deal to replace a programmer, even one who leaves behind well-commented code.

e-book formats

Posted on 20 July 2010 by John

ManyBooks.net has nearly 28,000 free e-books available for download. Each book is available in 23 different formats.

I wondered what software they used to create so many formats and how popular each format is. The site answers both these questions. This page lists the software used and this page [Update: page removed] lists the popularity statistics.

PDF is not as dominant as I would have expected. It is the most popular format, but it only accounts for 20% of downloads. Here’s a bar chart of the downloads per format. Click on the graph for a larger version.

See also Wikipedia’s comparison of e-book formats.

Month: July 2010

Houston secrets

Related posts

Later than you expected, but sooner than you think

Rosenbrock’s banana function

Miscellaneous Emacs adventures

Remapping Caps Lock

Sine approximation for small angles

Related posts

Woodpeckers

Windows XP and Ubuntu start-up music

What does this code do?

Related posts

e-book formats