Why computers were invented

From a lecture by Gregory Chaitin:

We all know that the computer is a very practical thing out there in the real world … But what people don’t remember as much is that really — I’m going to exaggerate, but I’ll say it — the computer was invented in order to help to clarify … a philosophical question about the foundations of mathematics.

Interview with Michael Hammel, author of The Artist’s Guide to GIMP

Michael J. Hammel is the author of The Artist’s Guide to GIMP, a book I reviewed three weeks ago. The following interview is based on my correspondence with Michael.

JC: The bio on your book says that you are an embedded software developer. Is GIMP something you use for fun or is it related to your work? Was it related to your work before?

MH: I use it once in awhile at work but not often. GIMP started as something related to what I worked on when it first came out (graphics software for Unix systems). But I’ve never been paid to be an artist. At one point I looked at trying to use what I knew about it to leverage my way into Pixar or Dreamworks. But that never panned out and my interest in that kind of animation has peeked and ebbed, mostly because I decided I didn’t want to live in California (too expensive).

JC: How did you become involved in the development of GIMP?

MH: When GIMP was born, back around 1995 or so, I was working for a company that provided commercial X servers for Unix systems (Xi Graphics, started by Thomas Roell who wrote the original reference implementation of an X server for 386 systems, aka X386, which begot XFree86 which begot X.org). I packaged Motif as part of my work. A guy I worked with noticed GIMP’s Motif version and pointed me at it. I wrote a plugin for John Beale’s sparkle code (the original Sparkle plugin). I’ve been working with GIMP ever since.

JC: Are you still a developer for GIMP or primarily a user?

MH: A user. I haven’t provided any useful code development to the project in quite sometime.

JC: What’s it like to be a developer on GIMP?

MH: Take this with a grain of salt. It’s a follower’s view from the outside, without being involved with the developers directly.

GIMP’s developer community is somewhat different than other open source communities that I’ve been involved with. GIMP’s leadership is de facto, but not necessarily structured. Many other projects have defined leaders with defined tasks. GIMP’s development is a little more oriented towards scratching your own itch: pick something that that you want to improve and start working on it. The only tough part (at least for some) is understanding and adhering to the main developers coding styles and rules, which are not (to my knowledge) written down.

My only complaint with this style is it tends to allow long term development cycles. End users can’t rely on when new features might be available. While the developers want do things “the right way”, that means that users have been waiting a very long time, for example, for deep paint support. There is a tradeoff for how you do development. Remember that the developers are volunteers, so there is no valid reason for complaining if you, as a user, aren’t getting what you want when you want it. If it bothers you, step in and help out. Otherwise, do as I do: be content with their development process.

JC: What advice would you give people who want to learn image editing?

MH: I learned how to use GIMP by reading Photoshop texts. Image editing is a common set of processes. Think about what you want to do if you weren’t using software. You’re house needs a new color? You mask off the window boarders and roof line and start painting. Same thing in image editing. Only difference: it’s easier to try different colors. Image editing is all about selections and masks. I’m not talking about creating the next Marvel comic, I mean editing existing images. Once you identify what needs to be changed using a selection and mask, you can do all kinds of things: copy and distort, recolor, refocus, etc.

Making comics or custom drawings requires real artistic talent to create a rough design on paper, then import (via scanner or digital camera) into GIMP for cleanup and coloring.

Personally, I prefer taking photos and mixing them together to create a scene that otherwise wouldn’t exist. There is a small example of this in the book. A photo of a city skyline as it exists and a munged version I created to make the city look decayed and falling apart. Beats me what people think of that image, but the creative process that went into it was very relaxing.

JC: What advice would you give people who want to learn GIMP in particular?

MH: Buy my book. Seriously. You learn both at the same time: what is image editing and how do you do it with GIMP. GIMP is just like any other desktop software. You learn where the menus are and what the various windows provide. The rest of your time is learning process — creating a workflow to produce a specific image type or style.

JC: What other software do you use/recommend for creating graphics?

MH: I use simple viewers like Geeqie for browsing images, but if you’re into photography you’ll nee something like f-spot. Photographers, who are nuts about image quality, should learn ufraw. Those interested in 3D or animation should take a look at Blender.

JC: Do you want to say anything about your work with embedded development?

MH: It’s my primary focus in life these days. I’m working on creating a custom media-focused Linux distribution (which I call BeagleBox) for the BeagleBoard C4 (and later the xm). I’d like to get my hands on the Raspberry Pi but wasn’t able to place an order when they first went on sale.

Embedded work is the future for Linux. The desktop for non-techies might change to tablets or such, but Linux will be in everything: your fridge, your TV, your car, your electrical system, your water and sewage systems, your phone, you’re neighbors who are flying robotic UAVs to peek in your windows. Everything.

My day job is building a radar system for UAVs (the big ones) that use embedded Linux to run out of flash. But the direction of interest is smaller devices, little boards with processors and memory in one, like the Pi or BeagleBoard, that can sense the environment and communicate to the rest of the household, with one of those devices used to display all the info on a big wall, removing the need for specialized devices just for the purpose of display.

Anyway, I’m really into making my own custom distribution based on Crosstool-NG (for cross compiling), u-boot (bootloader), Buildroot/Busybox (for root filesystems) and of course Linux.

Related posts:

See this post for links to other interviews here and a couple people who have interviewed me.

Ubuntu Made Easy

I like books from No Starch Press. (This isn’t some sort of paid endorsement; I don’t make any money from them. They give me books to review, but that’s kinda necessary if I’m going to review them.) Their books are fairly dense with technical content, but they also have a casual style and a sense of humor that makes them easier to read.

The latest book from No Starch is Ubuntu Made Easy: A Project-Based Introduction to Linux and it lives up to the expectations I have of No Starch books. It’s sort of a GUI counterpart to The Linux Command Line from the same publisher.

Ubuntu Made Easy is all about doing common tasks with Ubuntu. It’s primarily aimed at non-technical users, but programmers are often in the same boat as everyone else when they’re managing photos etc. rather than writing code and would find the book handy. It is primarily about Ubuntu specifically rather than Linux in general. In particular, it focuses on the current version, version 12.04, and its Unity user interface.

The book reads like the best books on how to use Windows or Mac, only for Ubuntu. By that I mean it has the level of polish and detail that I’ve more often seen in books written for those operating systems than in books written for Linux. I’d feel good about giving a copy of this book to someone who hasn’t used Linux.

There’s one part of the book that seemed a little out of place: Chapter 8, an introduction to the command line. Since this book is mostly about using the GUI and is aimed at a broad audience, some readers might be intimidated by this. If so, I hope they just skip over Chapter 8 since the rest of the book doesn’t depend much on it.

Related post: Why Food for the Hungry runs Ubuntu

For daily tips on using Unix, follow @UnixToolTip on Twitter.

UnixToolTip twitter icon

Keeping multiple programming languages straight

Someone sent me an email asking how I use multiple programming languages and how I keep them straight. I thought I’d write my answer here in case someone else had the same question.

I’ve learned programming languages out of a mixture of curiosity and necessity, maybe more of the latter. I’d rather use fewer languages, or at least use more languages out of curiosity.

Alan Perlis once said

A language that doesn’t affect the way you think about programming is not worth knowing.

In the spirit of Perlis, I’d say that it’s worthwhile to learn some programming languages just to expand your thinking. Learn a language outside your comfort zone, even if you don’t foresee any practical use for it. Expanding your mind is practical enough.

Another reason is to use multiple languages is to pick languages optimized for different tasks. For example, you might use SQL for interacting with relational databases, C++ for fine-grained control and efficiency, Perl for text munging, etc.

But keeping track of the differences between similar languages is just drudgery. When languages are very similar, there’s little educational or engineering reason to use both. But there may be good business reasons, such as preserving legacy code or serving different clients. Given a choice, I’d rather not use two similar languages, but often I do out of necessity.

So, how do you keep different programming languages straight?

  1. Try not to use similar languages in the same niche. Learn one system programming language, one scripting language, one markup language, etc. If you have to use two languages in a niche, pick one to learn well.
  2. Try to use one language at a time. Load that language’s syntax into your mental RAM and use it as long as you can before switching languages.
  3. Keep notes and save sample code.

Some of the more popular pages on my web site are notes that I wrote for my own reference. Years ago I was confused about all the different kinds of strings I had to use in C++ and wrote these notes. And when I was learning PowerShell I wrote a cookbook of sorts.

Here are some more examples.

Regular expressions in:

Probability distributions in:

I’m glad people have found these notes useful, and I did have in mind that other people would read them, but I mostly wrote them for my own reference.

Avoiding underflow in Bayesian computations

Here’s a common problem that arises in Bayesian computation. Everything works just fine until you have more data than you’ve seen before. Then suddenly you start getting infinite, NaN, or otherwise strange results. This post explains what might be wrong and how to fix it.

A posterior density is (proportional to) a likelihood function times a prior distribution. The likelihood function is a product. The number of data points is the number of terms in the product. If these numbers are less than 1, and you multiply enough of them together, the result will be too small to represent in a floating point number and your calculation will underflow to zero. Then subsequent operations with this number, such as dividing it by another number that has also underflowed to zero, may produce an infinite or NaN result.

The instinctive reaction regarding underflow or overflow is to use logs. And often that works. If you wanted to know where the maximum of the posterior density occurs, you could find the maximum of the logarithm of the posterior density. But in Bayesian computations you often need to integrate the posterior density times some functions. Now you cannot just work with logs because logs and integrals don’t commute.

One way around the problem is to multiply the integrand by a constant so large that there is no danger of underflow. Multiplying by constants does commute with integration.

So suppose your integrand is on the order of 10^-400, far below the smallest representable number. Do you need extended precision arithmetic? No, you just need to understand your problem.

If you multiply your integrand by 10^400 before integrating, then your integrand is roughly 1 in magnitude. Then do you integration, and remember that the result is 10^400 times the actual value.

You could take the following steps.

  1. Find m, the maximum of the log of the integrand.
  2. Let I be the integral of exp( log of the integrand – m ).
  3. Keep track that your actual integral is exp(m) I, or that its log is m + log I.

Note that you can be extremely sloppy in this process. You don’t need an accurate estimate of the maximum of the integrand per se. If you’re within a few dozen orders of magnitude, for example, that could be sufficient to carry out your integration without underflow.

One way to estimate the maximum is to use a frequentist estimator of your parameters as an approximate MLE, and assume that this is approximately where your posterior density takes on its maximum value. This might actually be very accurate, but it doesn’t need to be.

Note also that it’s OK if some of the evaluations of your integrand underflow to zero. You just don’t want the entire integral to underflow to zero. Places where the integrand is many orders of magnitude less than its maximum don’t contribute to the integration anyway. (It’s important that your integration pays attention to the region where the integrand is largest. A naive integration could entirely miss the most important region and completely underestimate the integral. But that’s a matter for another blog post.)

Click to find out more about consulting for statistical computing


Editing by semantic units

The most basic text editor commands operate on lines and characters: move up or down a line, delete the next or previous character, etc. More advanced commands operate on context-specific semantic units. In the context of English prose, this means words, sentences, and paragraphs. In the context of source code, it means blocks and functions, however these are delimited in a particular language. In the context of an outline, it means navigating the tree.

One way to become more proficient at using your editor of choice is to become fluent at working with larger semantic units. Of course this saves keystrokes, though as I argued here, I don’t think typing speed matters so much. However, working in larger semantic units reduces errors. It also improves the chances that you can act on an idea before you forget what you were doing. These factors matter more than typing speed per se.

The general principles above apply to any editor. Here I’ll sketch how these ideas work in Emacs.

As a general rule, commands that operate on physical units begin with Control and analogous commands that work on semantic units being with Meta (Alt). For example, C-f moves forward a character and M-f moves forward a word. C-e moves to the end of a line and M-e moves to the end of a sentence.

Emacs extends the meaning of prose units by analogy in other contexts. For example, in calendar mode, a “line” corresponds to a week, and a “paragraph” corresponds to a month.

Emacs has numerous commands to work on “balanced expressions,” i.e. text surrounded by parentheses, brackets, braces, quotation marks, etc. Often the default keybindings are consistent with those used for other units. For example, “f” means forward and “b” means backward in any context. C-f and C-b move by character, M-f and M-b move by word, and C-M-f and C-M-b move by balanced expression. Emacs also has commands for working with function definitions: selecting a function definition, moving to the beginning or end of a function definition, etc. In Lisp, function definitions are balanced expressions. In other languages, like C, these ideas are distinct.

This morning, Irreal points out an Emacs package expand-region that selects larger and larger semantic regions. The first time it is invoked, it selects the immediate context of the cursor. Each subsequent invocation selects a larger context, moving up the parse tree of the content. I look forwarding to trying it.

* * *

For daily tips on using Unix, follow @UnixToolTip on Twitter.

UnixToolTip twitter icon

Trick for computing log(1+x)

Charles Karney showed me a clever trick for computing log(1+x). It only takes a couple lines of code, but those lines are not obvious. If you understand this trick, you’ll understand a good bit of how floating point arithmetic works.

First of all, what’s the big deal? Why not just add 1 to x and take the log? The direct approach will be inaccurate for small x and completely inaccurate for very small x. For more explanation, see Math library functions that seem unnecessary.

Here’s how Charles Kerney implements the code in C++:

template<typename T> static inline T log1p(T x) throw() {
    volatile T y = 1 + x, z = y - 1;
    return z == 0 ? x : x * std::log(y) / z;

The code comes from GeographicLib, and is based on Theorem 4 of this paper paper. I’ve read that paper several times, but I either didn’t notice the trick above or forgot it.

The code includes comments explaining the trick. I cut out the comments because I’ll explain the code below.

If y = 1+x and z = y-1, doesn’t z just equal x? No, not necessarily! This is exactly the kind of code that a well-meaning but uninformed programmer would go through and “clean up.” If x isn’t too small, no harm done. But if x is very small, it could ruin the code. It’s also the kind of code an optimizing compiler might rearrange, hence the volatile keyword telling optimizers to not be clever.

Imagine x = 1e-20. Then y = 1+x sets y to exactly 1; a standard floating point number does not have enough precision to distinguish 1 and 1 + 10^-20. Then z = y-1 sets z to 0. While z will not exactly equal x in general, it will be a good approximation for x.

If z is not zero, log(y)/z is a good approximation to log(1 + x)/x because y is close to 1+x and z is close to x. Multiplying log(y)/z by x gives a good approximation to log(1+x).

If z is zero, x is so small that x is an excellent approximation to log(1 + x) because the error is on the order of x2.

Giveaway: PowerShell in Depth

About a month ago I gave away a copy of Learn Windows PowerShell 3 in a Month of Lunches that Manning provided. Now I’m giving away a copy of PowerShell in Depth: An administrator’s guide by Don Jones, Richard Siddaway, and Jeffery Hicks.

To enter the drawing, please leave a comment below. I’ll draw a winner from the comments received by the end of the day Monday, July 30 in my time zone.

This book is part of Manning’s early release program. The winner will receive the current version of the book in electronic form and updates as they come out.

Related links:


Many of the things I once thought were trivial I now think are important. That is, I used to think they were trivial in the modern sense of being unimportant. Now I think they’re trivial in the classical sense of being foundational (from trivium, the first stage of a classical education).

In business, “trivial” means “vitally important if you’re actually doing the work, but not important if you’re just watching.”

Related link: Very applied math

80-20 software II

My previous post addressed an objection to apply the 80-20 rule to software. Namely, that even if every user uses only a small portion of the features, they use different portions, and so you can’t remove any of it.

In this post I’ll address a couple more objections. One objection is that if the 80-20 principle holds, you can apply it over and over until you’re left with nothing. That is, if 80% of your users are content with 20% of your features, then 80% of those users (64% of the original user base) will be content with only 20% of the most-used features (4% of the original features). Keep repeating until you’re down to one feature everybody loves.

First of all, it may indeed be true that you could apply the 80-20 rule more than once. Maybe 64% of users really are content with 4% of the features. Just because the rule doesn’t apply an infinite number of times doesn’t mean that you can’t apply it once or twice before it breaks down.

More fundamentally, there’s nothing magical about “80” and “20.” The more general and more realistic principle is simply that often return on effort is very unevenly distributed. Maybe 92% of customers use only 17% of features. Maybe 70% of customers use only 3% of features. The numbers vary.

If you do apply the rule repeatedly, you may get a different distribution each time. Suppose you first limit your attention to the most popular features, whatever percentage cut-off that turns out to be. Then within those functions, some will still be more popular than others. The proportions might not be the same as your first cut, but popularity will still be uneven until you get down to a small core of features that most people use.

But as I explained in the previous post, the time to talk about cutting features is before they are developed and deployed. Once a feature has shipped, it is extremely hard to remove.

Another objection is that it is impossible to predict what features users will want. You can’t know until you ship it. Certainly it’s easier to tell in hindsight what people want, but it’s going too far to say you cannot predict anything. If you really could not predict anything, then all software would be bloated. A great deal of software is bloated, but not all of it. Small, successful programs do exist.

Even if you could literally apply the 80-20 rule several times, big companies might not be content with the resulting market share. Suppose you could apply the 80-20 rule to Microsoft Word four times. Then 41% of customers would be content with 0.16% of the features. (I don’t think this is realistic, but let’s assume it is for the sake of argument.) Microsoft might not be happy with writing off 59% of their potential market up front. But a start-up might be thrilled to give up half their potential market in exchange for only having to develop a tiny fraction of the features.

80-20 software

The 80-20 rule says that often 80% of your results come from 20% of your effort. Applied to software, 80% of your customers may only use 20% of the features. So why not just develop that 20% and let the rest go?

There are numerous objections to this line of reasoning. I’m just going to address one here. Maybe each of your customers uses a small subset of your features, say nobody uses more than 5%. But they all use different subsets of the features. When you add together everybody’s 5% you end up with everything being used. For example, Microsoft Word is huge. I doubt many people use more than 1% of the software. And yet every feature is being used somewhere.

That’s a valid point, but it’s a stronger point after the software is written than before it is written. Once a feature is released, someone is going to use it. And once someone is accustomed to using it, they’re going to want to keep using it.

Suppose your software provides two redundant ways to accomplish some task, method 1 and method 2. Half your users stumble on method 1 first and get comfortable using it. The other latch on to method 2. Now you cannot remove either method without upsetting half your customers. But if you had only shipped method 1, everyone would have used it and been happy.

Removing features is almost impossible. You can never substantially simplify an existing product without the risk of making customers angry. But those same customers might have been happy with a simpler product if that’s what you’d delivered first.

A hidden cost of extra features is that they may need to be supported for years to come.

Update: See follow-up post.

Related posts:

Binomial coefficient trick

Binomial coefficients are simplest to work with when the arguments are non-negative integers, but more general arguments are possible and useful. Concrete Mathematics argues that the most useful case is when the top index is real and the bottom index is an integer, and sticks to that assumption, though both arguments could be real or even complex. More on that here.

Later the book claims

(r - k) {r \choose k} = r {r-1 \choose k}

for all (real) r and gives a proof. Following the proof is an interesting discussion.

But wait a minute. We’ve claimed that the identity holds for all real r, yet the derivation we just gave holds only when r is a positive integer. … Have we been cheating?

No, they have not been cheating. Both sides of the equation are polynomials in r. If two polynomials of degree d agree at d+1 points, they must agree everywhere. But these polynomials agree at an infinite number of points, namely all integers, and so they must be equal.

This is a common trick when working with binomial coefficients. It lets us use combinatorial arguments to prove theorems that extend to cases where the binomial coefficients do not have a combinatorial interpretation. But it’s also more generally useful. Often an equation amounts to saying two polynomials are equal, though we may not think of the terms as polynomials. But if we recognize that they are polynomials, we need only prove equality at a finite number of points to establish equality everywhere.

A similar technique is common in complex variables. You often prove an identity assuming real variables, then get the complex version for free. For example, every trig identity you saw in high school remains valid when the arguments are complex numbers. Why? Because analytic functions are, roughly speaking, polynomials of infinite degree (i.e. they have a convergent power series). If two analytic functions agree on an infinite set of values with a limit point (such as the real line) then they agree everywhere.

Related posts:

Deniers, skeptics, and mavericks

Suppose a scientist holds a minority opinion. There’s a trend in journalism to call him a denier if you think he’s wrong, a skeptic if you don’t care, and a maverick if you think he may be right. If this had been the norm in Einstein’s day, he might have been called a Newton-denier.

“Denier” is an ugly word. It implies that someone has no rational basis for his beliefs. He’s either an apologist for evil, as in a Holocaust denier, or mentally disturbed, as in someone in psychological denial. The term “denier” is inflammatory and has no place in scientific discussion.