Why computers were invented

From a lecture by Gregory Chaitin:

We all know that the computer is a very practical thing out there in the real world … But what people don’t remember as much is that really — I’m going to exaggerate, but I’ll say it — the computer was invented in order to help to clarify … a philosophical question about the foundations of mathematics.

Interview with Michael Hammel, author of The Artist’s Guide to GIMP

Michael J. Hammel is the author of The Artist’s Guide to GIMP, a book I reviewed three weeks ago. The following interview is based on my correspondence with Michael.

 

JC: The bio on your book says that you are an embedded software developer. Is GIMP something you use for fun or is it related to your work? Was it related to your work before?

MH: I use it once in awhile at work but not often. GIMP started as something related to what I worked on when it first came out (graphics software for Unix systems). But I’ve never been paid to be an artist. At one point I looked at trying to use what I knew about it to leverage my way into Pixar or Dreamworks. But that never panned out and my interest in that kind of animation has peeked and ebbed, mostly because I decided I didn’t want to live in California (too expensive).

JC: How did you become involved in the development of GIMP?

MH: When GIMP was born, back around 1995 or so, I was working for a company that provided commercial X servers for Unix systems (Xi Graphics, started by Thomas Roell who wrote the original reference implementation of an X server for 386 systems, aka X386, which begot XFree86 which begot X.org). I packaged Motif as part of my work. A guy I worked with noticed GIMP’s Motif version and pointed me at it. I wrote a plugin for John Beale’s sparkle code (the original Sparkle plugin). I’ve been working with GIMP ever since.

JC: Are you still a developer for GIMP or primarily a user?

MH: A user. I haven’t provided any useful code development to the project in quite sometime.

JC: What’s it like to be a developer on GIMP?

MH: Take this with a grain of salt. It’s a follower’s view from the outside, without being involved with the developers directly.

GIMP’s developer community is somewhat different than other open source communities that I’ve been involved with. GIMP’s leadership is de facto, but not necessarily structured. Many other projects have defined leaders with defined tasks. GIMP’s development is a little more oriented towards scratching your own itch: pick something that that you want to improve and start working on it. The only tough part (at least for some) is understanding and adhering to the main developers coding styles and rules, which are not (to my knowledge) written down.

My only complaint with this style is it tends to allow long term development cycles. End users can’t rely on when new features might be available. While the developers want do things “the right way”, that means that users have been waiting a very long time, for example, for deep paint support. There is a tradeoff for how you do development. Remember that the developers are volunteers, so there is no valid reason for complaining if you, as a user, aren’t getting what you want when you want it. If it bothers you, step in and help out. Otherwise, do as I do: be content with their development process.

JC: What advice would you give people who want to learn image editing?

MH: I learned how to use GIMP by reading Photoshop texts. Image editing is a common set of processes. Think about what you want to do if you weren’t using software. You’re house needs a new color? You mask off the window boarders and roof line and start painting. Same thing in image editing. Only difference: it’s easier to try different colors. Image editing is all about selections and masks. I’m not talking about creating the next Marvel comic, I mean editing existing images. Once you identify what needs to be changed using a selection and mask, you can do all kinds of things: copy and distort, recolor, refocus, etc.

Making comics or custom drawings requires real artistic talent to create a rough design on paper, then import (via scanner or digital camera) into GIMP for cleanup and coloring.

Personally, I prefer taking photos and mixing them together to create a scene that otherwise wouldn’t exist. There is a small example of this in the book. A photo of a city skyline as it exists and a munged version I created to make the city look decayed and falling apart. Beats me what people think of that image, but the creative process that went into it was very relaxing.

JC: What advice would you give people who want to learn GIMP in particular?

MH: Buy my book. Seriously. You learn both at the same time: what is image editing and how do you do it with GIMP. GIMP is just like any other desktop software. You learn where the menus are and what the various windows provide. The rest of your time is learning process — creating a workflow to produce a specific image type or style.

JC: What other software do you use/recommend for creating graphics?

MH: I use simple viewers like Geeqie for browsing images, but if you’re into photography you’ll nee something like f-spot. Photographers, who are nuts about image quality, should learn ufraw. Those interested in 3D or animation should take a look at Blender.

JC: Do you want to say anything about your work with embedded development?

MH: It’s my primary focus in life these days. I’m working on creating a custom media-focused Linux distribution (which I call BeagleBox) for the BeagleBoard C4 (and later the xm). I’d like to get my hands on the Raspberry Pi but wasn’t able to place an order when they first went on sale.

Embedded work is the future for Linux. The desktop for non-techies might change to tablets or such, but Linux will be in everything: your fridge, your TV, your car, your electrical system, your water and sewage systems, your phone, you’re neighbors who are flying robotic UAVs to peek in your windows. Everything.

My day job is building a radar system for UAVs (the big ones) that use embedded Linux to run out of flash. But the direction of interest is smaller devices, little boards with processors and memory in one, like the Pi or BeagleBoard, that can sense the environment and communicate to the rest of the household, with one of those devices used to display all the info on a big wall, removing the need for specialized devices just for the purpose of display.

Anyway, I’m really into making my own custom distribution based on Crosstool-NG (for cross compiling), u-boot (bootloader), Buildroot/Busybox (for root filesystems) and of course Linux.

Ubuntu Made Easy

I like books from No Starch Press. (This isn’t some sort of paid endorsement; I don’t make any money from them. They give me books to review, but that’s kinda necessary if I’m going to review them.) Their books are fairly dense with technical content, but they also have a casual style and a sense of humor that makes them easier to read.

The latest book from No Starch is Ubuntu Made Easy: A Project-Based Introduction to Linux and it lives up to the expectations I have of No Starch books. It’s sort of a GUI counterpart to The Linux Command Line from the same publisher.

Ubuntu Made Easy is all about doing common tasks with Ubuntu. It’s primarily aimed at non-technical users, but programmers are often in the same boat as everyone else when they’re managing photos etc. rather than writing code and would find the book handy. It is primarily about Ubuntu specifically rather than Linux in general. In particular, it focuses on the current version, version 12.04, and its Unity user interface.

The book reads like the best books on how to use Windows or Mac, only for Ubuntu. By that I mean it has the level of polish and detail that I’ve more often seen in books written for those operating systems than in books written for Linux. I’d feel good about giving a copy of this book to someone who hasn’t used Linux.

There’s one part of the book that seemed a little out of place: Chapter 8, an introduction to the command line. Since this book is mostly about using the GUI and is aimed at a broad audience, some readers might be intimidated by this. If so, I hope they just skip over Chapter 8 since the rest of the book doesn’t depend much on it.

Related post: Why Food for the Hungry runs Ubuntu

 

Keeping multiple programming languages straight

Someone sent me an email asking how I use multiple programming languages and how I keep them straight. I thought I’d write my answer here in case someone else had the same question.

I’ve learned programming languages out of a mixture of curiosity and necessity, maybe more of the latter. I’d rather use fewer languages, or at least use more languages out of curiosity.

Alan Perlis once said

A language that doesn’t affect the way you think about programming is not worth knowing.

In the spirit of Perlis, I’d say that it’s worthwhile to learn some programming languages just to expand your thinking. Learn a language outside your comfort zone, even if you don’t foresee any practical use for it. Expanding your mind is practical enough.

Another reason is to use multiple languages is to pick languages optimized for different tasks. For example, you might use SQL for interacting with relational databases, C++ for fine-grained control and efficiency, Perl for text munging, etc.

But keeping track of the differences between similar languages is just drudgery. When languages are very similar, there’s little educational or engineering reason to use both. But there may be good business reasons, such as preserving legacy code or serving different clients. Given a choice, I’d rather not use two similar languages, but often I do out of necessity.

So, how do you keep different programming languages straight?

  1. Try not to use similar languages in the same niche. Learn one system programming language, one scripting language, one markup language, etc. If you have to use two languages in a niche, pick one to learn well.
  2. Try to use one language at a time. Load that language’s syntax into your mental RAM and use it as long as you can before switching languages.
  3. Keep notes and save sample code.

Some of the more popular pages on my website are notes that I wrote for my own reference. Years ago I was confused about all the different kinds of strings I had to use in C++ and wrote these notes. And when I was learning PowerShell I wrote a cookbook of sorts.

Here are some more examples.

Regular expressions in:

Probability distributions in:

I’m glad people have found these notes useful, and I did have in mind that other people would read them, but I mostly wrote them for my own reference.

Avoiding underflow in Bayesian computations

Here’s a common problem that arises in Bayesian computation. Everything works just fine until you have more data than you’ve seen before. Then suddenly you start getting infinite, NaN, or otherwise strange results. This post explains what might be wrong and how to fix it.

A posterior density is (proportional to) a likelihood function times a prior distribution. The likelihood function is a product. The number of data points is the number of terms in the product. If these numbers are less than 1, and you multiply enough of them together, the result will be too small to represent in a floating point number and your calculation will underflow to zero. Then subsequent operations with this number, such as dividing it by another number that has also underflowed to zero, may produce an infinite or NaN result.

The instinctive reaction regarding underflow or overflow is to use logs. And often that works. If you wanted to know where the maximum of the posterior density occurs, you could find the maximum of the logarithm of the posterior density. But in Bayesian computations you often need to integrate the posterior density times some functions. Now you cannot just work with logs because logs and integrals don’t commute.

One way around the problem is to multiply the integrand by a constant so large that there is no danger of underflow. Multiplying by constants does commute with integration.

So suppose your integrand is on the order of 10^-400, far below the smallest representable number. Do you need extended precision arithmetic? No, you just need to understand your problem.

If you multiply your integrand by 10^400 before integrating, then your integrand is roughly 1 in magnitude. Then do you integration, and remember that the result is 10^400 times the actual value.

You could take the following steps.

  1. Find m, the maximum of the log of the integrand.
  2. Let I be the integral of exp( log of the integrand – m ).
  3. Keep track that your actual integral is exp(m) I, or that its log is m + log I.

Note that you can be extremely sloppy in this process. You don’t need an accurate estimate of the maximum of the integrand per se. If you’re within a few dozen orders of magnitude, for example, that could be sufficient to carry out your integration without underflow.

One way to estimate the maximum is to use a frequentist estimator of your parameters as an approximate MLE, and assume that this is approximately where your posterior density takes on its maximum value. This might actually be very accurate, but it doesn’t need to be.

Note also that it’s OK if some of the evaluations of your integrand underflow to zero. You just don’t want the entire integral to underflow to zero. Places where the integrand is many orders of magnitude less than its maximum don’t contribute to the integration anyway. (It’s important that your integration pays attention to the region where the integrand is largest. A naive integration could entirely miss the most important region and completely underestimate the integral. But that’s a matter for another blog post.)

Editing by semantic units

The most basic text editor commands operate on lines and characters: move up or down a line, delete the next or previous character, etc. More advanced commands operate on context-specific semantic units. In the context of English prose, this means words, sentences, and paragraphs. In the context of source code, it means blocks and functions, however these are delimited in a particular language. In the context of an outline, it means navigating the tree.

One way to become more proficient at using your editor of choice is to become fluent at working with larger semantic units. Of course this saves keystrokes, though as I argued here, I don’t think typing speed matters so much. However, working in larger semantic units reduces errors. It also improves the chances that you can act on an idea before you forget what you were doing. These factors matter more than typing speed per se.

The general principles above apply to any editor. Here I’ll sketch how these ideas work in Emacs.

As a general rule, commands that operate on physical units begin with Control and analogous commands that work on semantic units being with Meta (Alt). For example, C-f moves forward a character and M-f moves forward a word. C-e moves to the end of a line and M-e moves to the end of a sentence.

Emacs extends the meaning of prose units by analogy in other contexts. For example, in calendar mode, a “line” corresponds to a week, and a “paragraph” corresponds to a month.

Emacs has numerous commands to work on “balanced expressions,” i.e. text surrounded by parentheses, brackets, braces, quotation marks, etc. Often the default keybindings are consistent with those used for other units. For example, “f” means forward and “b” means backward in any context. C-f and C-b move by character, M-f and M-b move by word, and C-M-f and C-M-b move by balanced expression. Emacs also has commands for working with function definitions: selecting a function definition, moving to the beginning or end of a function definition, etc. In Lisp, function definitions are balanced expressions. In other languages, like C, these ideas are distinct.

This morning, Irreal points out an Emacs package expand-region that selects larger and larger semantic regions. The first time it is invoked, it selects the immediate context of the cursor. Each subsequent invocation selects a larger context, moving up the parse tree of the content. I look forwarding to trying it.

Trick for computing log(1+x)

Charles Karney showed me a clever trick for computing log(1+x). It only takes a couple lines of code, but those lines are not obvious. If you understand this trick, you’ll understand a good bit of how floating point arithmetic works.

First of all, what’s the big deal? Why not just add 1 to x and take the log? The direct approach will be inaccurate for small x and completely inaccurate for very small x. For more explanation, see Math library functions that seem unnecessary.

Here’s how Charles Kerney implements the code in C++:

template<typename T> static inline T log1p(T x) throw() {
    volatile T y = 1 + x, z = y - 1;
    return z == 0 ? x : x * std::log(y) / z;
}

The code comes from GeographicLib, and is based on Theorem 4 of this paper paper. I’ve read that paper several times, but I either didn’t notice the trick above or forgot it.

The code includes comments explaining the trick. I cut out the comments because I’ll explain the code below.

If y = 1+x and z = y-1, doesn’t z just equal x? No, not necessarily! This is exactly the kind of code that a well-meaning but uninformed programmer would go through and “clean up.” If x isn’t too small, no harm done. But if x is very small, it could ruin the code. It’s also the kind of code an optimizing compiler might rearrange, hence the volatile keyword telling optimizers to not be clever.

Imagine x = 1e-20. Then y = 1+x sets y to exactly 1; a standard floating point number does not have enough precision to distinguish 1 and 1 + 10^-20. Then z = y-1 sets z to 0. While z will not exactly equal x in general, it will be a good approximation for x.

If z is not zero, log(y)/z is a good approximation to log(1 + x)/x because y is close to 1+x and z is close to x. Multiplying log(y)/z by x gives a good approximation to log(1+x).

If z is zero, x is so small that x is an excellent approximation to log(1 + x) because the error is on the order of x2.

Giveaway: PowerShell in Depth

About a month ago I gave away a copy of Learn Windows PowerShell 3 in a Month of Lunches that Manning provided. Now I’m giving away a copy of PowerShell in Depth: An administrator’s guide by Don Jones, Richard Siddaway, and Jeffery Hicks.

To enter the drawing, please leave a comment below. I’ll draw a winner from the comments received by the end of the day Monday, July 30 in my time zone.

This book is part of Manning’s early release program. The winner will receive the current version of the book in electronic form and updates as they come out.

Related links