Twenty weeks down to twenty minutes

by John on January 26, 2012

I had a simulation this week that I estimated would take 20 weeks to run. This estimate was based on too small a run. A more reliable estimate based on a longer run was 300 CPU hours, about two weeks running on a single processor. Before I split the problem up to run in parallel, I wanted to see if I could make each thread run faster.

The most expensive part of the simulation is a function that takes in four integer parameters and does a numerical integration. Each parameter could be any value from 0 to 120. So in principle there are 121^4 = 214,358,881 possible parameter combinations. I knew that some of these combinations were extremely unlikely or impossible, so I thought I could reduce the execution time by caching the most commonly called parameters. Maybe a few thousand arguments would account for most of the function calls.

I saved the function arguments and computed values in map, a.k.a. dictionary, associative array, etc. (The buzz word for this is memoization.) I set a maximum size on the map because I thought it could grow very large. Even if, say, a thousand arguments accounted for most of the function calls, maybe there were many more that only appeared once.

It turned out that the function was only called with 408 distinct parameter combinations. That means the same integrals were being evaluated millions of times. Caching these values reduced the execution time by a factor of about 800. In other words, the software could retrieve a previously computed value 800x faster than compute it from scratch.

The simulation that would have taken 300 hours took only 22 minutes, and so there was no need to make it run in parallel.

{ 4 comments }

The most brutal man page

by John on January 25, 2012

In The Linux Command Line, the author describes the bash man page* as “the most brutal man page of them all.”

Many man pages are hard to read, but I think that the grand prize for difficulty has to go to the man page for bash. As I was doing my research for this book, I gave it a careful review to ensure that I was covering most of its topics. When printed, it’s over 80 pages long and extremely dense, and its structure makes absolutely no sense to a new user.

On the other hand, it is very accurate and concise, as well as being extremely complete. So check it out if you dare, and look forward to the day when you can read it and it all makes sense.

* If you’re not familiar with Unix lingo, “man” stands for “manual”. Man pages are online documentation.

Related post:

Review of The Linux Command Line

{ 10 comments }

Review: The Linux Command Line

by John on January 25, 2012

No Starch Press recently released The Linux Command Line: A Complete Introduction by William E. Shotts, Jr.

True to its name, the book is about using Linux from command line. It’s not an encyclopedia of Linux. It doesn’t explain how to install Linux, doesn’t go into system APIs, and says little about how to administer Linux. At the same time, the book is broader than just a book on bash. It’s about how to “live” at the command line.

The introduction explains the intended audience.

This book is for Linux users who have migrated from other platforms. Most likely you are a “power user” of some version of Microsoft Windows.

The book has a conversational style, explaining the motivation behind ways of working as well as providing technical detail. It includes small but very useful suggestions along the way, the kinds of tips you’d pick up from a friend but might not find in a book.

The book has four parts

  1. Learning the shell
  2. Configuration and the environment
  3. Common tasks and essential tools
  4. Writing shell scripts

The book could have just included the first three sections; the forth part is a bit more specialized than the others. If you’d prefer, think of the book has having three parts, plus a lengthy appendix on shell scripting.

The Linux Command Line is pleasant to read. It has a light tone, while also getting down to business.

Related post:

Perverse hipster desire for retro-computing

{ 8 comments }

Color-coded surgery

by John on January 24, 2012

This is the most encouraging thing I’ve seen in cancer research in some time: a way to make tumors fluoresce. This allows surgeons to see tumor boundaries.

From TED

{ 4 comments }

Boundary conditions are the hard part

by John on January 24, 2012

What we call “differential equations” are usually not just differential equations. They also have associated initial conditions or boundary conditions.

With ordinary differential equations (ODEs), the initial conditions are often an afterthought. First you find a full set of solutions, then you plug in initial conditions to get a specific solution.

Partial differential equations (PDEs) have boundary conditions (and maybe initial conditions too). Since people typically learn ODEs first, they come to PDEs expecting boundary values to play a role analogous to ODEs. In a very limited sense they do, but in general boundary values are quite different.

The hard part about PDEs is not the PDEs themselves; the hard part is the boundary conditions. Finding solutions to differential equations in the interior of a domain is easy compared to making the equations have the specified behavior on the boundary.

No model can take everything into account. You have to draw some box around that part of the world that you’re going to model and specify what happens when your imaginary box meets the rest of the universe. That’s the hard part.

Related posts:

Three views of differential equations
Nonlinear is not a hypothesis
Approximating a solution that does not exist

{ 3 comments }

What do colleges sell?

by John on January 24, 2012

Universities are starting to give away their content online, while they still charge tens of thousands of dollars a year to attend. Just what are they selling? Credentials, accountability, and feedback.

Some people are asking why go to college when you can download a college education via iTunes U.

First, you would have no credentials when you’re done.

Second, you almost certainly would not put in the same amount of work as a college student without someone to pace you through the material and to provide external motivation. You’d be less likely to struggle through anything you found difficult or uninteresting.

Third, you’d have no feedback to know whether you’re really learning what you think you’re learning.

The people that I hear gush about online education opportunities are well-educated, successful, and ambitious. They may be less concerned about credentials either because they are intrinsically motivated or because they already have enough credentials. And because of their ambition, they need less accountability. They may need less feedback or are resourceful enough to seek out alternative channels for feedback, such as online forums. Resources such iTunes U and The Teaching Company are a godsend to such people. But that doesn’t mean that a typical teenager would make as much of the same opportunities.

{ 24 comments }

Grokking electricity

by John on January 23, 2012

After I finished an electromagnetism course in college, I said that one day I’d go back and really understand the subject. Now I’m starting to do that. I want to understand theory and practical applications, from Maxwell’s equations to Radio Shack.

I’m starting by reading the Feynman lectures on E&M. After that I plan to read something on electronics. If you have resources you recommend, please let me know.

I’ve started new Twitter account, @GrokEM. I figure that tweeting about E&M will help me stick to my goal. My other Twitter accounts post on a regular schedule (plus a few extras) and are scheduled weeks in advance. GrokEM will be more erratic, at least for now. (In case you’re not familiar with grok, it’s a slang for knowing something thoroughly and intuitively.)

Here’s what Feynman said about mathematicians learning physics, particularly E&M.

Mathematicians, or people who have very mathematical minds, are often led astray when “studying” physics because they loose sight of the physics. They say: “Look, these differential equations — the Maxwell equations — are all there is to electrodynamics … if I understand them mathematically inside out, I will understand the physics inside out.” Only it doesn’t work that way. … They fail because the actual physical situations in the real world are so complicated that it is necessary to have a much broader understanding of the equations. … A physical understanding is a completely unmathematical, imprecise, and inexact thing, but absolutely necessary for a physicist.

Heinlein coined grok around the same time that Feynman made the above remarks. Otherwise, Feynman might have said that only studying differential equations is not the way to grok electrodynamics.

Related posts:

What it means to understand an equation
Most important event of the 19th century

{ 15 comments }

Educational monoculture

by John on January 22, 2012

I ran across the term “educational monoculture” this weekend. What a great phrase!

Rather than write a long post, I’ll restrain myself and simply say that I’d like to hear more people talk about “educational monoculture.”

Related post:

Don’t standardize education, personalize it

{ 9 comments }

Oscillating Fibonacci ratios

by John on January 20, 2012

You may know that ratios of consecutive Fibonacci numbers tend to the golden ratio in the limit. But do know how they tend to the limit? The ratio oscillates, one above the golden ratio, the next below, each getting closer. The plot shows F(n+1) / F(n) where F(n) is the nth Fibonacci number. The height of the horizontal line is the golden ratio.

plot

We can prove that the ratio oscillates by starting with the formula

F(n) = \frac{ \phi^n - (-1)^n \phi^{-n} }{ \sqrt{5} }

where φ = (1 + √5)/2 is the golden ratio.

From there we can work out that

\frac{F(n+1)}{F(n)} - \phi = (-1)^n \frac{ \phi + \phi^{-1}}{\phi^{2n} + (-1)^{n+1}}

This shows that when n is odd, F(n+1) / F(n) is below the golden ratio and when n is even it is above. It also shows that the absolute error in approximating the golden ratio by F(n+1) / F(n) goes down by a factor of about φ2 each time n increases by 1.

Related posts:

Honeybee genealogy
Fibonacci numbers at work
Breastfeeding and the golden ratio

{ 6 comments }

Rushing to see numbers

by John on January 19, 2012

Scientific programmers and algebra students start out with analogous bad habits.

Beginning algebra students rush to enter numbers into a calculator. They find it comforting to reduce expressions to floating point numbers as frequently as possible. This is understandable, but it’s a habit they need to break for numerous reasons: it’s more work, harder for a grader to follow, less accurate, etc. Better to do as much calculation as possible with symbols, then stick in numbers at the end.

A similar phenomena happens in scientific programming. We’re anxious to see numbers, so we print out numbers as soon as we produce them. There’s a tendency to sprinkle code with printf statements, then write scripts to scrape text output to gather results.

It’s better to keep numbers in data structures as long as possible, then dump the data to text as the last step. Why? For one thing, the output format might change: instead of a text dump, maybe you’ll want to put the data in a database or format it as an HTML table. More importantly, the “last step” will often change. What was the last step now becomes the next-to-last step because you think of something else to do. So you do more with the data in memory before producing output, rather than scraping the text output.

I quickly learned to delay plugging numbers into algebraic expressions. It took me longer to learn not to output numeric results until the last moment.

{ 13 comments }

Funny and serious

by John on January 19, 2012

G. K. Chesterton on being funny and being serious:

Mr. McCabe thinks that I am not serious but only funny, because Mr. McCabe thinks that funny is the opposite of serious. Funny is the opposite of not funny, and of nothing else. … Whether a man chooses to tell the truth in long sentences or short jokes is a problem analogous to whether he chooses to tell the truth in French or German. Whether a man preaches his gospel grotesquely or gravely is merely like the question of whether he preaches it in prose or verse. … The truth is, as I have said, that in this sense the two qualities of fun and seriousness have nothing whatever to do with each other, they are no more comparable than black and triangular.

Emphasis added. From Heretics. Text available online from Project Gutenberg.

{ 1 comment }

Walking away from factory work

by John on January 17, 2012

From Shop Class as Soulcraft,

Given their likely acquaintance such a cognitively rich world of work, it is hardly surprising that when Henry Ford introduced the assembly line in 1913, workers simply walked out. One of Ford’s biographers wrote, “So great was labor’s distaste for the new machine system that toward the close of 1913 every time the company wanted to add 100 men to its factory personnel, it was necessary to hire 963.”

A dozen years ago people would talk of building “software factories” to crank out software projects. Back then someone tried to get me excited about joining an effort to create such a factory. I told him I did not want to work in a factory. He tried to back-peddle, saying that it’s not what it sounds like. But I’m sure it was exactly what it sounded like.

{ 5 comments }

Preparing for change, expressing intent

by John on January 17, 2012

Many good programming practices boil down to preparing for change or expressing intent. It seems to me that novices emphasize the former, experts the latter.

One of the first things you learn in programming is to use symbolic constants rather than magic numbers. For example, if you have a maximum of 12 items in a shopping cart, define a constant like MAX_ITEMS to be 12 and use that symbol rather than the number “12″ throughout the code. That way if you have to increase the maximum to 25 some day, you can just make the change in one place. Symbolic constants prepare for change.

Sounds good, but then why define a constant for pi? It’s not going to change. But having a constant PI in source code conveys the intention of the number.

There are 3,628,800 seconds in six weeks. Coincidentally, this number also equals 10!. But constants like SECONDS_PER_SIX_WEEKS and TEN_FACTORIAL clearly convey where the numbers come from. That’s why it’s sometimes worthwhile to give one thing two names. The symbol SECONDS_PER_SIX_WEEKS looks like a conversion factor, while TEN_FACTORIAL makes you think somewhere there are 10 things being arranged. Using the symbols in the opposite context would be clever, but not in a good way.

Expressing intent is easier to justify than preparing for change. If you argue that some chunk of code should be pulled out into its own function in case it needs to change, someone may argue “But that’ll never change.” If you argue that the same chuck of code should be pulled out and given a name to express what it’s trying to do, you’re likely to get less resistance.

If you focus on making your intentions clear, your code will be easier to maintain. If you focus on maintainability alone, it might backfire. You might get lots of unneeded code, inserted with the intent of making future maintenance easier, that makes maintenance harder.

Related posts:

Why does software have to be maintained?
Holographic code
Bugs, features, and risk

{ 4 comments }

Six analysis and probability diagrams

by John on January 17, 2012

Here are a few diagrams I’ve created that summarize relationships in analysis and probability. Click on a thumbnail image to go to a page with the full image and explanatory text.

Special functions

Gamma and related functions

Probability distributions

Conjugate priors

Convergence theorems

Bessel functions

{ 3 comments }

The most dreadful conclusion

by John on January 16, 2012

In his book Heretics, G. K. Chesterton praises H. G. Wells for being able to change his mind.

He has abandoned the sensational theory with the same honourable gravity and simplicity with which he adopted it. Then he thought it was true; now he thinks it is not true. He has come to the most dreadful conclusion a literary man can come to, the conclusion that the ordinary view is the right one. It is only the last and wildest kind of courage that can stand on a tower before ten thousand people and tell them that twice two is four.

Emphasis added.

Related post: Three reasons expert predictions are often wrong

{ 3 comments }