Russian novel programming

Posted on 27 September 2012 by John

One of the things that makes Russian novels hard to read, at least for Americans, is that characters have multiple names. For example, in The Brothers Karamazov, Alexei Fyodorovich Karamazov is also called Alyosha, Alyoshka, Alyoshenka, Alyoshechka, Alexeichik, Lyosha, and Lyoshenka.

Russian novel programming is the anti-pattern of one thing having many names. For a given program, you may have a location in version control, a location on your hard drive, a project name, a name for the program executable, etc. Each of these may contain slight differences in the same name. Or major differences. For historical reasons, the code for foo.exe is in a project named ‘bar’, under a path named …

I thought about this today when looking into a question about a program. A single number had different names in several different contexts. There’s the text label on the desktop user interface, the name of the C# variable that captures the user input, the name of the corresponding C++ variable when the user input is passed to the back-end numeric code, and the name of used in the XML file that serializes the variable when it goes between a database and a web server. Of course these should all be coordinated, but there were understandable historical reasons for how things got into this state.

The black T-shirt crowd

Posted on 26 September 2012 by John

My previous post quotes Greg Jorgensen’s imaginary interview with Linus Torvalds. In the interview, Torvalds says “the black T-shirt crowd” has gotten bored with Linux because it has become too easy. Now Git gives them a new arcane product to explore and master. Jorgensen has Torvalds say

I didn’t really expect anyone to use [Git] because it’s so hard to use, but that turns out to be its big appeal. No technology can ever be too arcane or complicated for the black t-shirt crowd.

John Durden pointed out in the comments that I’m wearing a black T-shirt in the photo on my blog. Touché!

The T-shirt I was wearing at the time of the photo isn’t entirely black, though the portion showing in my mugshot is. Here’s the full photo:

I’ll admit to wearing a metaphorical black T-shirt, i.e. enjoying solving arcane problems, though not the particular problems mentioned in Jorgensen’s satire. I take no pleasure in troubleshooting operating system or version control problems, for example. I want such things to just work. But I do enjoy solving some kinds of problems that most people would rather not think about. It would be the pot calling the kettle black for me to poke too much fun at the programmers in Jorgensen’s article. Anyone with a PhD in partial differential equations should be cautious about calling someone else geeky.

(There should be a joke in there about black kettles and black T-shirts, but it’s late as I write this and I can’t pull it off.)

No technology can ever be too arcane

Posted on 26 September 2012 by John

In this fake interview, Linux creator Linus Torvalds says Linux has gotten too easy to use and that’s why people use Git:

Git has taken over where Linux left off separating the geeks into know-nothings and know-it-alls. I didn’t really expect anyone to use it because it’s so hard to use, but that turns out to be its big appeal. No technology can ever be too arcane or complicated for the black t-shirt crowd.

Emphasis added.

Note: If you want to leave a comment saying Linux or Git really aren’t hard to use, lighten up. This is satire.

Related post: Advanced or just obscure?

Ramanujan’s factorial approximation

Posted on 25 September 2012 by John

Ramanujan came up with an approximation for factorial that resembles Stirling’s famous approximation but is much more accurate.

$n! \sim \sqrt{\pi} \left(\frac{n}{e}\right)^n \sqrt[6]{8n^3 + 4n^2 + n + \frac{1}{30}}$

As with Stirling’s approximation, the relative error in Ramanujan’s approximation decreases as n gets larger. Typically these approximations are not useful for small values of n. For n = 5, Stirling’s approximation gives 118.02 while the exact value is 120. But Ramanujan’s approximation gives 120.00015.

Here’s an implementation of the approximation in Python.

    def ramanujan(x):
        fact = sqrt(pi)*(x/e)**x
        fact *= (((8*x + 4)*x + 1)*x + 1/30.)**(1./6.)
        return fact

For non-integer values of x, the function returns an approximation for Γ(x+1), an extension of factorial to real values. Here’s a plot of the accuracy of Ramanujan’s approximation.

plot of precision of Ramanujan's approximation

For x = 50, Ramanujan’s approximation is good to nearly 10 significant figures, whereas Stirling’s approximation is good to about 7.

Here’s a little trickier implementation.

    def ramanujan2(x):
        fact = sqrt(pi)*(x/e)**x
        fact *= (((8*x + 4)*x + 1)*x + 1/30.)**(1./6.)
        if isinstance(x, int):
             fact = int(fact)
        return fact

This code gives the same value as before if x is not an integer. But it gives exact values of factorial for x = 0, 1, 2, … 10. If x is 11 or greater, the result is not exact, but the relative error is less than 10^-7 and decreases as x increases. Ramanujan’s approximation always errs on the high side, so rounding the result down improves the accuracy and makes it exact for small integer inputs.

The downside of this trick is that now, for example, ramanujan2(5) and ramanujan2(5.0) give different results. In some contexts, the improved accuracy may not be worth the inconsistency.

Reference: On Gosper’s formula for the Gamma function

Bad UI of the day

Posted on 24 September 2012 by John

A friend of mine sent me the photo below of a Sears Craftsman reciprocating saw.

saw

What do you suppose the yellow switch does? Since the positions are labeled ‘0’ and ‘1’, my first thought was that they were off and on respectively. But no!

The yellow switch controls blade operation: normal versus orbital. But which label corresponds to which operation? You can’t tell before you turn on the saw, and even while it’s running it’s not obvious. But you can tell if you watch the blade closely as it slows down after you turn the saw off.

The printed instructions that came with the saw say:

The first position is for normal blade operation, and the second position is for orbital blade operation.

But which is “first”, the position labeled ‘0’ or ‘1’? A normal human being might think that 1 and 1st go together. Being a programmer, I assumed—correctly—that 0 is the 1st position.

How many bullets does it take to cut down a tree?

Posted on 23 September 2012 by John

According to Guesstimation 2.0, it would take about 10,000 bullets to cut down a tree with a 20 cm radius. The book doesn’t just announce the result but shows how you might come to this conclusion by computing how much energy it would take to fell the tree and how much energy bullets deliver. After going through a rough calculation, the book reports that Mythbusters actually cut down a tree, a smaller one, with about 2,000 bullets.

Guesstimation 2.0 explores about 100 questions with back-of-the-envelope solutions. (I started to add up the number of problems by looking at the table of contents, but it would be more in line with the spirit of the book and say that since it has on the order to 10 questions and on the order of 10 problems per chapter, it has about 100 questions.)

Another question I found interesting was estimating the amount of fuel needed to transport food across the United States. If you lived in New York and all of your food came from California, it would take about 10 gallons of fuel to bring you your groceries. It may take more fuel to bring your groceries home from the grocery store than it takes to bring the food from around the country to the grocery store.

It’s not the text editor, it’s text

Posted on 22 September 2012 by John

Vivek Haldar had a nice rant about editors a couple days ago. In response to complaints that some editors are ugly, he writes:

The primary factor in looking good should be the choice of a good font at a comfortable size, and a syntax coloring theme that you like. And that is not something specific to an editor. Editors like Emacs and vi have almost no UI!

To illustrate his point, here’s what my Emacs looks like without text:

There’s just not much there, not enough to say it’s pretty or ugly.

When people say that Emacs is not pretty, I think they mean that plain text is not pretty.

For better and for worse, everything in Emacs is text. The advantage of this approach is consistency. Everything uses the same commands for navigation and editing: source code, error messages, directory listings, … Everything is just text. The disadvantage is that you don’t have nicely designed special windows for each of these things, and it does get a little monotonous.

When people say they love their text editor, I think they love text. They prefer an environment that allows them to solve problems by editing text files rather than clicking buttons. And as Vivek says, that is not something specific to a particular editor.

Math tools for the next 20 years

Posted on 21 September 2012 by John

Igor Carron commented on his blog that

… the mathematical tools that we will use in the next 20 years are for the most part probably in our hands already.

He compares this to progress in treating leukemia: survival rates increased dramatically over the last 40 years, not primarily by developing new drugs, but by better applying drugs that were available in 1970.

I find that plausible. I’ve gotten a lot of mileage out of math that was well known 100 years ago.

Related post: Doing good work with bad tools

Cancer moon shots

Posted on 21 September 2012 by John

M. D. Anderson Cancer Center announced a $3 billion research program today aimed at six specific forms of cancer.

Acute myeloid leukemia and myelodysplastic syndrome (AML and MDS)
Chronic lymphocytic leukemia (CLL)
Lung cancer
Melanoma
Prostate cancer
Triple negative breast and ovarian cancer

These special areas of research are being called “moon shots” by analogy with John F. Kennedy’s challenge to put a man on the moon. This isn’t a new idea. In fact, a few months after the first moon landing, there was a full-page ad in the Washington Post that began “Mr. Nixon: You can cure cancer.” The thinking was the familiar refrain “If we can put a man on the moon, we can …” President Nixon and other politicians were excited about the idea and announced a “war on cancer.” Scientists, however, were more skeptical. Sol Spiegelman said at the time

An all-out effort at this time would be like trying to land a man on the moon without knowing Newton’s laws of gravity.

The new moon shots are not a national attempt to “cure cancer” in the abstract. They are six initiatives at one institution to focus research on specific kinds of cancer. And while we do not yet know the analog of Newton’s laws for cancer, we do know far more about the basic biology of cancer than we did in the 1970s.

There are results that suggest that there is some unity beyond the diversity of cancer, that ultimately there are a few common biological pathways involved in all cancers. Maybe some day we will be able to treat cancer in general, but for now it looks like the road forward is specialization. Perhaps specialized research programs will uncover some of these common patters in all cancer.

How long will there be computer science departments?

Posted on 20 September 2012 by John

When universities began to form computer science departments, there was some discussion over how long computer science departments would exist. Some thought that after a few years, computer science departments would have served their purpose and computer science would be absorbed into other departments that applied it.

It looks like computer science departments are here to stay, but that doesn’t mean that there are not territorial disputes. If other departments are not satisfied with the education their students are getting from the computer science department, they will start teaching their own computer science classes. This is happening now, to different extents in different places.

Some institutions have departments of bioinformatics or biomathematics. Will they always? Or will “bioinformatics” and “biomathematics” simply be “biology” in a few years?

Statisticians sometimes have their own departments, sometimes reside in mathematics departments, and sometimes are scattered to the four winds with de facto statisticians working in departments of education, political science, etc. It would be interesting to see which of these three options grows in the wake of “big data.” A fourth possibility is the formation of “data science” departments, essentially statistics departments with more respect for machine learning and with better marketing.

No doubt computer science, bioinformatics, and statistics will be hot areas for years to come, but the scope of academic departments by these names will change. At different institutions they may grow, shrink, or even disappear.

Academic departments argue that because their subject is important, their department is important. And any cut to their departmental budget is framed as a cut to the budget for their subject. But neither of these is necessarily true. Matt Briggs wrote about this yesterday in regard to philosophy. He argues that philosophy is important but that philosophy departments are not. He quotes Peter Kreeft:

Philosophy was not a “department” to its founders. They would have regarded the expression “philosophy department” as absurd as “love department.”

Love is important, but it doesn’t need to be a department. In fact, it’s so important that the idea of quarantining it to a department is absurd.

Computer science and statistics departments may shrink as their subjects diffuse throughout the academy. Their departments may not go away, but they may become more theoretical and more specialized. Already most statistics education takes place outside of statistics departments, and the same may be true of computer science soon if it isn’t already.

Month: September 2012