Doing good work with bad tools

Charlie Parker was one of the greatest jazz musicians. But unlike most artists, he had a cavalier attitude toward his equipment. He would pawn his saxophone for drug money and show up for a concert without an instrument. He assumed that he could always borrow a saxophone at the last minute. He even used a plastic saxophone for one concert. Parker could take a cheap piece of plastic and make it sound good.

Good equipment helps. I’ve played cheap saxophones and professional quality saxophones, and I much prefer the latter. But a good sax didn’t make me sound like Charlie Parker, nor did a cheap sax make Charlie Parker sound like me. A poor craftsman blames his tools.

For centuries people have searched for the secret of Stradivarius violins. What did Antonio Stradivari do to create his legendary instruments? Was there something special about the wood he used? Something special about the varnish? A new theory says that there was nothing unusual about the materials he used and that he simply did excellent work.

It’s hard to think of a worse programming environment than DOS batch files. But I worked with someone who was able to do amazing things with batch files.

Hugh MacLeod calls it “hiding behind pillars” when you think you must have the best tools before you can work. He summarizes hiding behind pillars this way:

The more talented somebody is, the less they need the props. Meeting a person who wrote a masterpiece on the back of a deli menu would not surprise me. Meeting a person who wrote a masterpiece with a silver Cartier fountain pen on an antique writing table in an airy SoHo loft would SERIOUSLY surprise me.

Related posts:

Thomas Edison’s fire
Too much time on their hands?
Redbelt problem solving


Did the MS Office ribbon work?

One of the major design goals for Microsoft Office 2007 was making features easier to discover. A study had shown that about 90% of the feature requests for Microsoft Office were for features already in the product. People just didn’t know what was already there.

A major part of Microsoft’s response was the “ribbon” interface. More controls are on display rather than being hidden behind a deep hierarchy of menus. According to Katherine Murray, the user interface changes achieved their goal.

Data is showing that the redesign of Office really did reach this goal — Word 2007 and Excel 2007 users are now using four times as many features as they used in previous versions, and for PowerPoint, the increase in feature use is a factor of five.

The quote above was taken from First Look: Microsoft Office 2010. I’d like to see more details, but the book is a sales brochure and not a statistical report. Still, if you take these figures at face value, it seems the ribbon and other user interface changes were very successful.

Many pundits hate the ribbon. But most of the 500 million people who use Microsoft Office are not pundits.


Managing biological data

Jon Udell’s latest Interviews with Innovators podcast features Randall Julian of Indigo BioSystems. I found this episode particularly interesting because it deals with issues I have some experience with.

The problems in managing biological data begin with how to store the raw experiment data. As Julian says:

… without buying into all the hype around semantic web and so on, you would argue that a flexible schema makes more sense in a knowledge gathering or knowledge generation context than a fixed schema does.

So you need something less rigid than a relational database and something with more structure than a set of Excel spreadsheets. That’s not easy, and I don’t know whether anyone has come up with an optimal solution yet. Julian said that he has seen many attempts to put vast amounts of biological data into a rigid relational database schema but hasn’t seen this approach succeed yet. My experience has been similar.

Representing raw experimental data isn’t enough. In fact, that’s the easy part. As Jon Udell comments during the interview:

It’s easy to represent data. It’s hard to represent the experiment.

That is, the data must come with enough context to make sense of it. Julian comments that without this context, the data may as well be a list of zip codes. And not only must you capture the experimental context, you must also describe the analysis done to the data. (See, for example, this post about researchers making up their own rules of probability.)

Julian comments on how electronic data management is not nearly as common as someone unfamiliar with medical informatics might expect.

So right now maybe 50% of the clinical trials in the world are done using electronic data capture technology. … that’s the thing that maybe people don’t understand about health care and the life sciences in general is that there is still a huge amount of paper out there.

Part of the reason for so much paper goes back to the belief that one must choose between highly normalized relational data stores and unstructured files. Given a choice between inflexible bureaucracy and chaos, many people choose chaos. It may work about as well, and it’s much cheaper to implement. I’ve seen both extremes. I’ve also been part of a project using a flexible but structured approach that worked quite well.
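To make the middle ground concrete, here is a minimal Python sketch of a record that is neither a rigid schema nor a free-form spreadsheet row. It is purely illustrative: the class name, field names, and the check are hypothetical, and it is not the design used by Julian or by the project mentioned above. A small fixed core carries the context needed to interpret a value, and an open-ended dictionary holds whatever else a particular experiment needs to record.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Measurement:
    # Fixed part: context every record must carry (hypothetical field names).
    experiment_id: str
    instrument: str
    units: str
    value: float
    # Flexible part: experiment-specific details with no predefined columns.
    annotations: Dict[str, Any] = field(default_factory=dict)


def missing_context(record: Measurement) -> List[str]:
    """Return the names of required context fields that are blank."""
    required = ("experiment_id", "instrument", "units")
    return [name for name in required if not getattr(record, name).strip()]
```

A record with blank context fields would be flagged by `missing_context` before being stored, while new kinds of annotations can be added without changing the schema.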

Related posts:

Posts on reproducibility
Problems versus dilemmas
Blogging about reproducible research


Less isn't more. Just enough is more.

From Ten Things I Have Learned by Milton Glaser:

Being a child of modernism I have heard this mantra all my life. Less is more. One morning upon awakening I realised that it was total nonsense … If you look at a Persian rug, you cannot say that less is more because you realise that every part of that rug, every change of colour, every shift in form is absolutely essential for its aesthetic success. You cannot prove to me that a solid blue rug is in any way superior. … However, I have an alternative to the proposition that I believe is more appropriate. ‘Just enough is more.’

Related posts:

Simple legacy
Simplicity in old age
The simplest thing that might work


If you have a great idea, don't tell it to a standards body

Another quote from Douglas Crockford’s talk The State and Future of JavaScript:

If you have a great idea, don’t tell it to a standards body. They are the last people in the world who should hear about it. What you should do instead is implement it and show it to the world, and if the world likes it then the world will say yeah, that should be a standard. I’ve seen too many cases where people try to do this in the reverse order, and you don’t want to do it that way. Prove it first, prove the need, and then we should put it in the standard.

Related post:

The virtual machine of the Internet


The virtual machine of the Internet

From Douglas Crockford’s talk The State and Future of JavaScript:

There’s pressure to make it [i.e. JavaScript] a better compilation target. Now, this is a big surprise. Everybody thought that the Java VM was going to be the VM of the internet, but it turns out that JavaScript language is the VM [ virtual machine ] of the internet. People are writing in Java, and Python, and lots of other languages, and then translating it into JavaScript because JavaScript, for all of its security problems, actually has a much better security model than everybody else.

Related posts:

Zero-knowledge password management in JavaScript
JavaScript: A picture is worth a thousand words
Programming language subsets


Twelve Days of Christmas and tetrahedral numbers

How many gifts are there in the song Twelve Days of Christmas?

Day 1: 1 gift
Day 2: 1 + 2 = 3 gifts
Day 3: 1 + 2 + 3 = 6 gifts
…
Day 12: 1 + 2 + 3 + … + 12 = 78 gifts

The number of gifts on day n is the nth triangular number. The total number of gifts up to and including day n is the sum of the first n triangular numbers, known as the nth tetrahedral number. In the image below, the total number of balls is the fifth tetrahedral number. The number of balls in each layer is a triangular number. (Image credit: Math is Fun.)

tetrahedron of glass balls

I’ll develop a formula for tetrahedral numbers and for continuations of the pattern, such as sums of tetrahedral numbers.

First, let T(n, 1) = n.

Next, let T(n, 2) be the nth triangular number. So T(n, 2) is the sum of the first n terms in the sequence T(i, 1).

Next, let T(n, 3) be the nth tetrahedral number. So T(n, 3) is the sum of the first n terms in the sequence T(i, 2).

In general, define T(n, k) to be the sum of the first n terms in the sequence T(i, k-1). You could think of T(n, k) as the nth k-dimensional triangular number. (A tetrahedron is a sort of 3-dimensional triangle. It’s a pyramid whose base is a triangle. T(n,4) would count balls arranged in a sort of 4-dimensional triangle, a simplex in 4 dimensions.)

Theorem: T(n, k) = n(n+1)(n+2) … (n+k-1)/k!

Corollary: There are T(12, 3) = 12*13*14/6 = 364 gifts in the Twelve Days of Christmas.
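The theorem is easy to check numerically. The Python sketch below computes T(n, k) two ways, directly from the recursive definition and from the product formula, and confirms that they agree for small n and k. The function names are made up for illustration, and a numerical check is of course not a proof.

```python
from math import comb, factorial


def T_by_sums(n, k):
    """T(n, k) from the definition: T(n, 1) = n, and T(n, k) is the
    sum of the first n terms of the sequence T(i, k-1)."""
    if k == 1:
        return n
    return sum(T_by_sums(i, k - 1) for i in range(1, n + 1))


def T_by_formula(n, k):
    """T(n, k) = n(n+1)(n+2)...(n+k-1) / k!"""
    product = 1
    for j in range(k):
        product *= n + j
    # A product of k consecutive integers is divisible by k!,
    # so integer division is exact here.
    return product // factorial(k)


# Gifts received on each day, and the total over all twelve days.
gifts_per_day = [T_by_sums(day, 2) for day in range(1, 13)]
total_gifts = T_by_sums(12, 3)
print(gifts_per_day)
print(total_gifts)

# The definition, the product formula, and the binomial coefficient
# C(n+k-1, k) should all agree.
for n in range(1, 11):
    for k in range(1, 5):
        assert T_by_sums(n, k) == T_by_formula(n, k) == comb(n + k - 1, k)
```

The last check uses the fact that the product formula is the binomial coefficient C(n+k-1, k), which is what `math.comb` computes.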

See these notes for an elementary proof by induction.

Update: Here’s a more advanced proof that uses the calculus of finite differences. The more advanced proof requires more background, but it also gives a better idea of how someone might have discovered the formula.

Related posts:

Binomial coefficients
Splitting a convex set through its center
My favorite Christmas carol


Word frequencies in human and computer languages

This is one of my favorite quotes from Starbucks’ coffee cups:

When I was young I was misled by flash cards into believing that xylophones and zebras were much more common.

Alphabet books treat every letter as equally important even though letters like X and Z are far less common than letters like E and T. Children need to learn the entire alphabet eventually, and there are only 26 letters, so teaching all the letters at once is not bad. But uniform emphasis doesn’t scale well. Learning a foreign language, or a computer language, by learning words without regard to frequency is absurd. The most common words are far more common than the less common words, and so it makes sense to learn the most common words first.

John Myles White has applied this idea to learning R. He did a keyword frequency analysis for R and showed that the frequency of the keywords follows Zipf’s law or something similar. I’d like to see someone do a similar study for other programming languages.
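As a rough idea of what such a study could look like for another language, here is a minimal Python sketch that counts keyword frequencies in Python source code. It is not White's method, which was applied to R. The function name and the sample snippet are hypothetical, and a real study would run over a large body of code rather than a few lines. It uses the standard `tokenize` and `keyword` modules, so words inside strings and comments are not counted.

```python
import io
import keyword
import tokenize
from collections import Counter


def keyword_counts(source):
    """Count occurrences of Python keywords in a string of source code."""
    counts = Counter()
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        # Keywords are reported as NAME tokens, just like identifiers,
        # so filter them with keyword.iskeyword.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return counts


# A tiny hypothetical sample; replace with the text of real source files.
sample = '''
def mean(values):
    if not values:
        return None
    total = 0
    for v in values:
        total += v
    return total / len(values)
'''

counts = keyword_counts(sample)

# Print keywords from most to least frequent, with their rank.
for rank, (word, count) in enumerate(counts.most_common(), start=1):
    print(rank, word, count)
```

Plotting count against rank on log-log axes for a large corpus would show whether the frequencies look roughly like Zipf’s law.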

It would be interesting to write a programming language tutorial that introduces the keywords in approximately the order of their frequency. Such a book might be quite unorthodox, and quite useful.

White points out that when teaching human languages in a classroom, “the usefulness of a word tends to be confounded with its respectability.” I imagine something similar happens with programming languages. Programs that produce lists of Fibonacci numbers or prime numbers are the xylophones and zebras of the software world.

Related posts:

Zebras and xylophones part II: learning Spanish
Rate of regularizing English verbs
Four reasons we don’t apply the 80-20 rule
R, the good parts


Breast cancer stem cells identified

From the article Proverbial new “Twist” in Breast Cancer Detection:

… scientists at Johns Hopkins … have shown that a protein made by a gene called “Twist” may be the proverbial red flag that can accurately distinguish stem cells that drive aggressive, metastatic breast cancer from other breast cancer cells.

Related posts:

Detecting breast cancer from a hair sample
Visualizing cancer DNA scrambling
Killing too much of a tumor


Weekend miscellany

Learning faster than many think possible

There is no speed limit
Beating the system

Advice for developing software and for life

To go fast, do less

Harvard study critical of the value of hospital IT projects

Not worth the money

Free ebook

PowerShell TFM version 1

Disturbing article about the people who built Dubai

The dark side of Dubai

How to tell whether someone is smart and gets things done

How to hire programmers

The latest Carnival of Mathematics

#60


Creativity and faith

From Eugene Peterson:

Creativity is difficult. When you are being creative, you’re living by faith. You don’t know what’s next because the created, by definition, is what’s never been before. So you’re living at the edge of something in which you’re not very confident. You might fail: in fact, you almost certainly will fail a good part of the time. All the creative persons I know throw away most of the stuff they do.

Related posts:

Don’t try to be God, try to be Shakespeare
Subtle variations on familiar themes
Three quotes on originality
