Doing good work with bad tools

Charlie Parker was one of the greatest jazz musicians. But unlike most artists, he had a cavalier attitude toward his equipment. He would pawn his saxophone for drug money and show up for a concert without an instrument. He assumed that he could always borrow a saxophone at the last minute. He even used a plastic saxophone for one concert. Parker could take a cheap piece of plastic and make it sound good.

Good equipment helps. I’ve played cheap saxophones and professional quality saxophones, and I much prefer the latter. But a good sax didn’t make me sound like Charlie Parker, nor did a cheap sax make Charlie Parker sound like me. A poor craftsman blames his tools.

For centuries people have searched for the secret of Stradivarius violins. What did Antonio Stradivari do to create his legendary instruments? Was there something special about the wood he used? Something special about the varnish? A new theory says that there was nothing unusual about the materials he used and that he simply did excellent work.

It’s hard to think of a worse programming environment than DOS batch files. But I worked with someone who was able to do amazing things with batch files.

Hugh MacLeod calls it “hiding behind pillars” when you think you must have the best tools before you can work. He summarizes hiding behind pillars this way:

The more talented somebody is, the less they need the props. Meeting a person who wrote a masterpiece on the back of a deli menu would not surprise me. Meeting a person who wrote a masterpiece with a silver Cartier fountain pen on an antique writing table in an airy SoHo loft would SERIOUSLY surprise me.


Did the MS Office ribbon work?

One of the major design goals for Microsoft Office 2007 was making features easier to discover. A study had shown that about 90% of the feature requests for Microsoft Office were for features already in the product. People just didn’t know what was already there.

A major part of Microsoft’s response was the “ribbon” interface. More controls are on display rather than being hidden behind a deep hierarchy of menus. According to Katherine Murray, the user interface changes achieved their goal.

Data is showing that the redesign of Office really did reach this goal — Word 2007 and Excel 2007 users are now using four times as many features as they used in previous versions, and for PowerPoint, the increase in feature use is a factor of five.

The quote above was taken from First Look: Microsoft Office 2010. I’d like to see more details, but the book is a sales brochure and not a statistical report. Still, if you take these figures at face value, it seems the ribbon and other user interface changes were very successful.

Many pundits hate the ribbon. But most of the 500 million people who use Microsoft Office are not pundits.

Managing biological data

Jon Udell’s latest Interviews with Innovators podcast features Randall Julian of Indigo BioSystems. I found this episode particularly interesting because it deals with issues I have some experience with.

The problems in managing biological data begin with how to store the raw experiment data. As Julian says:

… without buying into all the hype around semantic web and so on, you would argue that a flexible schema makes more sense in a knowledge gathering or knowledge generation context than a fixed schema does.

So you need something less rigid than a relational database and something with more structure than a set of Excel spreadsheets. That’s not easy, and I don’t know whether anyone has come up with an optimal solution yet. Julian said that he has seen many attempts to put vast amounts of biological data into a rigid relational database schema but hasn’t seen this approach succeed yet. My experience has been similar.

Representing raw experimental data isn’t enough. In fact, that’s the easy part. As Jon Udell comments during the interview:

It’s easy to represent data. It’s hard to represent the experiment.

That is, the data must come with ample context to make sense of it. Julian comments that without this context, the data may as well be a list of zip codes. And not only must you capture experimental context, you must describe the analysis done to the data. (See, for example, this post about researchers making up their own rules of probability.)

Julian comments on how electronic data management is not nearly as common as someone unfamiliar with medical informatics might expect.

So right now maybe 50% of the clinical trials in the world are done using electronic data capture technology. … that’s the thing that maybe people don’t understand about health care and the life sciences in general is that there is still a huge amount of paper out there.

Part of the reason for so much paper goes back to the belief that one must choose between highly normalized relational data stores and unstructured files. Given a choice between inflexible bureaucracy and chaos, many people choose chaos. It may work about as well, and it’s much cheaper to implement. I’ve seen both extremes. I’ve also been part of a project using a flexible but structured approach that worked quite well.


Less isn’t more. Just enough is more.

From Ten Things I Have Learned by Milton Glaser:

Being a child of modernism I have heard this mantra all my life. Less is more. One morning upon awakening I realised that it was total nonsense … If you look at a Persian rug, you cannot say that less is more because you realise that every part of that rug, every change of colour, every shift in form is absolutely essential for its aesthetic success. You cannot prove to me that a solid blue rug is in any way superior. … However, I have an alternative to the proposition that I believe is more appropriate. ‘Just enough is more.’


Using the Windows file explorer without a mouse

The Windows File Explorer has a number of keyboard shortcuts that do not apply to Windows programs in general.

First of all, you can type Windows key-E to open the File Explorer. You can close it by typing Alt-F4.

Alt-D highlights the address box. (Alt-D also highlights the address box of web browsers: IE, Firefox, Safari, etc.) F4 opens a drop-down list of folders in the address bar.

There are several numeric keypad shortcuts for expanding and collapsing folders.

  • * expands everything under the current selection
  • + expands the current selection
  • - collapses the current selection.

Note that the above keys must be on the numeric keypad; the - on the top of the main part of the keyboard, for example, has no effect on the File Explorer.

You can use the up and down arrow keys to move between files and folders.

The right arrow key expands the current selection. If the current selection is already expanded, the key takes you to the first child.

The left arrow key collapses the current selection. If the current selection is already collapsed, the key takes you to the parent folder.

F2 lets you rename an object. (For the rest of this post, “object” means “file or folder.”)

Shift-F10 opens the context menu of an object, as if you had right-clicked on the object. (There’s also a special key for this; the key has a picture of a mouse selecting something from a list.) Once you bring up the context menu, you can use the up and down arrow keys to highlight a menu item and the enter key to click it.

Alt-Enter opens the Properties dialog for an object, as if you had right-clicked and selected its Properties from the context menu.

F6 lets you cycle between the panes of the File Explorer.

***

For daily tips on using Windows without a mouse, you can follow @SansMouse on Twitter or subscribe to its RSS feed. This post will be split into bite-size pieces and added to SansMouse a few weeks from now.


If you have a great idea, don’t tell it to a standards body

Another quote from Douglas Crockford’s talk The State and Future of JavaScript:

If you have a great idea, don’t tell it to a standards body. They are the last people in the world who should hear about it. What you should do instead is implement it and show it to the world, and if the world likes it then the world will say yeah, that should be a standard. I’ve seen too many cases where people try to do this in the reverse order, and you don’t want to do it that way. Prove it first, prove the need, and then we should put it in the standard.

Related post: The virtual machine of the Internet

The virtual machine of the Internet

From Douglas Crockford’s talk The State and Future of JavaScript:

There’s pressure to make it [i.e. JavaScript] a better compilation target. Now, this is a big surprise. Everybody thought that the Java VM was going to be the VM of the Internet, but it turns out that JavaScript language is the VM [ virtual machine ] of the internet. People are writing in Java, and Python, and lots of other languages, and then translating it into JavaScript because JavaScript, for all of its security problems, actually has a much better security model than everybody else.


Twelve Days of Christmas and tetrahedral numbers

How many gifts are there in the song Twelve Days of Christmas?

Day 1: 1 gift
Day 2: 1 + 2 = 3 gifts
Day 3: 1 + 2 + 3 = 6 gifts
…
Day 12: 1 + 2 + 3 + … + 12 = 78 gifts

The number of gifts on day n is the nth triangular number. The total number of gifts up to and including day n is the sum of the first n triangular numbers, known as the nth tetrahedral number. In the image below, the total number of balls is the fifth tetrahedral number. The number of balls in each layer is a triangular number. (Image credit: Math is Fun.)

[Image: tetrahedron of glass balls]

I’ll develop a formula for tetrahedral numbers and for continuations of the pattern, such as sums of tetrahedral numbers.

First, let T(n, 1) = n.

Next, let T(n, 2) be the nth triangular number. So T(n, 2) is the sum of the first n terms in the sequence T(i, 1).

Next, let T(n, 3) be the nth tetrahedral number. So T(n, 3) is the sum of the first n terms in the sequence T(i, 2).

In general, define T(n, k) to be the sum of the first n terms in the sequence T(i, k-1). You could think of T(n, k) as the nth k-dimensional triangular number. (A tetrahedron is a sort of 3-dimensional triangle. It’s a pyramid whose base is a triangle. T(n,4) would count balls arranged in a sort of 4-dimensional triangle, a simplex in 4 dimensions.)

Theorem: T(n, k) = n(n+1)(n+2) … (n+k-1)/k!

Corollary: There are T(12, 3) = 12*13*14/6 = 364 gifts in the Twelve Days of Christmas.
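The theorem is easy to check numerically. Here is a minimal Python sketch that computes T(n, k) two ways: directly from the recursive definition as repeated sums, and from the closed form. The function names T_recursive and T_formula are my own labels for illustration, not anything from the notes.

```python
from math import comb, factorial

def T_recursive(n, k):
    """T(n, 1) = n; T(n, k) is the sum of T(i, k-1) for i = 1..n."""
    if k == 1:
        return n
    return sum(T_recursive(i, k - 1) for i in range(1, n + 1))

def T_formula(n, k):
    """Closed form from the theorem: n(n+1)...(n+k-1)/k!"""
    product = 1
    for j in range(k):
        product *= n + j
    # The product of k consecutive integers is divisible by k!,
    # so integer division is exact here.
    return product // factorial(k)

# Gifts received on each of the twelve days (triangular numbers).
gifts_per_day = [T_recursive(day, 2) for day in range(1, 13)]

# Total gifts over all twelve days (a tetrahedral number).
total_gifts = T_recursive(12, 3)

print(gifts_per_day[-1], total_gifts, T_formula(12, 3))
```

The closed form is also the binomial coefficient C(n+k-1, k), so comb(n + k - 1, k) from the standard library gives the same values.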

See these notes for an elementary proof by induction.

Update: Here’s a more advanced proof that uses the calculus of finite differences. The more advanced proof requires more background, but it also gives a better idea of how someone might have discovered the formula.


Word frequencies in human and computer languages

This is one of my favorite quotes from Starbucks’ coffee cups:

When I was young I was misled by flash cards into believing that xylophones and zebras were much more common.

Alphabet books treat every letter as equally important even though letters like X and Z are far less common than letters like E and T. Children need to learn the entire alphabet eventually, and there are only 26 letters, so teaching all the letters at once is not bad. But uniform emphasis doesn’t scale well. Learning a foreign language, or a computer language, by learning words without regard to frequency is absurd. The most common words are far more common than the less common words, and so it makes sense to learn the most common words first.

John Miles White has applied this idea to learning R. He did a keyword frequency analysis for R and showed that the frequency of the keywords follows Zipf’s law or something similar. I’d like to see someone do a similar study for other programming languages.
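As a rough idea of what such a study could look like for Python, here is a minimal sketch using only the standard library. It is not White's method, just one way to count keywords: tokenize the source so that keywords inside strings and comments are ignored, then tally the rest. The function name keyword_counts and the sample snippet are made up for illustration; a real study would run over a large body of code.

```python
import io
import keyword
import tokenize
from collections import Counter

def keyword_counts(source):
    """Count occurrences of Python keywords in a string of source code."""
    counts = Counter()
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        # Keywords are reported as NAME tokens; strings and comments are not.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return counts

# A tiny hypothetical sample, far too small to show Zipf-like behavior.
sample = '''
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

for i in range(10):
    if i % 2 == 0:
        print(fib(i))
'''

counts = keyword_counts(sample)
for word, count in counts.most_common():
    print(word, count)
```

Sorting the counts from a large corpus by rank and plotting frequency against rank on log-log axes would show whether the distribution is roughly Zipfian.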

It would be interesting to write a programming language tutorial that introduces the keywords in approximately the order of their frequency. Such a book might be quite unorthodox, and quite useful.

White points out that when teaching human languages in a classroom, “the usefulness of a word tends to be confounded with its respectability.” I imagine something similar happens with programming languages. Programs that produce lists of Fibonacci numbers or prime numbers are the xylophones and zebras of the software world.


Breast cancer stem cells identified

From the article Proverbial new “Twist” in Breast Cancer Detection:

… scientists at Johns Hopkins … have shown that a protein made by a gene called “Twist” may be the proverbial red flag that can accurately distinguish stem cells that drive aggressive, metastatic breast cancer from other breast cancer cells.


Creativity and faith

From Eugene Peterson:

Creativity is difficult. When you are being creative, you’re living by faith. You don’t know what’s next because the created, by definition, is what’s never been before. So you’re living at the edge of something in which you’re not very confident. You might fail: in fact, you almost certainly will fail a good part of the time. All the creative persons I know throw away most of the stuff they do.
