Trivial

Many of the things I once thought were trivial I now think are important. That is, I used to think they were trivial in the modern sense of being unimportant. Now I think they’re trivial in the classical sense of being foundational (from trivium, the first stage of a classical education).

In business, “trivial” means “vitally important if you’re actually doing the work, but not important if you’re just watching.”

Related link: Very applied math

80-20 software II

My previous post addressed an objection to applying the 80-20 rule to software. Namely, that even if every user uses only a small portion of the features, they use different portions, and so you can’t remove any of them.

In this post I’ll address a couple more objections. One objection is that if the 80-20 principle holds, you can apply it over and over until you’re left with nothing. That is, if 80% of your users are content with 20% of your features, then 80% of those users (64% of the original user base) will be content with only 20% of the most-used features (4% of the original features). Keep repeating until you’re down to one feature everybody loves.

First of all, it may indeed be true that you could apply the 80-20 rule more than once. Maybe 64% of users really are content with 4% of the features. Just because the rule doesn’t apply an infinite number of times doesn’t mean that you can’t apply it once or twice before it breaks down.

More fundamentally, there’s nothing magical about “80” and “20.” The more general and more realistic principle is simply that often return on effort is very unevenly distributed. Maybe 92% of customers use only 17% of features. Maybe 70% of customers use only 3% of features. The numbers vary.

If you do apply the rule repeatedly, you may get a different distribution each time. Suppose you first limit your attention to the most popular features, whatever percentage cut-off that turns out to be. Then within those features, some will still be more popular than others. The proportions might not be the same as your first cut, but popularity will still be uneven until you get down to a small core of features that most people use.

But as I explained in the previous post, the time to talk about cutting features is before they are developed and deployed. Once a feature has shipped, it is extremely hard to remove.

Another objection is that it is impossible to predict what features users will want. You can’t know until you ship it. Certainly it’s easier to tell in hindsight what people want, but it’s going too far to say you cannot predict anything. If you really could not predict anything, then all software would be bloated. A great deal of software is bloated, but not all of it. Small, successful programs do exist.

Even if you could literally apply the 80-20 rule several times, big companies might not be content with the resulting market share. Suppose you could apply the 80-20 rule to Microsoft Word four times. Then 41% of customers would be content with 0.16% of the features. (I don’t think this is realistic, but let’s assume it is for the sake of argument.) Microsoft might not be happy with writing off 59% of their potential market up front. But a start-up might be thrilled to give up half their potential market in exchange for only having to develop a tiny fraction of the features.
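To make the arithmetic concrete, here’s a minimal sketch (my own illustration, not from the original argument) of what n applications of the 80-20 rule would imply:

    # Applying the 80-20 rule n times leaves 0.8^n of users
    # content with 0.2^n of the features
    for n in range(1, 5):
        print(f"{0.8**n:.0%} of users, {0.2**n:.2%} of features")

At n = 4 this prints the 41% and 0.16% figures above.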

80-20 software

The 80-20 rule says that often 80% of your results come from 20% of your effort. Applied to software, 80% of your customers may only use 20% of the features. So why not just develop that 20% and let the rest go?

There are numerous objections to this line of reasoning. I’m just going to address one here. Maybe each of your customers uses a small subset of your features, say nobody uses more than 5%. But they all use different subsets of the features. When you add together everybody’s 5% you end up with everything being used. For example, Microsoft Word is huge. I doubt many people use more than 1% of the software. And yet every feature is being used somewhere.
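A toy simulation (mine, with made-up numbers) shows how quickly small individual subsets add up to full coverage:

    import random

    n_features, n_users = 1000, 500
    used = set()
    for _ in range(n_users):
        # each user touches a random 5% of the features
        used.update(random.sample(range(n_features), 50))
    print(len(used), "of", n_features, "features used by someone")

With these numbers, the chance that a given feature goes untouched by all users is 0.95^500, on the order of 10^-11, so essentially every feature ends up being used by somebody.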

That’s a valid point, but it’s a stronger point after the software is written than before it is written. Once a feature is released, someone is going to use it. And once someone is accustomed to using it, they’re going to want to keep using it.

Suppose your software provides two redundant ways to accomplish some task, method 1 and method 2. Half your users stumble on method 1 first and get comfortable using it. The other half latch on to method 2. Now you cannot remove either method without upsetting half your customers. But if you had only shipped method 1, everyone would have used it and been happy.

Removing features is almost impossible. You can never substantially simplify an existing product without the risk of making customers angry. But those same customers might have been happy with a simpler product if that’s what you’d delivered first.

A hidden cost of extra features is that they may need to be supported for years to come.

Update: See follow-up post.

Binomial coefficient trick

Binomial coefficients are simplest to work with when the arguments are non-negative integers, but more general arguments are possible and useful. Concrete Mathematics argues that the most useful case is when the top index is real and the bottom index is an integer, and sticks to that assumption, though both arguments could be real or even complex. More on that here.

Later the book claims

(r - k) {r \choose k} = r {r-1 \choose k}

for all (real) r and gives a proof. Following the proof is an interesting discussion.

But wait a minute. We’ve claimed that the identity holds for all real r, yet the derivation we just gave holds only when r is a positive integer. … Have we been cheating?

No, they have not been cheating. Both sides of the equation are polynomials in r. If two polynomials of degree at most d agree at d+1 points, they must agree everywhere. But these polynomials agree at an infinite number of points, namely all positive integers, and so they must be equal.
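Here’s a quick numerical check of the identity (my own sketch), using a binomial function that accepts a real top index and an integer bottom index, the setting the book works in:

    from math import factorial

    def binom(r, k):
        # r(r-1)(r-2)...(r-k+1) / k!, valid for real r and integer k >= 0
        p = 1.0
        for i in range(k):
            p *= r - i
        return p / factorial(k)

    r, k = 3.7, 4
    print((r - k) * binom(r, k))  # the two sides agree...
    print(r * binom(r - 1, k))    # ...even for non-integer r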

This is a common trick when working with binomial coefficients. It lets us use combinatorial arguments to prove theorems that extend to cases where the binomial coefficients do not have a combinatorial interpretation. But it’s also more generally useful. Often an equation amounts to saying two polynomials are equal, though we may not think of the terms as polynomials. But if we recognize that they are polynomials, we need only prove equality at a finite number of points to establish equality everywhere.

A similar technique is common in complex variables. You often prove an identity assuming real variables, then get the complex version for free. For example, every trig identity you saw in high school remains valid when the arguments are complex numbers. Why? Because analytic functions are, roughly speaking, polynomials of infinite degree (i.e. they have a convergent power series). If two analytic functions agree on an infinite set of values with a limit point (such as the real line) then they agree everywhere.
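For example (my illustration, not from the original post), the Pythagorean identity survives the move to complex arguments:

    import cmath

    z = 2 + 3j
    # sin^2 z + cos^2 z = 1 holds off the real line too
    print(cmath.sin(z)**2 + cmath.cos(z)**2)  # (1+0j), up to rounding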

Deniers, skeptics, and mavericks

Suppose a scientist holds a minority opinion. There’s a trend in journalism to call him a denier if you think he’s wrong, a skeptic if you don’t care, and a maverick if you think he may be right. If this had been the norm in Einstein’s day, he might have been called a Newton-denier.

“Denier” is an ugly word. It implies that someone has no rational basis for his beliefs. He’s either an apologist for evil, as in a Holocaust denier, or mentally disturbed, as in someone in psychological denial. The term “denier” is inflammatory and has no place in scientific discussion.

Oldest series for pi

Here’s an interesting bit of history from Julian Havil’s new book The Irrationals. In 1593 François Viète discovered the following infinite product for pi:

\frac{2}{\pi} = \frac{\sqrt{2}}{2}\frac{\sqrt{2+\sqrt{2}}}{2}\frac{\sqrt{2 + \sqrt{2+\sqrt{2}}}}{2} \cdots

Havil says this is “the earliest known.” I don’t know whether this is specifically the oldest product representation for pi, or more generally the oldest formula for an infinite sequence of approximations that converge to pi. Viète’s formula is based on the double angle formula for cosine.

The first series for pi I remember seeing comes from evaluating the Taylor series for arc tangent at 1:

\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots

I saw this long before I knew what a Taylor series was. I imagine others have had the same experience because the series is fairly common in popular math books. However, this series is completely impractical for computing pi because it converges at a glacial pace. Viète’s formula, on the other hand, converges fairly quickly. You can see for yourself by running the following Python code:

    from math import sqrt

    # Viète's product: 2/pi = (sqrt(2)/2)(sqrt(2+sqrt(2))/2)...
    prod = 1.0
    radic = 0.0

    for i in range(10):
        radic = sqrt(2.0 + radic)  # next nested radical
        prod *= 0.5 * radic        # multiply in the next factor
        print(2.0 / prod)          # running approximation to pi

After 10 terms, Viète’s formula is correct to five decimal places.
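For comparison, here is the arc tangent series after the same number of terms (a quick check of my own):

    # Partial sum of pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    s = sum((-1)**k / (2*k + 1) for k in range(10))
    print(4 * s)  # about 3.0418, not yet correct to even one decimal place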

53 bits ought to be enough for anybody

When I first heard of software that could do extended precision calculation, I thought it would be very useful. Years later, I haven’t had much use for it. When I’ve thought I needed extended precision, I’ve usually found a better way to solve my problem using ordinary precision and a little pencil-and-paper math. Avoiding extended precision calculation has caused me to understand my problems better. (Here’s a recent example.)

I’m not saying that extended precision isn’t sometimes necessary, or that I would go to great lengths to avoid using it. I’m only saying that I’ve had little use for it, much less than I expected, and that I’ve learned a few things by not immediately resorting to brute force.

(Why 53 bits? That’s the precision of an IEEE 754 standard floating point number, regrettably called a “double.” It’s no longer “double,” it’s typical.)

Software cannot offer infinite precision, so you have to carry out your calculations to some finite precision. If the roughly 15 decimal places of standard precision is not enough, how much do you need? How about 50? Or 100? How do you know what to choose? If you hope more precision will eliminate the need to understand what’s going on numerically, good luck with that. Maybe it will. Or maybe you’ll still see the same kinds of problems you had with standard precision.
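In Python, for instance, the standard decimal module will happily give you whatever precision you ask for; the 50 digits below are an arbitrary choice of mine:

    from decimal import Decimal, getcontext

    getcontext().prec = 50  # 50 significant decimal digits
    print(Decimal(2).sqrt())

Being able to ask for 50 digits is not the same as knowing that 50 digits is what your problem needs.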

A good use of extended precision might be as follows. You’re trying to compute the difference between two numbers that agree to 30 decimal places, so 40 decimal places of precision in your calculation will give you 10 decimal places in your result. But suppose you think “This isn’t what I expect. I’ll try a little more precision.” The computer may be trying to tell you that you’re going about something the wrong way, and the extra precision could mask your problem and give you confidence in a wrong answer.
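Here is a small example of my own of the pencil-and-paper alternative mentioned earlier. Computing 1 − cos(x) directly for small x loses everything to cancellation, while an algebraically equivalent form does fine in ordinary precision:

    from math import cos, sin

    x = 1e-8
    print(1.0 - cos(x))         # 0.0: the subtraction wipes out all digits
    print(2.0 * sin(x / 2)**2)  # 5e-17: uses 1 - cos(x) = 2 sin^2(x/2)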

Keyboard hack

Why do all the keys on a standard keyboard feel the same? The only tactile clues are little bumps on the f and j keys to help you find the home row. Some keyboards use different colors for different keys, but such visual clues train you to look at the keyboard. If you want to learn touch typing, you need tactile clues.

I experimented with this while changing how I use the control keys. By putting a little felt* on top of the control keys, I could feel when I’d reached for the correct key. This particularly helps when switching from my desktop to laptop since the left control key is in a different position on each keyboard.

* I didn’t actually use felt but rather the soft half of a velcro fastener because that’s what I found first.

Geometry of the Sydney Opera House

Alexander Hahn’s new book Mathematical Excursions to the World’s Great Buildings explores a wide range of mathematics and architecture. Here I’ll quote a little of the book’s discussion on how Danish architect Jørn Utzon came up with the geometry of the Sydney Opera House.

Parabolas (or more accurately paraboloids, given the three-dimensional aspect of the vaults) were Utzon’s first choice for the profiles of the vaults. … At a later point ellipses (or again more accurately, ellipsoids) were considered. For reasons that we will explain shortly, neither of these geometries provided a buildable option.

The restriction on the shape came from the ribs supporting the shells.

The large size of the shells meant that they would have to be constructed in sections or components. The demands of economy and time meant that these components would have to be mass produced. A parabolic or elliptical shell would not do because then each rib would curve differently.

The solution was incredibly simple.

Utzon’s flash was the realization that a limitless variety of curving triangles could be drawn on a sphere. So all the shells for the roofs could be designed as curving triangles from the same sphere! This was the idea that saved the project … exactly five years after the official announcement that he had won the competition.

I was surprised to read that the shells are simply made of spherical triangles, all of the same radius. I expected something more complicated.

Related post: Spherical trigonometry

Reading the masters

This weekend I ran across a blog post by Federico Pereiro entitled Read the masters. The post opens with a quote from Niels Henrik Abel:

When asked how he developed his mathematical abilities so rapidly, he replied “by studying the masters, not their pupils.”

Someone asked me via Twitter what I thought of this, and my reply took more than 140 characters, so here it is.

I don’t know the context of Abel’s comment. Abel may not have intended it to come across as starkly as it sounds out of context and possibly in translation.

Sometimes the pupils are better expositors than the masters, so the pupils may be worth studying, perhaps as a warm-up to reading the masters. Rather than saying “don’t read their pupils,” I’d put the emphasis on “do read the masters.” At least give them a try. Sometimes the masters are surprisingly easy to read. As C. S. Lewis said of philosophers,

The student is half afraid to meet one of the great philosophers face to face. … But if he only knew, the great man, just because of his greatness, is much more intelligible than his modern commentators.

Read more on Lewis’ quote here.

The second part of Federico Pereiro’s blog post was the advice to plow through an original source rather than seeking something easier. That’s generally good advice, though I wouldn’t be too rigid about it. As W. C. Fields said,

If at first you don’t succeed, try, try again. Then quit. There’s no point in being a damn fool about it.