Pareto’s 80-20 rule

Vilfredo Pareto

Pareto’s 80-20 rule says that 80% of your results often come from 20% of your effort. Maybe 80% of your profit comes from 20% of your customers, or maybe 80% of the bugs in your software are removed in the first 20% of the time you spend debugging.

The rule is named after Italian economist Vilfredo Pareto, who observed that 80% of his country’s land belonged to 20% of its population. The exact ratio of 80-20 isn’t important, though it is surprisingly common. The same principle applies whenever a large majority of effects come from a small number of causes.

The 80-20 rule, or Pareto principle, is startling the first time you hear it. It suggests you can be a lot more productive by focusing your effort where it does the most good. For example, there may be 100,000 to 1,000,000 words in English, depending on how you count them. But you could be pretty fluent in English by knowing the 1,000 most common words.

The thousand most frequently used words in any language are far more important than all the rest combined. Studying these words first makes much more sense than a uniformitarian approach, going through a dictionary in alphabetic order on the assumption that all words are equally important.

I’ve thought about the Pareto principle off and on for many years. When I bring it up for discussion, people are often defensive, bringing up the same objections every time.

Objections

The most common objection is the recursive argument. If you could be more effective by focusing on the 20% that’s most important, then you should do that again: focus on the 20% of the 20% that’s most important. Apply this argument repeatedly and you can be infinitely productive with no effort.

The recursive argument takes the “80” and “20” of the 80-20 rule too literally. The point is not the exact ratios. The point is that return on effort invested is not uniformly distributed. In fact, it’s often far from uniformly distributed. I prefer the term Pareto principle to “80-20 rule” just because it does not reference particular numbers that could distract from the general principle.

Could you apply the Pareto principle recursively to English words, say by focusing on the 200 most common words? In fact you could. But that doesn’t mean that you could keep doing this repeatedly, learning only the most common word (“the”) and declaring yourself fluent in English. This doesn’t negate the fact that the importance of English words is very unevenly distributed.
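To put rough numbers on the recursive argument, here is a minimal sketch that assumes returns follow an exact Pareto distribution. That is a modeling assumption made only for illustration, not something the 80-20 rule requires, and the names `top_share` and `ALPHA_80_20` are invented for this example. Under that assumption, the top fraction q of causes accounts for a share q^(1 – 1/α) of the effects, where α is the shape parameter of the distribution.

```python
import math

# Shape parameter for which the top 20% hold exactly 80%:
# solve 0.2 ** (1 - 1/alpha) == 0.8 for alpha.
ALPHA_80_20 = math.log(5) / math.log(4)

def top_share(q, alpha=ALPHA_80_20):
    """Share of the total held by the top fraction q (0 < q <= 1)
    under an idealized Pareto distribution with shape alpha > 1."""
    return q ** (1 - 1 / alpha)

# Apply the 80-20 split to itself a few times.
for depth in range(1, 4):
    q = 0.2 ** depth
    print(f"top {q:.1%} of causes -> {top_share(q):.1%} of effects")
```

In this idealized model, the top 4% account for 64% of the total and the top 0.8% for about 51%. Each round of recursion cuts the effort by a factor of five but also gives up a fifth of the remaining results, so recursion pays off for a while without ever making you infinitely productive.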

Another objection is the completionist argument. It says that everything has to be done, so the fact that you get less return on some things than others doesn’t matter. For example, the letters E, T, and A appear about 100 times as often as J, Q, and Z. That doesn’t mean you could leave J, Q, and Z off your keyboard. On the other hand, it does mean that you might design a keyboard so that E, T, and A are easier to reach than J, Q, and Z. And Samuel Morse was smart to assign his shortest codes to the most frequently used letters. [1]

A final objection is the ignorance argument: we simply don’t know what the most effective 20% will be beforehand. This is a serious objection, and it should temper our optimism regarding the Pareto principle. If a salesman knew which 20% of his prospects were going to buy, he should just sell to them. But of course he doesn’t know ahead of time who those 20% will be. On the other hand, he has some idea who is likely to buy (and how much they may buy) and doesn’t approach prospects randomly.

These objections take the Pareto principle to extremes to justify disregarding it. Since you can’t repeatedly apply it indefinitely, there must be nothing to it. Or if you can’t completely eliminate the least productive work, you should treat everything equally. Or if you don’t have absolute certainty regarding what’s most important, you shouldn’t consider what’s likely to be most important.

Applications

Despite the objections above, it is true that returns on effort are often very unevenly distributed. There’s a common tendency to underestimate the variance [2]. We might have a rough idea how effective a list of possible actions would be, and maybe imagine that the most effective choice would be ten times better than the least effective choice, but in fact the ratio might be a hundred to one or even a thousand to one [3]. Somehow we mentally compress these ratios, maybe on something like a logarithmic scale.

So one key to taking advantage of the Pareto principle is simply to keep in mind that something like the Pareto principle might hold. You’re not likely to find a Pareto rule if you don’t think one exists.

Another key is to be honest with ourselves regarding how effective we want to be. Maybe the most effective thing to do is something we simply don’t want to do. If so, we can either make a principled decision to not do what we know to be more effective, or get over our sloth.

I mentioned ignorance above. “Uncertainty” is a more helpful word than “ignorance” here because we’re not often completely ignorant. We usually have some idea which actions are more likely to be effective. Data can help. Start by using whatever information or intuition you have, and update it as you gather data.

This could be a formal Bayesian process if you have quantifiable data. Or it could be as simple as just trying something. If it works, try it again. If not, try something different. You may be able to bootstrap this “play the winner” strategy until you have enough data to be more formal about making decisions.
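The informal “play the winner” idea can be sketched in a few lines. This is only one reading of the strategy, assuming each attempt either works or doesn’t; the function name `play_the_winner` and the callback `try_action` are hypothetical names introduced for illustration.

```python
def play_the_winner(num_actions, try_action, rounds):
    """Repeat the current action while it succeeds; move on when it fails.

    try_action(i) is a caller-supplied function (an assumption of this
    sketch) that attempts action i and returns True on success.
    Returns the history as a list of (action, succeeded) pairs.
    """
    current = 0
    history = []
    for _ in range(rounds):
        succeeded = try_action(current)
        history.append((current, succeeded))
        if not succeeded:
            # Try something different: cycle to the next action.
            current = (current + 1) % num_actions
    return history

# Toy example: only action 2 ever works.
history = play_the_winner(3, lambda i: i == 2, rounds=5)
print(history)
```

The recorded history is the data you would later feed into something more formal, such as a tally of successes and failures per action.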

***

[1] How well does Morse code symbol length correspond to frequency? I looked into that here.

[2] I have a friend who has helped me with this. He will suggest I do X, and I agree, but say I’d rather do Y. Then he will reply with something like “Sure, you could do that. But X could be a thousand times more effective. It’s up to you.” I’ve done the same for others. It’s easier to see someone else’s decisions objectively than your own.

[3] This is not an exaggeration. I’ve seen this, for example, in software optimization. Some changes might make 1,000x more of a difference than others.

Objectives and constraints

Objectives and constraints are symmetrical in a mathematical sense but are asymmetrical in a psychological sense. By taking dual formulations, you can reverse the mathematical role of objectives and constraints, but in application objectives are more obvious than constraints.

In the question “What is the minimum value of x² over the interval [1, 5]?” the function f(x) = x² is the objective function and 1 ≤ x ≤ 5 is the constraint. If someone says the minimum is 0, they’ve minimized the objective function but ignored the constraint. This is clear in such a simple problem, but failure to consider constraints can be much more subtle.
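The toy problem above is small enough to work in code. This sketch relies on the fact that x² is convex in one variable, so the constrained minimizer is the unconstrained minimizer clamped to the interval; the helper name `clamp` is an illustrative choice, and the shortcut does not carry over to general objectives.

```python
def f(x):
    """Objective function."""
    return x ** 2

def clamp(x, lo, hi):
    """Project x onto the interval [lo, hi]."""
    return max(lo, min(x, hi))

# Ignoring the constraint: f is minimized at x = 0.
unconstrained_argmin = 0.0

# Respecting the constraint 1 <= x <= 5: for a convex function of one
# variable, clamp the unconstrained minimizer to the feasible interval.
constrained_argmin = clamp(unconstrained_argmin, 1.0, 5.0)

print(f(unconstrained_argmin), f(constrained_argmin))
```

The two answers differ only because of the constraint: the objective is the same in both cases.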

Objectives tend to be easily quantifiable—maximize profit, minimize energy consumption, etc.— but constraints tend to be less quantifiable—the solution has to be testable and maintainable, has to be legal, has to be something people will buy or vote for, etc.

When children ask “Why don’t you just …” it’s because they see a way to improve some objective, but the “just” part shows that they are either completely unaware of a relevant constraint or are unaware of how difficult it would be to overcome the constraint. As you mature, you become aware of more constraints. You realize that things that seem grossly suboptimal are actually close to optimal when you consider the necessary constraints. There may be room for improvement, but not as much as you imagined and at a higher cost.

Big opportunities open up when constraints change. Maybe an idea was abandoned because it would require more calculation than anyone could carry out by hand, and now’s the time to revisit it. Or maybe an idea was never developed because it would require instantaneous communication between people at multiple points on the globe. No problem now.

In both the examples above, a constraint was relaxed: computation and communication have gotten far less expensive. Increased constraints create opportunities as well. When the price of something goes up, its alternatives become more economical by comparison. Whether an oil field is worth developing, for example, depends on the current price of oil.

If I ask “Why hasn’t someone done this before?” I’m skeptical if the answer is “Because I’m smarter than everyone else who has tried.” But if the answer is “Because constraints have changed” then I’m much more receptive.

Related post: Boundary conditions are the hard part

Dividing projects into math, statistics, and computing

If you’ve read this blog for long, you know that my work is a combination of math, statistics, and computing.

I was looking over my records and tried to see how my work divides into these three areas. In short, it doesn’t.

The boundaries between these areas are fuzzy or arbitrary to begin with, but a few projects fell cleanly into one of the three categories. However, 85% of my income has come from projects that involve a combination of two areas or all three areas.

If you calculate a confidence interval using R, you could say you’re doing math, statistics, and computing. But for the accounting above I’d simply call that statistics. When I say a project uses math and computation, for example, I mean it requires math outside what is typical in programming, and programming outside what is typical in math.

Example of the bike shed principle

Celebration, Florida town seal

One of the case studies in Michael Bierut’s book How to is the graphic design for the planned community Celebration, Florida. The logo for the town’s golf course is an illustration of the bike shed principle.

C. Northcote Parkinson observed that it is easier for a committee to approve a nuclear power plant than a bicycle shed. Nuclear power plants are complex, and no one on a committee presumes to understand every detail. Committee members must rely on the judgment of others. But everyone understands bicycle sheds. Also, questions such as what color to paint the bike shed don’t have objective answers. And so bike sheds provoke long discussions.

People argue about bike sheds because they understand bike sheds. Bierut said something similar about the Celebration Golf Club logo, which features a silhouette of a golfer.

Designing the graphics for Celebration’s public golf club was much harder than designing the town seal. It took me some time to realize why: none of our clients were Schwinn-riding, ponytailed girls [as in the town seal], but most of them were enthusiastic golfers. The silhouette on the golf club design was refined endlessly as various executives demonstrated their swings in client meetings.

Image credit: By Source, Fair use, https://en.wikipedia.org/w/index.php?curid=37643922

Natural growth

Interesting passage from Small is Beautiful: Economics as if People Mattered by E. F. Schumacher:

Nature always, so to speak, knows where and when to stop. There is a measure in all natural things—in their size, speed, or violence. As a result, the system of nature, of which man is a part, tends to be self-balancing, self-adjusting, self-cleansing. Not so with technology, or perhaps I should say: not so with man dominated by technology and specialization. Technology recognizes no self-limiting principle …

We speak of natural growth more often than natural limits to growth. Maybe we should consider the latter more often.

Schumacher’s book was written in 1973 and seems to embody some of the hippie romanticism of its day. That does not make its arguments right or wrong, but it shows what some of the author’s influences were.

The book’s back cover has an endorsement describing Schumacher as “eminently practical, sensible, … versant in the subtleties of large-scale business management …” I haven’t read the whole book, only parts here and there, but the romantic overtones stand out more to me, maybe because they contrast more with the contemporary atmosphere. When the book was published, maybe the pragmatic overtones stood out more.

Optimal team size

Kevlin Henney’s keynote at GOTO Copenhagen this year discussed how project time varies as a function of the number of people on the project. The most naive assumption is that the time is inversely proportional to the number of people. That is

t = W/n

where t is the calendar time to completion, W is a measure of how much work is to be done, and n is the number of people. This assumes everything on the project can be done in parallel. Nobody waits for anybody else.

The next refinement is to take into account the proportion of work that can be done in parallel. Call this p. Then we have

t = W[1 – p(n-1)/n].

If everything can be done in parallel, p = 1 and t = W/n as before. But if nothing can be done in parallel, p = 0, and so t = W. In other words, the total time is the same whether one person is on the project or more. This is essentially Amdahl’s law.

With the equation above, adding people never slows things down. And if p > 0, every additional person helps at least a little bit.

Next we add a term to account for communication cost. Assume communication costs are proportional to the number of communication paths, n(n – 1)/2. Call the proportionality constant k. Now we have

t = W[1 – p(n-1)/n + kn(n-1)/2].

If k is small but positive, then at first adding more people causes a project to complete sooner. But beyond some optimal team size, adding more people causes the project to take longer.
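Here is a small sketch that evaluates the last equation and searches for the team size with the shortest modeled time. The parameter values (W = 1, p = 0.9, k = 0.001) are made up for illustration, and the function names are my own.

```python
def project_time(n, work, p, k):
    """Modeled calendar time for n people:
    t = W[1 - p(n-1)/n + k n(n-1)/2]."""
    return work * (1 - p * (n - 1) / n + k * n * (n - 1) / 2)

def optimal_team_size(work, p, k, max_n=100):
    """Team size in 1..max_n with the smallest modeled project time."""
    return min(range(1, max_n + 1),
               key=lambda n: project_time(n, work, p, k))

# Hypothetical values: 90% of the work parallelizes and each
# communication path adds 0.1% of the total work.
W, P, K = 1.0, 0.9, 0.001
best = optimal_team_size(W, P, K)
for n in (1, 5, best, 20):
    print(n, round(project_time(n, W, P, K), 3))
```

With these made-up values the modeled time bottoms out at a team of 10 and rises after that. Setting k = 0 recovers the previous equation, where adding people never hurts.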

Of course none of this is exact. Project time estimation doesn’t follow any simple formula. Think of these equations more as rough guides or metaphors. It’s certainly true that beyond a certain size, adding more people to a project can slow the project down. Kevlin gave examples of projects that were put back on track by reducing the number of people working on them.

My quibble with the equation above is that I don’t think the cost of more people is primarily communication. Communication paths in a real project are not the simple trees of org charts, but neither are they complete graphs. And if the problem were simply communication, then improved communication would mitigate the cost of adding people to a project, though I imagine it hardly does.

I think the cost of adding people to a project has more to do with Parkinson’s Law which says that people make work for each other. (The aphorism form of Parkinson’s Law says that work expands to the time allowed. But the eponymous book explains why work expands, and it is in part because people make extra work for each other.)

Dust jacket of the book Parkinsons Law and Other Studies in Administration

I wrote about a similar theme in the blog post Maybe you only need it because you have it. Here’s the conclusion of that post:

Suppose a useless project adds staff. These staff need to be managed, so they hire a manager. Then they hire people for IT, accounting, marketing, etc. Eventually they have their own building. This building needs security, maintenance, and housekeeping. No one questions the need for the security guard, but the guard would not have been necessary without the original useless project.

When something seems absolutely necessary, maybe it’s only necessary because of something else that isn’t necessary.

Grateful for failures

old saxophone

I’ve been thinking lately about different things I’ve tried that didn’t work out and how grateful I am that they did not.

The first one that comes to mind is my academic career. If I’d been more successful with grants and publications as a postdoc, it would have been harder to decide to leave academia. I’m glad I left when I did.

When I was in high school I was a fairly good musician. At one point I decided that if I made the all-state band I would major in music. Thank God I didn’t make it.

I’ve looked back at projects that I hoped to get, and then realized it’s a good thing that they didn’t come through.

In each of these examples, I’ve been forced to turn away from something I was moderately good at to pursue something that’s a better fit for me.

I wonder what failure I’ll be grateful for next.

 

Selecting clients

One of the themes in David Ogilvy’s memoir Confessions of an Advertising Man is the importance of selecting good clients. For example, he advises “never take associations as clients” because they have “too many masters, too many objectives, too little money.”

He also recommends not taking on clients that are so large that you would lose your independence and financial robustness by taking them on.

I have never wanted to get an account so big that I could not afford to lose it. The day you do that, you commit yourself to living with fear. Frightened agencies lose the courage to give candid advice; once you lose that you become a lackey.

This is what led me to refuse an invitation to compete for the Edsel account. I wrote to Ford: “Your account would represent one-half of our total billing. This would make it difficult for us to sustain our independence of counsel.” If we had entered the Edsel contest, and if we had won it, Ogilvy, Benson & Mather would have gone down the drain with Edsel.

This sort of thinking was very much on my mind when I was preparing to leave my last job to strike out on my own. As Nassim Taleb discusses in Antifragile, a steady job seems safer than entrepreneurship, but in some ways it’s not. With one big client, i.e. an employer, you are less exposed to small risks but more exposed to big risks. Your income doesn’t vary per month, unless it suddenly drops to zero.

In addition to looking for good clients, Ogilvy shares several stories of letting go of bad clients. I have yet to resign from a bad client—I haven’t had any bad clients—but I value the option to do so. The option to resign from a project makes it less likely that you’ll find yourself in a project you wish to resign from.

Formulating applied math problems

Somewhere in school I got the backward idea that solving math problems is hard but that formulating them is easy. I don’t know if anybody ever said that to me. Maybe it was just implied by years of solving problems someone else had formulated.

A related wrong idea that I also picked up was that formulating math problems was not a mathematician’s responsibility. Someone, probably an engineer, would formulate the problem and hand it over to a mathematician. That happens occasionally, but that’s not how it usually works.

Formulating problems is hard, and it’s usually the applied mathematician’s responsibility, ideally with generous input from a domain area expert.

There are a lot of ways to turn a real world problem into a math problem, and maybe several of them would be adequate for the task at hand. Then you might as well choose the easiest one to understand and compute. Knowing several ways to formulate a problem increases your chances of finding one approach that’s tractable. Particularly when you can determine what problem really needs to be solved, not just the problem you first see, you might give yourself more options for how to go about it.

Applied mathematicians don’t need to be experts in every area of application, and of course cannot be. But they do need to meet clients halfway (or more). They need to know something about the problem domain. They need to listen well and need to ask good questions. The questions help the mathematician get going, and they may also give the client something new to think about.