Pareto’s 80-20 rule

Vilfredo Pareto

Pareto’s 80-20 rule says that 80% of your results often come from 20% of your effort. Maybe 80% of your profit comes from 20% of your customers, or maybe 80% of the bugs in your software are removed in the first 20% of the time you spend debugging.

The rule is named after Italian economist Vilfredo Pareto, who observed that 80% of his country’s land belonged to 20% of its population. The exact ratio of 80-20 isn’t important, though it is surprisingly common. The same principle applies whenever a large majority of effects come from a small number of causes.

The 80-20 rule, or Pareto principle, is startling the first time you hear it. It suggests you can be a lot more productive by focusing your effort where it does the most good. For example, there may be 100,000 to 1,000,000 words in English, depending on how you count them. But you could be pretty fluent in English by knowing the 1,000 most common words.

The thousand most frequently used words in any language are far more important than all the rest combined. Studying these words first makes much more sense than a uniformitarian approach, going through a dictionary in alphabetic order on the assumption that all words are equally important.
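To make the word example concrete, here is a minimal sketch of how coverage grows with vocabulary size, assuming word frequencies follow a Zipf distribution (the word of rank r gets weight proportional to 1/r) over a hypothetical 100,000-word vocabulary. Both the distribution and the vocabulary size are illustrative assumptions, not measurements of English.

    # Rough sketch: fraction of everyday usage covered by the top k words,
    # assuming a Zipf distribution over a 100,000-word vocabulary.
    # Illustrative assumptions, not measured English frequencies.
    def coverage(top_k, vocab_size=100_000):
        weights = [1 / r for r in range(1, vocab_size + 1)]
        return sum(weights[:top_k]) / sum(weights)

    for k in (100, 1_000, 10_000):
        print(f"top {k:>6} words cover about {coverage(k):.0%} of usage")

Under these assumptions the top 1,000 words cover well over half of all usage, which is in the same ballpark as empirical counts for English.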

I’ve thought about the Pareto principle off and on for many years. When I bring it up for discussion, people are often defensive, bringing up the same objections every time.

Objections

The most common objection is the recursive argument. If you could be more effective by focusing on the 20% that’s most important, then you should do that again: focus on the 20% of the 20% that’s most important. Apply this argument repeatedly and you can be infinitely productive with no effort.

The recursive argument takes the “80” and “20” of the 80-20 rule too literally. The point is not the exact ratios. The point is that return on effort invested is not uniformly distributed. In fact, it’s often far from uniformly distributed. I prefer the term Pareto principle to “80-20 rule” just because it does not reference particular numbers that could distract from the general principle.

Could you apply a Pareto principle recursively to English words, say by focusing on the 200 most common words? In fact you could. But that doesn’t mean you could keep doing this repeatedly, learning only the most common word (“the”) and declaring yourself fluent in English. This doesn’t negate the fact that the importance of English words is very unevenly distributed.

Another objection is the completionist argument. It says that everything has to be done, so the fact that you get less return on some things than others doesn’t matter. For example, the letters E, T, and A appear about 100 times as often as J, Q, and Z. That doesn’t mean you could leave J, Q, and Z off your keyboard. On the other hand, it does mean that you might design a keyboard so that E, T, and A are easier to reach than J, Q, and Z. And Samuel Morse was smart to assign his shortest codes to the most frequently used letters. [1]
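As a rough check on the keyboard and Morse code examples, the sketch below lines up approximate English letter frequencies (ballpark textbook values, not exact counts) against the corresponding Morse code symbols.

    # Compare approximate letter frequencies (rough textbook values) with
    # Morse code lengths for the letters mentioned above.
    morse = {"E": ".", "T": "-", "A": ".-", "J": ".---", "Q": "--.-", "Z": "--.."}
    freq_pct = {"E": 12.7, "T": 9.1, "A": 8.2, "J": 0.15, "Q": 0.10, "Z": 0.07}

    for letter in "ETAJQZ":
        code = morse[letter]
        print(f"{letter}: ~{freq_pct[letter]:5.2f}% of text, Morse '{code}' ({len(code)} symbols)")

The three most common letters get one- or two-symbol codes, while the three rarest get four.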

A final objection is the ignorance argument: we simply don’t know what the most effective 20% will be beforehand. This is a serious objection, and it should temper our optimism regarding the Pareto principle. If a salesman knew which 20% of his prospects were going to buy, he would sell only to them. But of course he doesn’t know ahead of time who those 20% will be. On the other hand, he has some idea who is likely to buy (and how much they may buy) and doesn’t approach prospects randomly.

These objections take the Pareto principle to extremes to justify disregarding it. Since you can’t repeatedly apply it indefinitely, there must be nothing to it. Or if you can’t completely eliminate the least productive work, you should treat everything equally. Or if you don’t have absolute certainty regarding what’s most important, you shouldn’t consider what’s likely to be most important.

Applications

Despite the objections above, it is true that returns on effort are often very unevenly distributed. There’s a common tendency to underestimate the variance [2]. We might have a rough idea how effective a list of possible actions would be, and imagine that the most effective choice would be ten times better than the least effective choice, when in fact the ratio might be a hundred to one or even a thousand to one [3]. Somehow we mentally compress these ratios, maybe on something like a logarithmic scale.

So one key to taking advantage of the Pareto principle is simply to keep in mind that something like it might hold. You’re not likely to find a Pareto rule if you don’t believe such rules exist.

Another key is to be honest with ourselves about how effective we want to be. Maybe the most effective thing to do is something we simply don’t want to do. If so, we can either make a principled decision not to do what we know to be more effective, or get over our sloth.

I mentioned ignorance above. “Uncertainty” is a more helpful word than “ignorance” here because we’re not often completely ignorant. We usually have some idea which actions are more likely to be effective. Data can help. Start by using whatever information or intuition you have, and update it as you gather data.

This could be a formal Bayesian process if you have quantifiable data. Or it could be as simple as just trying something. If it works, try it again. If not, try something different. You may be able to bootstrap this “play the winner” strategy until you have enough data to be more formal about making decisions.
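Here is a minimal sketch of that “play the winner” idea as a simple Bayesian bandit: each option gets a Beta prior over its success rate, you try whichever option looks best under a random draw from its posterior, and you update with the result. The options and their true success rates below are made up for illustration.

    # Sketch of a "play the winner" / Bayesian bandit strategy.
    # The true success rates are hypothetical, chosen only to illustrate.
    import random

    true_rates = {"A": 0.05, "B": 0.15, "C": 0.50}
    wins = {k: 1 for k in true_rates}      # Beta(1, 1) priors
    losses = {k: 1 for k in true_rates}

    for _ in range(500):
        # Sample a plausible success rate for each option from its posterior
        samples = {k: random.betavariate(wins[k], losses[k]) for k in true_rates}
        choice = max(samples, key=samples.get)
        # Try the chosen option and update its posterior with the outcome
        if random.random() < true_rates[choice]:
            wins[choice] += 1
        else:
            losses[choice] += 1

    for k in true_rates:
        trials = wins[k] + losses[k] - 2   # subtract the prior pseudo-counts
        print(k, "tried", trials, "times,",
              f"estimated rate {wins[k] / (wins[k] + losses[k]):.2f}")

After a few hundred trials, most of the effort has shifted toward the option that actually works best, without ever requiring certainty up front.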

***

[1] How well does Morse code symbol length correspond to frequency? I looked into that here.

[2] I have a friend who has helped me with this. He will suggest I do X, and I agree, but say I’d rather do Y. Then he will reply with something like “Sure, you could do that. But X could be a thousand times more effective. It’s up to you.” I’ve done the same for others. It’s easier to see someone else’s decisions objectively than your own.

[3] This is not an exaggeration. I’ve seen this, for example, in software optimization. Some changes might make 1,000x more of a difference than others.

Objectives and constraints

Objectives and constraints are symmetrical in a mathematical sense but are asymmetrical in a psychological sense. By taking dual formulations, you can reverse the mathematical role of objectives and constraints, but in application objectives are more obvious than constraints.

In the question “What is the minimum value of x² over the interval [1, 5]?” the function f(x) = x² is the objective function and 1 ≤ x ≤ 5 is the constraint. If someone says the minimum is 0, they’ve minimized the objective function but ignored the constraint. This is clear in such a simple problem, but failure to consider constraints can be much more subtle.
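For what it’s worth, here is the toy problem above in code, solved with and without the constraint. This sketch happens to use SciPy’s scalar minimizer; any optimizer that supports bounds would do.

    # Minimize f(x) = x^2 with and without the constraint 1 <= x <= 5.
    from scipy.optimize import minimize_scalar

    def f(x):
        return x**2

    unconstrained = minimize_scalar(f)                               # ignores the interval
    constrained = minimize_scalar(f, bounds=(1, 5), method="bounded")

    print("ignoring the constraint:  ", unconstrained.x, unconstrained.fun)  # about 0, 0
    print("respecting the constraint:", constrained.x, constrained.fun)      # about 1, 1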

Objectives tend to be easily quantifiable—maximize profit, minimize energy consumption, etc.—but constraints tend to be less quantifiable—the solution has to be testable and maintainable, has to be legal, has to be something people will buy or vote for, etc.

When children ask “Why don’t you just …” it’s because they see a way to improve some objective, but the “just” part shows that they are either completely unaware of a relevant constraint or unaware of how difficult it would be to overcome. As you mature, you become aware of more constraints. You realize that things that seem grossly suboptimal are actually close to optimal when you consider the necessary constraints. There may be room for improvement, but not as much as you imagined, and at a higher cost.

Big opportunities open up when constraints change. Maybe an idea was abandoned because it would require more calculation than anyone could carry out by hand, and now’s the time to revisit it. Or maybe an idea was never developed because it would require instantaneous communication between people at multiple points on the globe. No problem now.

In both the examples above, a constraint was relaxed: computation and communication have gotten far less expensive. Increased constraints create opportunities as well. When the price of something goes up, its alternatives become more economical by comparison. Whether an oil field is worth developing, for example, depends on the current price of oil.

If I ask “Why hasn’t someone done this before?” I’m skeptical if the answer is “Because I’m smarter than everyone else who has tried.” But if the answer is “Because constraints have changed” then I’m much more receptive.

Related post: Boundary conditions are the hard part

Dividing projects into math, statistics, and computing

If you’ve read this blog for long, you know that my work is a combination of math, statistics, and computing.

I was looking over my records and tried to see how my work divides into these three areas. In short, it doesn’t.

The boundaries between these areas are fuzzy or arbitrary to begin with, but a few projects fell cleanly into one of the three categories. However, 85% of my income has come from projects that involve a combination of two areas or all three areas.

If you calculate a confidence interval using R, you could say you’re doing math, statistics, and computing. But for the accounting above I’d simply call that statistics. When I say a project uses math and computation, for example, I mean it requires math outside what is typical in programming, and programming outside what is typical in math.

Example of the bike shed principle

Celebration, Florida town seal

One of the case studies in Michael Bierut’s book How to is the graphic design for the planned community Celebration, Florida. The logo for the town’s golf course is an illustration of the bike shed principle.

C. Northcote Parkinson observed that it is easier for a committee to approve a nuclear power plant than a bicycle shed. Nuclear power plants are complex, and no one on a committee presumes to understand every detail. Committee members must rely on the judgment of others. But everyone understands bicycle sheds. Also, questions such as what color to paint the bike shed don’t have objective answers. And so bike sheds provoke long discussions.

People argue about bike sheds because they understand bike sheds. Bierut said something similar about the Celebration Golf Club logo, which features a silhouette of a golfer.

Designing the graphics for Celebration’s public golf club was much harder than designing the town seal. It took me some time to realize why: none of our clients were Schwinn-riding, ponytailed girls [as in the town seal], but most of them were enthusiastic golfers. The silhouette on the golf club design was refined endlessly as various executives demonstrated their swings in client meetings.

Image credit: By Source, Fair use, https://en.wikipedia.org/w/index.php?curid=37643922

Natural growth

Interesting passage from Small is Beautiful: Economics as if People Mattered by E. F. Schumacher:

Nature always, so to speak, knows where and when to stop. There is a measure in all natural things—in their size, speed, or violence. As a result, the system of nature, of which man is a part, tends to be self-balancing, self-adjusting, self-cleansing. Not so with technology, or perhaps I should say: not so with man dominated by technology and specialization. Technology recognizes no self-limiting principle …

We speak of natural growth more often than natural limits to growth. Maybe we should consider the latter more often.

Schumacher’s book was written in 1973 and seems to embody some of the hippie romanticism of its day. That does not make its arguments right or wrong, but it shows what some of the author’s influences were.

The book’s back cover has an endorsement describing Schumacher as “eminently practical, sensible, … versant in the subtleties of large-scale business management …” I haven’t read the whole book, only parts here and there, but the romantic overtones stand out more to me, maybe because they contrast more with the contemporary atmosphere. When the book was published, maybe the pragmatic overtones stood out more.

Optimal team size

Kevlin Henney’s keynote at GOTO Copenhagen this year discussed how project time varies as a function of the number of people on the project. The most naive assumption is that the time is inversely proportional to the number of people. That is

t = W/n

where t is the calendar time to completion, W is a measure of how much work is to be done, and n is the number of people. This assumes everything on the project can be done in parallel. Nobody waits for anybody else.

The next refinement is to take into account the proportion of work that can be done in parallel. Call this p. Then we have

t = W[1 – p(n-1)/n].

If everything can be done in parallel, p = 1 and t = W/n as before. But if nothing can be done in parallel, p = 0, and so t = W. In other words, the total time is the same whether one person is on the project or more. This is essentially Amdahl’s law.

With the equation above, adding people never slows things down. And if p > 0, every additional person helps at least a little bit.

Next we add a term to account for communication cost. Assume communication costs are proportional to the number of communication paths, n(n – 1)/2. Call the proportionality constant k. Now we have

t = W[1 – p(n-1)/n + kn(n-1)/2].

If k is small but positive, then at first adding more people causes a project to complete sooner. But beyond some optimal team size, adding more people causes the project to take longer.
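To see the trade-off numerically, here is a small sketch of the model above. The values of W, p, and k are made up for illustration; the shape of the curve, not the particular optimum, is the point.

    # Relative project time as a function of team size n, using the model above:
    # t = W * (1 - p*(n-1)/n + k*n*(n-1)/2). W, p, and k are illustrative values.
    def project_time(n, W=1.0, p=0.8, k=0.005):
        return W * (1 - p * (n - 1) / n + k * n * (n - 1) / 2)

    times = {n: project_time(n) for n in range(1, 31)}
    best = min(times, key=times.get)
    print("optimal team size under these assumptions:", best)
    for n in (1, 2, best, 15, 30):
        print(f"n = {n:2d}: relative time {times[n]:.2f}")

With these numbers the time drops quickly for the first handful of people, then climbs again as the quadratic communication term takes over.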

Of course none of this is exact. Project time estimation doesn’t follow any simple formula. Think of these equations more as rough guides or metaphors. It’s certainly true that beyond a certain size, adding more people to a project can slow the project down. Kevlin gave examples of projects that were put back on track by reducing the number of people working on them.

My quibble with the equation above is that I don’t think the cost of more people is primarily communication. Communication paths in a real project are not the simple trees of org charts, but neither are they complete graphs. And if the problem were simply communication, then improved communication would mitigate the cost of adding people to a project, though I imagine it hardly does.

I think the cost of adding people to a project has more to do with Parkinson’s Law which says that people make work for each other. (The aphorism form of Parkinson’s Law says that work expands to the time allowed. But the eponymous book explains why work expands, and it is in part because people make extra work for each other.)

Dust jacket of the book Parkinson’s Law and Other Studies in Administration

I wrote about a similar theme in the blog post Maybe you only need it because you have it. Here’s the conclusion of that post:

Suppose a useless project adds staff. These staff need to be managed, so they hire a manager. Then they hire people for IT, accounting, marketing, etc. Eventually they have their own building. This building needs security, maintenance, and housekeeping. No one questions the need for the security guard, but the guard would not have been necessary without the original useless project.

When something seems absolutely necessary, maybe it’s only necessary because of something else that isn’t necessary.

Grateful for failures

old saxophone

I’ve been thinking lately about different things I’ve tried that didn’t work out and how grateful I am that they did not.

The first one that comes to mind is my academic career. If I’d been more successful with grants and publications as a postdoc, it would have been harder to decide to leave academia. I’m glad I left when I did.

When I was in high school I was a fairly good musician. At one point I decided that if I made the all-state band I would major in music. Thank God I didn’t make it.

I’ve looked back at projects that I hoped to get, and then realized how it’s a good thing that they didn’t come through.

In each of these examples, I’ve been forced to turn away from something I was moderately good at to pursue something that’s a better fit for me.

I wonder what failure I’ll be grateful for next.

 

Selecting clients

One of the themes in David Ogilvy’s memoir Confessions of an Advertising Man is the importance of selecting good clients. For example, he advises “never take associations as clients” because they have “too many masters, too many objectives, too little money.”

He also recommends not taking on clients that are so large that you would lose your independence and financial robustness by taking them on.

I have never wanted to get an account so big that I could not afford to lose it. The day you do that, you commit yourself to living with fear. Frightened agencies lose the courage to give candid advice; once you lose that you become a lackey.

This is what led me to refuse an invitation to compete for the Edsel account. I wrote to Ford: “Your account would represent one-half of our total billing. This would make it difficult for us to sustain our independence of counsel.” If we had entered the Edsel contest, and if we had won it, Ogilvy, Benson & Mather would have gone down the drain with Edsel.

This sort of thinking was very much on my mind when I was preparing to leave my last job to strike out on my own. As Nassim Taleb discusses in Antifragile, a steady job seems safer than entrepreneurship, but in some ways it’s not. With one big client, i.e. an employer, you are less exposed to small risks but more exposed to big risks. Your income doesn’t vary from month to month, unless it suddenly drops to zero.

In addition to looking for good clients, Ogilvy shares several stories of letting go of bad clients. I have yet to resign from a bad client—I haven’t had any bad clients—but I value the option to do so. The option to resign from a project makes it less likely that you’ll find yourself in a project you wish to resign from.

Formulating applied math problems

Somewhere in school I got the backward idea that solving math problems is hard but that formulating them is easy. I don’t know if anybody ever said that to me. Maybe it was just implied by years of solving problems someone else had formulated.

A related wrong idea that I also picked up was that formulating math problems was not a mathematician’s responsibility. Someone, probably an engineer, would formulate the problem and hand it over to a mathematician. That happens occasionally, but that’s not how it usually works.

Formulating problems is hard, and it’s usually the applied mathematician’s responsibility, ideally with generous input from a domain area expert.

There are a lot of ways to turn a real-world problem into a math problem, and maybe several of them would be adequate for the task at hand. Then you might as well choose the one that is easiest to understand and compute. Knowing several ways to formulate a problem increases your chances of finding an approach that’s tractable. And if you can determine what problem really needs to be solved, not just the problem you first see, you give yourself more options for how to go about it.

Applied mathematicians don’t need to be experts in every area of application, and of course cannot be. But they do need to meet clients half way (or more). They need to know something about the problem domain. They need to listen well and ask good questions. The questions help the mathematician get going, and they may also give the client something new to think about.

Consulting for consultants

They say that doctors make terrible patients, but in my experience consultants make great consulting clients. The best are confident in their own specialization and respect you in yours. They get going quickly and pay quickly. (I’ve only worked for consultants who have small companies. I imagine large consulting companies are as slow as other companies the same size.)

Sometimes consultants working in software development will ask me to help out with some mathematical part of their projects. And sometimes math/stat folks will ask me to help out with some computational part of their projects.

I started my consulting business three years ago. Since then I’ve gotten to know a few other consultants well. This lets me offer a broader range of services to a client by bringing in other people, and sometimes it helps me find projects.

If you’re a consultant and interested in working together, please send me an email introducing yourself. I’m most interested in meeting consultants who have some overlap with what I do but who also have complementary strengths.

Compressing ten years into six months

The other day I ran across a line from Peter Thiel saying that if you have a plan for where you’d like to be in ten years, ask yourself if you could get there in six months.

I don’t think he’s simply saying see if you can do everything 20 times faster. If you estimate something will take ten days, it probably will take more than half a day. We’re better at estimating things on the scale of days than on the scale of years.

If you expect to finish a project in ten days, you’re probably going to go about it the way you’ve approached similar projects before. There’s not a lot of time for other options. But there are a lot of ways to go about a decade-long project.

Since Thiel is well known for being skeptical of college education, I imagine one of the things he had in mind was starting a company in six months rather than going to college, getting an entry level job, then leaving to start your company.

As I wrote in an earlier post, some things can’t be done slowly.

Some projects can only be done so slowly. If you send up a rocket at half of escape velocity, it’s not going to take twice as long to get where you want it to go. It’s going to take infinitely longer.

Some projects have to be done quickly if they are going to be done at all. Software projects are usually like this. If a software project is expected to take two years, I bet it’ll take five, if it’s not cancelled before then. You have to deliver software faster than the requirements change. Of course this isn’t unique to software. To be successful, you have to deliver any project before your motivation or your opportunity go away.

Overestimating the competition

Richard Feynman tells a story in Surely You’re Joking, Mr. Feynman that I’m reminded of periodically when I realize something is smaller and less sophisticated than I imagined.

[Update: A couple people pointed out in the comments that I got the roles of the two characters in this story reversed, so I’ve corrected this.]

Feynman tells the story of a colleague at Los Alamos, Frederic de Hoffman, describing his company’s attempt to plate plastics with metal. De Hoffman said that his company was making progress, but gave up when he saw that another company, Metaplast Corporation, was apparently way ahead of them, based on Metaplast’s advertising. Feynman had worked at Metaplast a few years earlier, but didn’t tell de Hoffman immediately.

Feynman asked de Hoffman how many chemists he thought Metaplast had.

“I would guess they must have twenty‑five or fifty chemists … How the hell could we compete with them?”

Feynman told de Hoffman “You’ll be interested and amused to know that you are now talking to the chief research chemist of the Metaplast Corporation, whose staff consisted of one bottle‑washer!”

I don’t think Feynman was trying to gloat that he was smarter than the staff of chemists at de Hoffman’s company, though he may have been. Feynman knew all the problems his company had and the times they screwed up. They projected a more confident image in their advertising, and the competition bought it.

Learning (needlessly) hard technology

A few years ago, a friend told me he was thinking about learning a certain technology because it was really hard to use. This was not something that had to be complex to solve a complex problem, but something that was unnecessarily complex. Why would anyone do that?

His reasoning was that as a consultant, he could make good money supporting a technology that’s hard to use. My friend would have more integrity than to recommend something that he didn’t think was a good solution. Perhaps he was thinking of saying something like this to a client: “I wouldn’t recommend this technology if you were starting from scratch. But since you’re invested in it, I’ll help you with it or help you migrate to something else.”

That sounds like an unpleasant way to earn a living. It also sounds risky. If something really is unnecessarily complex, better alternatives are likely to arise, perhaps suddenly. (This assumes people are free to choose alternatives, not prohibited by law, for example.)

Learning a technology that’s complex for good reasons could be a smart and ethical move. The work is harder at lower levels of abstraction, but someone has to solve the problems others would rather not think about. And since not as many people can do that work, it should pay better and be more secure.

There are a couple dangers, however, associated with choosing a more difficult technology. One is the temptation to use it where it isn’t needed. The other is that the set of problems where it is needed may shrink over time.
