Pareto’s 80-20 rule says that 80% of your results often come from 20% of your effort. Maybe 80% of your profit comes from 20% of your customers, or maybe 80% of the bugs in your software are removed in the first 20% of the time you spend debugging.
The rule is named after Italian economist Vilfredo Pareto, who observed that 80% of his country’s land belonged to 20% of its population. The exact ratio of 80-20 isn’t important, though it is surprisingly common. The same principle applies whenever a large majority of effects come from a small number of causes.
The 80-20 rule, or Pareto principle, is startling the first time you hear it. It suggests you can be a lot more productive by focusing your effort where it does the most good. For example, there may be 100,000 to 1,000,000 words in English, depending on how you count them. But you could be pretty fluent in English by knowing the 1,000 most common words.
The thousand most frequently used words in any language are far more important than all the rest combined. Studying these words first makes much more sense than a uniformitarian approach, going through a dictionary in alphabetic order on the assumption that all words are equally important.
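One way to see why the most common words matter so much is to assume that word frequencies follow Zipf’s law. This is an idealized model in which the word of rank r appears with frequency proportional to 1/r. Under that assumption, the sketch below computes what fraction of running text the top k words cover. The function names and the 100,000-word vocabulary, taken from the low end of the estimate above, are illustrative choices and not measured data.

```python
# Assumption: idealized Zipf's law, where the word of rank r has frequency
# proportional to 1/r. This is a model, not measured English data.

def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / r for r in range(1, n + 1))


def zipf_coverage(k: int, vocab_size: int) -> float:
    """Fraction of running text covered by the k most common words
    when frequencies follow Zipf's law over vocab_size words."""
    if not 0 <= k <= vocab_size:
        raise ValueError("k must be between 0 and vocab_size")
    return harmonic(k) / harmonic(vocab_size)


if __name__ == "__main__":
    vocab = 100_000  # hypothetical vocabulary size
    for k in (1, 200, 1_000, 10_000):
        print(f"top {k:>6} words: {zipf_coverage(k, vocab):.0%} of text")
```

Under this model, the top 1,000 of 100,000 words cover roughly 62% of running text, which is more than the other 99,000 words combined. Real English only approximately follows Zipf’s law, so treat the numbers as a rough guide.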
I’ve thought about the Pareto principle off and on for many years. When I bring it up for discussion, people are often defensive, bringing up the same objections every time.
The most common objection is the recursive argument. If you could be more effective by focusing on the 20% that’s most important, then you should do that again: focus on the 20% of the 20% that’s most important. Apply this argument repeatedly and you can be infinitely productive with no effort.
The recursive argument takes the “80” and “20” of the 80-20 rule too literally. The point is not the exact ratios. The point is that return on effort invested is not uniformly distributed. In fact, it’s often far from uniformly distributed. I prefer the term Pareto principle to “80-20 rule” just because it does not reference particular numbers that could distract from the general principle.
Could you apply a Pareto principle recursively to English words, say by focusing on the 200 most common words? In fact you could. But that doesn’t mean that you could keep doing this repeatedly, learning only the most common word (“the”) and declaring yourself fluent in English. The recursion eventually breaks down, but that doesn’t negate the fact that the importance of English words is very unevenly distributed.
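A simple model makes the recursion concrete. Suppose, as an assumption, that returns follow a Pareto distribution with shape parameter α > 1. Then the top fraction p of causes accounts for a share p^(1 - 1/α) of the effects. Choosing α = log 5 / log 4 ≈ 1.16 makes the top 20% produce exactly 80%. The sketch below repeatedly narrows focus to the top 20% of what remains. The names `top_share` and `ALPHA_80_20` are illustrative.

```python
import math

# Assumption: returns follow a Pareto distribution with shape alpha > 1.
# Then the top fraction p of causes accounts for p ** (1 - 1/alpha) of effects.

def top_share(p: float, alpha: float) -> float:
    """Share of total effect produced by the top fraction p of causes."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for a finite mean")
    return p ** (1 - 1 / alpha)


# Shape parameter that makes the top 20% produce exactly 80%.
ALPHA_80_20 = math.log(5) / math.log(4)

if __name__ == "__main__":
    p = 1.0
    for _ in range(4):
        p *= 0.2  # focus on the top 20% of what's left
        print(f"top {p:.2%} of effort -> {top_share(p, ALPHA_80_20):.1%} of results")
```

In this model, each round of “focus on the top 20%” keeps only 80% of the previous round’s results: 80%, then 64%, then 51.2%, then about 41%. Narrowing focus keeps concentrating the returns, but it never delivers all the results for no effort, which is why the recursive argument fails.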
Another objection is the completionist argument. It says that everything has to be done, so the fact that you get less return on some things than others doesn’t matter. For example, the letters E, T, and A appear about 100 times as often as J, Q, and Z. That doesn’t mean you could leave J, Q, and Z off your keyboard. On the other hand, it does mean that you might design a keyboard so that E, T, and A are easier to reach than J, Q, and Z. And Samuel Morse was smart to assign his shortest codes to the most frequently used letters. 
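Huffman coding turns Morse’s intuition into an algorithm. It builds a binary prefix code that minimizes the average code length for a given set of symbol frequencies, so frequent symbols get short codewords. The sketch below computes Huffman code lengths for the six letters mentioned above. It uses approximate, commonly cited English letter frequencies (in percent) for illustration only. Morse code predates Huffman coding and was not constructed this way.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs: dict[str, float]) -> dict[str, int]:
    """Return the Huffman codeword length (in bits) for each symbol."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps the heap from ever comparing the dicts
    # Heap entries: (total weight, tiebreak, {symbol: depth in subtree})
    heap = [(w, next(tiebreak), {sym: 0}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)  # two lightest subtrees
        w2, _, b = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        merged = {sym: d + 1 for sym, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Approximate English letter frequencies in percent (illustrative).
LETTER_FREQ = {"E": 12.7, "T": 9.1, "A": 8.2, "J": 0.15, "Q": 0.095, "Z": 0.074}
MORSE = {"E": ".", "T": "-", "A": ".-", "J": ".---", "Q": "--.-", "Z": "--.."}

if __name__ == "__main__":
    lengths = huffman_code_lengths(LETTER_FREQ)
    for letter in sorted(LETTER_FREQ, key=LETTER_FREQ.get, reverse=True):
        print(letter, "Huffman bits:", lengths[letter],
              "Morse symbols:", len(MORSE[letter]))
```

With these frequencies, Huffman coding gives E one bit, T two, A three, J four, and Q and Z five each. Morse code isn’t a pure binary code, since it also relies on pauses between letters, so the lengths aren’t directly comparable. Still, both schemes give the common letters the short codes.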
A final objection is the ignorance argument: we simply don’t know what the most effective 20% will be beforehand. This is a serious objection, and it should temper our optimism regarding the Pareto principle. If a salesman knew which 20% of his prospects were going to buy, he would just sell to them. But of course he doesn’t know ahead of time who those 20% will be. On the other hand, he has some idea who is likely to buy (and how much they may buy) and doesn’t approach prospects randomly.
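Here is a small sketch of what acting on partial knowledge might look like. Each prospect gets a guessed probability of buying and a guessed order size. Ranking by expected value (probability times order size) and calling the top prospects first beats calling the same number of prospects at random, even though the guesses are uncertain. The `Prospect` class, the helper functions, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    name: str
    p_buy: float       # guessed probability of a sale
    order_size: float  # guessed order size if they do buy

    @property
    def expected_value(self) -> float:
        return self.p_buy * self.order_size


def expected_revenue(prospects: list[Prospect], k: int) -> float:
    """Expected revenue from calling the k prospects with the highest expected value."""
    ranked = sorted(prospects, key=lambda p: p.expected_value, reverse=True)
    return sum(p.expected_value for p in ranked[:k])


def random_order_revenue(prospects: list[Prospect], k: int) -> float:
    """Expected revenue from calling k prospects chosen uniformly at random."""
    total = sum(p.expected_value for p in prospects)
    return total * k / len(prospects)


# Made-up prospects for illustration.
PROSPECTS = [
    Prospect("repeat customer", 0.60, 1000),
    Prospect("big long shot", 0.10, 5000),
    Prospect("small regular", 0.30, 200),
    Prospect("cold lead", 0.05, 400),
    Prospect("tire kicker", 0.20, 100),
]

if __name__ == "__main__":
    for k in (1, 2):
        print(k, expected_revenue(PROSPECTS, k), random_order_revenue(PROSPECTS, k))
```

With these made-up numbers, calling the single best prospect yields an expected 600 out of a total of 1,200, versus 240 for a prospect chosen at random.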
These objections take the Pareto principle to extremes to justify disregarding it. Since you can’t repeatedly apply it indefinitely, there must be nothing to it. Or if you can’t completely eliminate the least productive work, you should treat everything equally. Or if you don’t have absolute certainty regarding what’s most important, you shouldn’t consider what’s likely to be most important.
Despite the objections above, it is true that returns on effort are often very unevenly distributed. There’s a common tendency to underestimate the variance. We might have a rough idea how effective a list of possible actions would be, and maybe imagine that the most effective choice would be ten times better than the least effective choice, but in fact the ratio might be a hundred to one or even a thousand to one. Somehow we mentally compress these ratios, maybe on something like a logarithmic scale.
So one key to taking advantage of the Pareto principle is simply to keep in mind that something like the Pareto principle might hold. You’re not likely to find a Pareto rule if you don’t think such rules exist.
Another key is to be honest with ourselves regarding how effective we want to be. Maybe the most effective thing to do is something we simply don’t want to do. If so, we can either make a principled decision not to do what we know to be more effective, or get over our sloth.
I mentioned ignorance above. “Uncertainty” is a more helpful word than “ignorance” here because we’re not often completely ignorant. We usually have some idea which actions are more likely to be effective. Data can help. Start by using whatever information or intuition you have, and update it as you gather data.
This could be a formal Bayesian process if you have quantifiable data. Or it could be as simple as just trying something. If it works, try it again. If not, try something different. You may be able to bootstrap this “play the winner” strategy until you have enough data to be more formal about making decisions.
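A minimal sketch of the “play the winner” strategy follows. It assumes each option succeeds independently with a fixed probability that the decision maker doesn’t know. After a success the rule repeats the same option, and after a failure it moves on to the next option in a fixed cycle. This is one simple variant of the rule. The function name and the success probabilities in the example are hypothetical.

```python
import random


def play_the_winner(success_probs: list[float], trials: int, seed: int = 0):
    """Simulate play-the-winner: stay with an option after a success,
    switch to the next option (cyclically) after a failure.
    Returns (times each option was tried, total successes)."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)
    successes = 0
    current = 0
    for _ in range(trials):
        counts[current] += 1
        if rng.random() < success_probs[current]:
            successes += 1  # it worked: try it again
        else:
            current = (current + 1) % len(success_probs)  # try something different
    return counts, successes


if __name__ == "__main__":
    counts, wins = play_the_winner([0.2, 0.5, 0.9], trials=10_000)
    print("tries per option:", counts, "successes:", wins)
```

With these probabilities the rule spends most of its trials on the 0.9 option without ever being told which option is best. A more formal Bayesian approach would instead keep a posterior distribution over each option’s success rate and choose among options using it.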
 How well does Morse code symbol length correspond to frequency? I looked into that here.
 I have a friend who has helped me with this. He will suggest I do X, and I agree, but say I’d rather do Y. Then he will reply with something like “Sure, you could do that. But X could be a thousand times more effective. It’s up to you.” I’ve done the same for others. It’s easier to see someone else’s decisions objectively than your own.
 This is not an exaggeration. I’ve seen this, for example, in software optimization. Some changes might make 1,000x more of a difference than others.