Inequalities for inequality: Gini coefficient lower bounds

The Gini coefficient, a.k.a. Gini index, of a set of numbers is the average of the absolute differences between all pairs of numbers, divided by twice the mean. Specifically, let

{\cal X} = \{x_1, x_2, x_3, \ldots, x_n\}

Then the Gini coefficient of the set is defined to be

G({\cal X}) = \frac{1}{2\mu n^2} \sum_{i=1}^n \sum_{j=1}^n |x_i - x_j|
where μ is the mean of the set. The Gini coefficient is often used in economics to measure inequalities in wealth.

Now suppose the data is divided into r disjoint groups:

{\cal X} = \bigcup_{i = 1}^r {\cal X}_i

We would like to estimate the Gini coefficient of the entire set from the Gini coefficients of the subgroups. The individual Gini coefficients alone are not enough information for the task, but if we also know the size and sum of each subgroup, we can compute lower bounds on G. The paper [1] gives five such lower bounds.

We will present the five lower bounds and see how well each does in a simulation.

Zagier’s lower bounds

Here are Zagier’s five lower bounds, listed in Theorem 1 of [1].

\begin{align*} G({\cal X}) &\geq \sum_{i=1}^r \frac{n_i}{n} G({\cal X}_i) \\ G({\cal X}) &\geq \sum_{i=1}^r \frac{X_i}{X} G({\cal X}_i) \\ G({\cal X}) &\geq \left(\sum_{i=1}^r \sqrt{\frac{n_i}{n} \frac{X_i}{X} G({\cal X}_i)} \right)^2 \\ G({\cal X}) &\geq 1 - \left(\sum_{i=1}^r \sqrt{\frac{n_i}{n} \frac{X_i}{X} (1- G({\cal X}_i))} \right)^2 \\ G({\cal X}) &\geq G_0 + \sum_{i=1}^r \frac{n_i}{n} \frac{X_i}{X} G({\cal X}_i) \end{align*}

Here n_i is the size of the ith subgroup and X_i is the sum of the elements in the ith subgroup. Also, n is the sum of the n_i and X is the sum of the X_i.

G_0 is the Gini coefficient we would get if we replaced every element of each subgroup with that subgroup's mean, eliminating all variation within subgroups.
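
Here is a minimal Python sketch of how these five bounds might be computed from the subgroup sizes, sums, and Gini coefficients. The function name zagier_bounds and its interface are mine, chosen for illustration; only the formulas come from [1].

    from math import sqrt

    def zagier_bounds(sizes, sums, ginis):
        # sizes[i] = n_i, sums[i] = X_i, ginis[i] = G(X_i) for the ith subgroup
        n, X = sum(sizes), sum(sums)
        p = [n_i/n for n_i in sizes]  # population shares n_i/n
        q = [X_i/X for X_i in sums]   # sum shares X_i/X

        b1 = sum(p_i*g for p_i, g in zip(p, ginis))
        b2 = sum(q_i*g for q_i, g in zip(q, ginis))
        b3 = sum(sqrt(p_i*q_i*g) for p_i, q_i, g in zip(p, q, ginis))**2
        b4 = 1 - sum(sqrt(p_i*q_i*(1 - g)) for p_i, q_i, g in zip(p, q, ginis))**2

        # G_0: Gini coefficient after replacing each element by its subgroup mean
        means = [X_i/n_i for X_i, n_i in zip(sums, sizes)]
        G0 = sum(n_i*n_j*abs(m_i - m_j)
                 for n_i, m_i in zip(sizes, means)
                 for n_j, m_j in zip(sizes, means)) / (2*(X/n)*n**2)
        b5 = G0 + sum(p_i*q_i*g for p_i, q_i, g in zip(p, q, ginis))

        return [b1, b2, b3, b4, b5]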

Simulation

I drew 102 samples from a uniform random variable and computed the Gini coefficient with

    def gini(x):
        n = len(x)
        mu = sum(x)/n    
        s = sum(abs(a-b) for a in x for b in x)
        return s/(2*mu*n**2)

I split the sample evenly into three subgroups. I then sorted the list of samples and divided it into three even groups again.
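
In outline, the splitting and the bound computations might look like the sketch below, which reuses the gini function above and the hypothetical zagier_bounds helper sketched earlier. The numbers reported below come from one particular random draw, so a rerun will produce different values.

    import random

    x = [random.random() for _ in range(102)]

    unsorted_groups = [x[0:34], x[34:68], x[68:102]]
    xs = sorted(x)
    sorted_groups = [xs[0:34], xs[34:68], xs[68:102]]

    def bounds_for(groups):
        sizes = [len(g) for g in groups]
        sums  = [sum(g) for g in groups]
        ginis = [gini(g) for g in groups]
        return zagier_bounds(sizes, sums, ginis)

    print(gini(x))                           # Gini of the whole sample
    print([gini(g) for g in unsorted_groups], bounds_for(unsorted_groups))
    print([gini(g) for g in sorted_groups],   bounds_for(sorted_groups))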

The Gini coefficient of the entire data set was 0.3207. The Gini coefficients of the three unsorted subgroups were 0.3013, 0.2798, and 0.3603. When I divided the sorted data into three groups, the Gini coefficients were 0.3060, 0.0937, and 0.0502. The variation within each sorted group is about the same, but the group containing the smallest values has the smallest mean and thus the largest Gini coefficient.

When I tested Zagier’s lower bounds on the unsorted partition into three subgroups, I got estimates of

[0.3138, 0.3105, 0.3102, 0.3149, 0.1639]

for the five estimators.

When I repeated this exercise with the sorted groups, I got

[0.1499, 0.0935, 0.0933, 0.1937, 0.3207]

The first four lower bounds were much better for the unsorted partition, but the last bound was better for the sorted partition; in fact for the sorted partition it equals the Gini coefficient of the full data set.

[1] Don Zagier. Inequalities for the Gini coefficient of composite populations. Journal of Mathematical Economics 12 (1983) 102–118.

Economics, power laws, and hacking

Increasing costs impact some players more than others. Those who know about power laws and know how to prioritize are impacted less than those who naively believe everything is equally important.

This post will look at economics and power laws in the context of password cracking. Increasing the cost of verifying a password does not impact attackers proportionately because the attackers have a power law on their side.

Key stretching

In an earlier post I explained how key stretching increases the time required to verify a password. The idea is that if authentication systems can afford to spend more computation time verifying a password, they can force attackers to spend more time as well.

The stretching algorithm increases the time required to test a single salt and password combination by a factor of r where r is a parameter in the algorithm. But increasing the amount of work per instance by a factor of r does not decrease the number of users an attacker can compromise by a factor of r.

Power laws

If an attacker is going through a list of compromised passwords, the number of user passwords recovered for a fixed amount of computing power decreases by less than a factor of r because passwords follow a power law distribution [1]. The nth most common password appears with frequency proportional to n^(-s) for some constant s. Studies suggest s is between 0.5 and 0.9. Let’s say s = 0.7.

Suppose an attacker has a sorted list of common passwords. He starts with the most common password and tries it for each user. Then he tries the next most common password, and keeps going until he runs out of resources. If you increase the difficulty of testing each password by r, the attacker will have to stop after trying fewer passwords.

Example

Suppose the attacker had time to test the 1000 most common passwords, but then you set r to 1000, increasing the resources required for each individual test by a factor of 1000. Now the attacker can only test the most common password. The most common password accounts for about 1/24 of the users who use one of the 1000 most common passwords. So the number of passwords the attacker can test went down by a factor of 1000, but the number of users affected only went down by a factor of 24.

It would take 1000 times longer if the attacker still tested the same number of passwords. But the attacker hits diminishing returns after the first password tested. By testing passwords in order of popularity, his return on resources does not need to go down by a factor of 1000. In this example it only went down by a factor of 24.
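
Here is a minimal back-of-the-envelope check of the 1/24 figure, assuming s = 0.7 and taking the power law as exact over the 1000 most common passwords.

    s = 0.7
    N = 1000

    # Unnormalized power law weights for the N most common passwords
    weights = [n**-s for n in range(1, N + 1)]

    # Fraction of these users covered by the single most common password
    share = weights[0] / sum(weights)
    print(f"1/{1/share:.0f}")   # prints 1/24, approximately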

How much the hacker’s ROI decreases depends on several factors. It never decreases by more than a factor of r, and often it decreases less.

[1] Any mention of power laws brings out two opposite reactions, one enthusiastic and one pedantic. The naive reaction is “Cool! Another example of how power laws are everywhere!” The pedantic reaction is “Few things actually follow a power law. When people claim that data follow a power law distribution the data usually follow a slightly different distribution.”

The pedantic reaction is technically correct but not very helpful. The enthusiastic reaction is correct with a little nuance: many things approximately follow a power law distribution over a significant range. Nothing follows a power law distribution exactly and everywhere, but the same is true of any probability distribution.

It doesn’t matter for this post whether data exactly follow a power law. This post is just a sort of back-of-the-envelope calculation. The power law distribution has some empirical basis in this context, and it’s a convenient form for calculations.

Window dressing for ideological biases

From Russ Roberts on the latest EconTalk podcast:

… this is really embarrassing as a professional economist — but I’ve come to believe that there may be no examples … of where a sophisticated multivariate econometric analysis … where important policy issues are at stake, has led to a consensus. Where somebody says: Well, I guess I was wrong. Where somebody on the other side of the issue says: Your analysis, you’ve got a significant coefficient there — I’m wrong. No. They always say: You left this out, you left that out. And they’re right, of course. And then they can redo the analysis and show that in fact — and so what that means is that the tools, instead of leading to certainty and improved knowledge about the usefulness of policy interventions, are merely window dressing for ideological biases that are pre-existing in the case of the researchers.

Emphasis added.

Related post: Numerous studies have confirmed …

Square root of people

How do you infer the economic well-being of individuals from household income? At one extreme, you could just divide household income by the number of people in the household. This is naive because there are some economies of scale. It doesn’t take twice as much energy to heat a house for two people as a house for one person, for example.

The other extreme is the two-can-live-as-cheaply-as-one philosophy. All that matters is household income; the number of people in the house is irrelevant. That’s too simplistic: while there are fixed costs to running a home, there are also marginal costs per person.

So if you have a household of n people, do you divide the total income by n or by 1? Said another way, do you divide by n^1 or n^0? Some economists split the difference and use an exponent of 0.5, i.e. divide by the square root of n. This would say, for example, that the members of a family of four with a total income of $100,000 have roughly the same standard of living as a single person with an income of $50,000.

Dividing by √n is a crude solution. For one thing, it makes no distinction between adults and children. But for something so simple, it makes some sense. It makes more sense than dividing by n or not taking n into account.
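
To make the arithmetic concrete, here is a tiny sketch of the general rule: divide household income by household size raised to an exponent e between 0 and 1. The function name is mine, just for illustration.

    # Equivalized income: household income divided by (household size)**e
    def equivalized_income(income, size, e=0.5):
        return income / size**e

    for e in (0, 0.5, 1):
        print(e, equivalized_income(100_000, 4, e))
    # e = 0   -> 100000.0  (two can live as cheaply as one)
    # e = 0.5 ->  50000.0  (square root rule)
    # e = 1   ->  25000.0  (naive per-person division)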

Source: Burkhauser on the Middle Class

Convention versus compulsion

An alternate title for this post could be “Software engineering wisdom from a lecture on economics given in 1945.”

F. A. Hayek gave a lecture on December 17, 1945 entitled “Individualism: True and False.” A transcript of the talk is published in his book Individualism and Economic Order. In this talk Hayek argues that societies must decide between convention and compulsion as means to coordinate activity. The former is preferable, in part because it is more flexible. Individualism depends on

… traditions and conventions which evolve in a free society and which, without being enforceable, establish flexible but normally observed rules that make behavior of other people predictable in a high degree.

Of course Hayek wasn’t thinking of software development, but his comments certainly apply to it. Software engineers are fond of flexibility, but suspicious of rules that cannot be enforced by a machine. And yet there are some kinds of flexibility that require traditions and conventions rather than enforceable rules. Hayek looks beyond the letter of the law to its spirit: the purpose of rules in software engineering is to make the behavior of software (and software engineers) “predictable in a high degree.”

I’ve written a couple blog posts on this theme. One was Software architecture as a function of trust:

If you trust that your developers are highly competent and self-disciplined, you’ll organize your software differently than if you assume developers have mediocre skill and discipline. One way this shows up is the extent that you’re willing to rely on convention to maintain order. … In general, I see more reliance on convention in open source projects than in enterprise projects.

Another was a post on the architecture of Emacs:

In short, Emacs expects developers to be self-disciplined and does not enforce a great deal of external discipline. However, because the software is so light on bureaucracy, it is easy to customize and to contribute to.

The quotation from Hayek above continues:

The willingness to submit to such rules, not merely so long as one understands the reason for them but so long as one has no definite reason to the contrary, is an essential condition for the gradual evolution and improvement of rules of social intercourse … an indispensable condition if it is to be possible to dispense with compulsion.

Imagine a rookie programmer who joins a new team and only follows those conventions he fully understands. That’s not much better than the rookie doing whatever he pleases. The real benefit comes from his following the conventions he doesn’t yet understand (provided he “has no definite reason to the contrary”) because these distill the ideas of more experienced developers.

It takes time to pass on a set of traditions and conventions, especially to convey the rationale behind them. Machine-enforceable rules are a shortcut to establishing a culture.

Every project will be somewhere along a continuum between total reliance on convention and total reliance on rules a computer can check. Emacs is pretty far toward the conventional end of the spectrum, and enterprise Java projects are near the opposite end. If you want to move away from the compulsion end of the spectrum, you need more emphasis on convention.

Related post: Style and understanding

How much time do scientists spend chasing grants?

Computer scientist Matt Welsh said that one reason he left Harvard for Google was that he was spending 40% of his time chasing grants. At Google, he devotes all his time to doing computer science. Here’s how he describes it in his blog post The Secret Lives of Professors:

The biggest surprise is how much time I have to spend getting funding for my research. Although it varies a lot, I guess that I spent about 40% of my time chasing after funding, either directly (writing grant proposals) or indirectly (visiting companies, giving talks, building relationships). It is a huge investment of time that does not always contribute directly to your research agenda—just something you have to do to keep the wheels turning.

According to this Scientific American editorial, 40% is typical.

Most scientists finance their laboratories (and often even their own salaries) by applying to government agencies and private foundations for grants. The process has become a major time sink. In 2007 a U.S. government study found that university faculty members spend about 40 percent of their research time navigating the bureaucratic labyrinth, and the situation is no better in Europe.

Not only do scientists on average spend a large amount of time pursuing grants, they tend to spend more time on grants as their career advances. (This has an element of tautology: you advance your career in part by obtaining grants, so the most successful are the ones who have secured the most grant money.)

By the time scientists are famous, they may no longer spend much time actually doing science. They may spend nearly all their research time chasing grants either directly or, as Matt Welsh describes, indirectly by traveling, speaking, and schmoozing.

Pseudo-commons and anti-commons

Here are a couple variations on the tragedy of the commons, the idea that shared resources can be exhausted by people acting in their individual best interests.

The first is a recent podcast by Thomas Gideon discussing the possibility of a tragedy of the pseudo-commons. His idea of a pseudo-commons is a creative commons with some barriers. He gives the example of open core companies.

The other is Michael Heller’s idea of the tragedy of the anti-commons. If too many people own a resource, the difficulties in coordination may keep the resource from being used effectively. Having too many owners can create problems similar to those caused by having no owners.

If you’re looking for something to blog about, it would be interesting to compare the pseudo-commons and the anti-commons in depth.

Three P’s and three I’s of economics

In the December 27 episode of EconTalk, Pete Boettke summarizes basic economics as follows: If you don’t have the three P’s, you can’t have the three I’s.

The three P’s are

  • Property
  • Prices
  • Profit and loss

The three I’s are

  • Information
  • Incentive
  • Innovation
