The Gini coefficient, a.k.a. Gini index, of a set of numbers is the average of all absolute pairwise differences divided by twice the mean. Specifically, let

$$x = (x_1, x_2, \ldots, x_n).$$

Then the Gini coefficient of *x* is defined to be

$$G(x) = \frac{1}{2\mu n^2} \sum_{i=1}^n \sum_{j=1}^n |x_i - x_j|$$

where μ is the mean of the set. The Gini coefficient is often used in economics to measure inequalities in wealth.
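
As a quick sanity check on the definition, here is a worked example on a two-element data set. The data set is chosen purely for illustration and does not come from the simulation in this post; the sum runs over all ordered pairs, including each element paired with itself.

```latex
\begin{align*}
x &= (1, 3), \qquad n = 2, \qquad \mu = 2 \\
\sum_{i=1}^{2} \sum_{j=1}^{2} |x_i - x_j| &= 0 + 2 + 2 + 0 = 4 \\
G(x) &= \frac{4}{2 \cdot \mu \cdot n^2} = \frac{4}{2 \cdot 2 \cdot 4} = \frac{1}{4}
\end{align*}
```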

Now suppose the data is divided into *r* disjoint groups.

We would like to estimate the Gini coefficient *G* of the entire group from the Gini coefficients of the subgroups. The individual Gini coefficients alone are not enough information for the task, but if we also know the size and sum of each subgroup, we can compute lower bounds on *G*. The paper [1] gives five such lower bounds.

We will present the five lower bounds and see how well each does in a simulation.

## Zagier’s lower bounds

Here are Zagier’s five lower bounds, listed in Theorem 1 of [1].

Here *n*_{i} is the size of the *i*th subgroup and *X*_{i} is the sum of the elements in the *i*th subgroup. Also, *n* is the sum of the *n*_{i} and *X* is the sum of the *X*_{i}.

*G*_{0} is the Gini coefficient we would get if we replaced each subgroup with its mean, eliminating all variance within subgroups.
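
The description of *G*_{0} can be turned into code directly from the definition of the Gini coefficient. The sketch below is not taken from [1]; the function name `between_group_gini` and its arguments are assumptions made for illustration. It uses only the subgroup sizes and sums, since replacing every element by its subgroup mean makes all differences within a subgroup zero.

```python
def between_group_gini(sizes, sums):
    """Gini coefficient after replacing each element by its subgroup mean.

    sizes[i] is n_i and sums[i] is X_i for the i-th subgroup.
    """
    n = sum(sizes)
    mu = sum(sums) / n
    means = [X_i / n_i for n_i, X_i in zip(sizes, sums)]
    # Subgroups i and j contribute n_i * n_j identical differences
    # |mean_i - mean_j| to the double sum over all ordered pairs.
    s = sum(
        n_i * n_j * abs(m_i - m_j)
        for n_i, m_i in zip(sizes, means)
        for n_j, m_j in zip(sizes, means)
    )
    return s / (2 * mu * n**2)


# Two subgroups of size 2 with sums 2 and 6, i.e. means 1 and 3.
print(between_group_gini([2, 2], [2, 6]))  # prints 0.25
```

This gives the same value as computing the Gini coefficient of the expanded list `[1, 1, 3, 3]`.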

## Simulation

I drew 102 samples from a uniform random variable and computed the Gini coefficient with

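    def gini(x):
        """Gini coefficient of a sequence of numbers."""
        n = len(x)
        mu = sum(x) / n
        # Sum of absolute differences over all ordered pairs (a, b)
        s = sum(abs(a - b) for a in x for b in x)
        return s / (2 * mu * n**2)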
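
The double sum takes time quadratic in the number of samples, which is fine for 102 samples. For larger data sets, a standard identity for sorted data gives the same value after a single sort. The following is a sketch, not the code used for the results in this post; `gini_sorted` is a name introduced here for illustration.

```python
def gini(x):
    """Direct O(n^2) evaluation of the definition."""
    n = len(x)
    mu = sum(x) / n
    s = sum(abs(a - b) for a in x for b in x)
    return s / (2 * mu * n**2)


def gini_sorted(x):
    """O(n log n) evaluation using the identity

        sum_{i,j} |x_i - x_j| = 2 * sum_i (2i - n - 1) * x_(i)

    where x_(1) <= ... <= x_(n) are the sorted values, indexed from 1.
    """
    xs = sorted(x)
    n = len(xs)
    total = sum(xs)
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(xs, start=1))
    # 2 * weighted / (2 * mu * n**2) simplifies to weighted / (n * total)
    return weighted / (n * total)


data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
print(gini(data), gini_sorted(data))  # the two values agree up to rounding
```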

I split the sample evenly into three subgroups. I then sorted the list of samples and divided it into three equal-sized groups again.
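
The setup can be sketched as follows. The seed, the choice of the interval [0, 1), and the helper name `split_evenly` are assumptions made for illustration, so this will not reproduce the exact numbers reported here.

```python
import random

random.seed(0)  # arbitrary seed, used only to make the sketch repeatable
samples = [random.random() for _ in range(102)]  # assumes uniform on [0, 1)


def split_evenly(data, k):
    """Split a list into k consecutive chunks of equal length.

    Assumes len(data) is divisible by k, as 102 is by 3.
    """
    size = len(data) // k
    return [data[i * size:(i + 1) * size] for i in range(k)]


unsorted_groups = split_evenly(samples, 3)
sorted_groups = split_evenly(sorted(samples), 3)

# Subgroup sizes and sums are the extra inputs the lower bounds require.
sizes = [len(g) for g in unsorted_groups]
sums = [sum(g) for g in unsorted_groups]
```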

The Gini coefficient of the entire data set was 0.3207. The Gini coefficients of the three subgroups were 0.3013, 0.2798, and 0.3603. When I divided the sorted data into three groups, the Gini coefficients were 0.3060, 0.0937, and 0.0502. The spread within each sorted group is roughly the same, but the group containing the smallest values has the smallest mean and thus the largest Gini coefficient.
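
To see why a smaller mean gives a larger Gini coefficient when the spread is unchanged, consider two toy groups that differ only by a shift. The data below are made up for illustration and are not from the simulation.

```python
def gini(x):
    n = len(x)
    mu = sum(x) / n
    s = sum(abs(a - b) for a in x for b in x)
    return s / (2 * mu * n**2)


low = [0.0, 1.0, 2.0]
high = [10.0, 11.0, 12.0]  # same pairwise differences, shifted up by 10

print(gini(low))   # 8/18, about 0.444
print(gini(high))  # 8/198, about 0.040
```

The numerator is the same in both cases, so the difference comes entirely from dividing by the mean.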

When I tested Zagier’s lower bounds on the unsorted partition into three groups, I got estimates of

[0.3138, 0.3105, 0.3102, 0.3149, 0.1639]

for the five estimators.

When I repeated this exercise with the sorted groups, I got

[0.1499, 0.0935, 0.0933, 0.1937, 0.3207]

The first four bounds were much tighter for the unsorted partition, but the last bound was tighter for the sorted partition, where it matched the true value of 0.3207 to four decimal places.

## More posts on inequalities

- Reversed Cauchy-Schwarz inequality
- Improving on Chebyshev’s inequality
- The baseball inequality
- Banks, Mortgages, and Jensen’s inequality

[1] Don Zagier. Inequalities for the Gini coefficient of composite populations. Journal of Mathematical Economics 12 (1983) 102–118.