A conversation this morning prompted the question of how many Twitter accounts have between 10,000 and 20,000 followers. I hadn’t thought about the distribution of follower counts in a while and was curious to revisit the topic.

Apparently this question was more popular five years ago. When I did a few searches on the topic, all the results I got back were five or more years old. Also, data about the Twitter network was more available then than it is now.

This post will come up with an estimate based on almost no data, approaching this like a Fermi problem.

## Model

I’m going to assume that the number of followers for the *n*th most followed account is

*f* = *c* *n*^{−α}

for some constants *c* and α. A lot of networks approximately follow some distribution like this, and often α is somewhere between 1 and 3.

## Two data points

I’ve got two unknowns, so I need two data points. Wikipedia has a list of the 50 most followed Twitter accounts, and the 50th account has 37.7 million followers. (I chose the 50th account on the list rather than a higher-ranked one because power laws typically fit better after the first few data points.)

I believe that a few years ago the median number of Twitter followers was 0: more than half of Twitter accounts had no followers. Let’s assume the median number of followers is 1. But median out of what? I think I read there are about 350 million total Twitter accounts, and about 200 million active accounts. So should we base our median on 350 million accounts or 200 million accounts? We’ll split the difference at 275 million accounts and assume the median account is the 137,500,000th most popular account.

## Solve for parameters

So now we have two equations:

*c* 50^{−α} = 37,700,000

*c* 137500000^{−α} = 1

and taking logs gives us two linear equations in two unknowns. We solve this to find α = 1.1 and *c* = 2.9 × 10^{9}. The estimate of α is about the size we expected, so that’s a good sign.
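The log-linear solve can be sketched in a few lines of Python. This is a minimal sketch, not the calculation behind the figures above: the function name `fit_power_law` is my own, and the inputs are exactly the two data points stated in the text. Run as written it gives an exponent a bit under 1.2 and a constant of a few billion, in the same neighborhood as the rounded figures quoted above rather than a digit-for-digit match.

```python
import math

def fit_power_law(n1, f1, n2, f2):
    """Fit f = c * n**(-alpha) through two (rank, followers) points.

    Taking logs gives log f = log c - alpha * log n, which is linear
    in the two unknowns log c and alpha.
    """
    # Subtracting the two log equations eliminates log c.
    alpha = math.log(f1 / f2) / math.log(n2 / n1)
    # Back-substitute into the first equation to recover c.
    c = f1 * n1**alpha
    return alpha, c

# Data points from the text: the 50th account, and the assumed median account.
alpha, c = fit_power_law(50, 37_700_000, 137_500_000, 1)
print(f"alpha = {alpha:.3f}, c = {c:.3g}")
```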

**Take this with a grain of salt**. We’ve guessed a very simple model and fit it with just two data points, one of which we based on an educated guess.

## Final estimate

We assumed *f* = *c* *n*^{−α} and we can solve this for *n*. If an account has *f* followers, we’d estimate its rank *n* as

*n* = (*c* / *f*)^{1/α}.

So we’d estimate the number of accounts with between 10,000 and 20,000 followers to be

(*c* / 10000)^{1/α} − (*c* / 20000)^{1/α},

which is about 40,000. I expect this final estimate is not bad, say within an order of magnitude, despite all the crude assumptions made along the way.
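Here is a short sketch of that last bit of arithmetic, using the rounded parameter values quoted in the text (α = 1.1, *c* = 2.9 × 10^{9}). The helper names `rank` and `count_between` are mine, introduced only for illustration.

```python
def rank(f, c, alpha):
    """Estimated rank of an account with f followers: n = (c / f)**(1 / alpha)."""
    return (c / f) ** (1 / alpha)

def count_between(lo, hi, c, alpha):
    """Estimated number of accounts with between lo and hi followers.

    Fewer followers means a larger rank number, so the count is
    rank(lo) minus rank(hi).
    """
    return rank(lo, c, alpha) - rank(hi, c, alpha)

# Rounded parameter values as quoted in the text.
alpha, c = 1.1, 2.9e9
print(round(count_between(10_000, 20_000, c, alpha)))
```

With these inputs the result lands in the low 40,000s. Because the rank depends on *c* and α through a power, small changes in either parameter move the answer noticeably, which is one more reason to treat it as an order-of-magnitude figure.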

I kind of expected some sensitivity analysis, such as:

(1) Do the 25th and 10th most popular accounts have something like the predicted number of followers? If not, maybe the “top few are strange” problem extends out to 50 as well.

(2) What if the median follower count is still zero? What if 80% still have zero? How much does that change the slope? If it changes α from 1.1 to 1.099, no big deal, but if it changes α to 5, it should be a warning that more data is sorely needed.

I did think about doing a sensitivity analysis, but I decided to go ahead and post it. Sensitivity is left as an exercise for the reader.
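For anyone taking up the exercise, here is a minimal sketch of question (2). Everything in it beyond the numbers already in the post is an assumption: the scenario labels, the choice to model "most accounts have zero followers" by moving the rank at which an account has exactly one follower, and the use of 275 million as the account total in the 80% case. Each fit is solved exactly from its inputs, so the baseline row need not reproduce the rounded parameters quoted earlier. Question (1) would need follower counts for the 10th and 25th accounts, which are not given here.

```python
import math

def fit_power_law(n1, f1, n2, f2):
    """Fit f = c * n**(-alpha) through two (rank, followers) points."""
    alpha = math.log(f1 / f2) / math.log(n2 / n1)
    c = f1 * n1**alpha
    return alpha, c

def count_between(lo, hi, c, alpha):
    """Estimated number of accounts with between lo and hi followers."""
    return (c / lo) ** (1 / alpha) - (c / hi) ** (1 / alpha)

# Assumed rank of the account with exactly one follower (hypothetical scenarios).
SCENARIOS = {
    "median of 200M active accounts": 100_000_000,
    "median of 275M (used in the post)": 137_500_000,
    "median of 350M total accounts": 175_000_000,
    "80% of 275M have no followers": 55_000_000,
}

def sensitivity(scenarios=SCENARIOS):
    """Refit the model per scenario; return {label: (alpha, estimated count)}."""
    results = {}
    for label, rank_of_one in scenarios.items():
        # The first data point (50th account) is held fixed in every scenario.
        alpha, c = fit_power_law(50, 37_700_000, rank_of_one, 1)
        results[label] = (alpha, count_between(10_000, 20_000, c, alpha))
    return results

for label, (alpha, count) in sensitivity().items():
    print(f"{label:35s} alpha = {alpha:.2f}  count = {count:,.0f}")
```

Across these scenarios the exponent stays close to 1.2 and the estimated counts stay within the same order of magnitude, so under these assumptions the second data point is not doing anything alarming to the slope.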