StackOverflow question statistics

This is the third in a series of posts about StackOverflow statistics. First I wrote about reputation. Then a couple days ago I wrote about users and votes on StackOverflow. Now this post answers three questions about questions.

  1. How many answers do questions typically get?
  2. How many votes do questions typically get?
  3. Do questions with more answers also get more votes?

The following graph shows the distribution of answers per question.

[Figure: histogram of the number of answers per question]

At the time the data were collected, there were almost 126,000 questions. Only 134 of these had no answers. (This seems suspiciously small, but I’ll take it at face value for now.) Around 18% of questions had one answer, 23% had two answers, 18% had three answers, and the share of questions declines steadily from there. The median number of answers was 3, and the mean was 4.2. About 98% of questions had 15 or fewer answers. Only 51 questions had more than 100 answers.
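The gap between the median and the mean is what you would expect from a distribution with a long right tail. Here is a minimal sketch of how such summaries could be computed. The function name `summarize_counts` and the sample numbers are made up for illustration; they are not the StackOverflow data.

```python
from statistics import mean, median


def summarize_counts(counts, cutoff):
    """Return the median, the mean, and the share of values at or below `cutoff`."""
    share = sum(1 for c in counts if c <= cutoff) / len(counts)
    return {"median": median(counts), "mean": mean(counts), "share_at_or_below": share}


# Hypothetical answer counts for nine questions (illustrative only).
answers_per_question = [1, 2, 2, 3, 3, 4, 5, 8, 40]

summary = summarize_counts(answers_per_question, cutoff=15)
print(summary)
```

A single question with 40 answers pulls the mean well above the median, while the median barely notices it. That is why I report both.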

But a few questions had an extreme number of answers. The question “How old are you, and how old were you when you started coding?” had the most answers at 1,203. (Usually the StackOverflow moderators shut down frivolous questions, but they let this one go on a while for fun.)

Turning to votes, the following graph shows the distribution of number of votes per question.

About 29% of the questions had not yet received any votes when the data were collected. About 99% of questions had 16 or fewer votes. The median number of votes was 1 and the mean was 2.45. Only 49 questions had more than 100 votes. (Note that we’re looking at total votes, up and down, not score. Score will typically be lower since votes can be negative.)

As with counting answers, there are a few outliers for votes. The question “What’s your favorite programmer cartoon?” had the most votes with 630. (Most popular answer: xkcd.)

Finally, how do votes and answers go together? Do questions with more answers also get more votes and vice versa? There’s definitely an association between number of votes and number of answers. (Technically, Spearman’s rank correlation rho was 0.49.) Answers and votes tend to increase together, though there’s plenty of variation in the number of votes that the number of answers does not explain.
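For readers unfamiliar with Spearman’s rho: it is the ordinary Pearson correlation applied to the ranks of the data rather than to the raw values, which keeps a handful of extreme questions from dominating the result. The sketch below shows one way to compute it in plain Python, using average ranks for ties. It is not the code behind the figure quoted above, and the function names and sample data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical (answers, votes) counts for six questions (illustrative only).
answers = [1, 2, 2, 3, 5, 8]
votes = [0, 1, 3, 1, 4, 2]
print(spearman_rho(answers, votes))
```

Because only the ordering matters, replacing the 8 with 1,203 would leave rho unchanged.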

Related posts:

Civic duty on StackOverflow
StackOverflow reputation statistics

3 comments on “StackOverflow question statistics”
  1. Ben says:

    A question I’ve asked myself is whether there is a ‘Skeet’ effect. Do questions that Jon Skeet (or other high-reputation users) answer get more views (and thus votes)?

  2. John says:

    Ben, I don’t have the data to answer your question, but it seems plausible. Questions asked by high-reputation users might get more views because people recognize their names and are curious about what these folks are asking. Or maybe they’d get more views even if they could ask questions anonymously: presumably people with high reputation are in tune with what other users want to hear about.

  3. It may be the case that questions answered by high-reputation users do get a bit more traffic, but as you implied, it might just be correlation, not causation. I know that Jon Skeet in particular is very active in the C# and .NET tags, the most popular on Stack Overflow, so questions that he answers will naturally get more views due to the popularity of those tags.

    Another possible source of ‘Skeet’ effect may be the RSS feeds available for each user. I’ve had a couple of people tell me they follow my feed, and I can imagine that users who specialize in popular tags amass a following. Jeff may be willing to provide you with data on how many people use the RSS feeds, and which feeds are most popular.