Programming language popularity

Here are two ways of measuring programming language popularity:

Rank by number of questions tagged with that language on Stack Overflow
Rank by number of projects on GitHub using that language

According to this article, these two measures are well correlated.

I’d be skeptical of either metric by itself. A large number of questions on a language could indicate that it’s poorly documented, for example, rather than popular. And GitHub projects may not be representative. But the two measures give similar pictures of the programming language landscape, so together they have more credibility. On the other hand, both measures are probably biased in favor of newer languages.
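
As a rough illustration of what it means for two rankings to be "well correlated," here is a minimal sketch of a Spearman rank correlation in Python. The language names and counts are made up for illustration; they are not the actual Stack Overflow or GitHub figures, and this is not necessarily the method the article used.

```python
def ranks(values):
    """Rank values from 1 (largest) downward, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (not all constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical counts, for illustration only.
so_questions = {"LangA": 500, "LangB": 400, "LangC": 300, "LangD": 200, "LangE": 100}
gh_projects = {"LangA": 450, "LangB": 250, "LangC": 350, "LangD": 50, "LangE": 150}

languages = sorted(so_questions)
rho = spearman(
    [so_questions[name] for name in languages],
    [gh_projects[name] for name in languages],
)
print(f"Spearman rank correlation: {rho:.2f}")
```

A value near 1 means the two sites order the languages almost identically, and a value near 0 means the orderings are unrelated. Because only the ranks are used, it does not matter that the two sites count different things on different scales.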

The RedMonk Programming Language Rankings: February 2012

9 thoughts on “Programming language popularity”

Colin

9 February 2012 at 06:31

I’d be wary of using that Github metric. A few months ago I had a Drupal project on Github. Drupal uses abnormal file extensions for a lot of its PHP files like .module and .inc. Github read the project as 75% JavaScript. Maybe there really is all that JavaScript in there, but I doubt it.

James McKay

9 February 2012 at 07:32

Here’s an interesting take on these two metrics: supply versus demand. Languages that are popular on Github will have a strong supply of skills, whereas popularity on StackOverflow would have a stronger correlation to demand for those skills. So for example it would be easier to get a job in C# (which is above the line) than in Ruby or CoffeeScript (which are below the line).

It would be interesting to see how these metrics correlate in practice with statistics from recruitment websites.

stephen o'grady

9 February 2012 at 07:45

A related, interesting tidbit: the correlation between GitHub and Stack Overflow is getting stronger. When Conway originally performed the analysis in 12/10, the ranking correlation was .78. In September of last year, it was up to .79.

This go around? .84.

And as for the suggestion above of looking for correlations with recruitment data, I’d love to explore that, but none of the sites I can find actually expose hard data, only graphs. If anyone has suggestions as to where that data is obtainable – or better, has that data themselves – please do get in touch.

Mr Being

9 February 2012 at 08:10

Maybe people who use github are more likely to use stack overflow and vice versa – but this isn’t a representative sample by any means.

stephen o'grady

9 February 2012 at 08:19

@Mr Being: I thought that’d go without saying, but that’s certainly correct. All of the deployed COBOL/Assembly/etc, for example, is opaque to these metrics. This is a measurement we believe to be representative of trending and direction, not actual worldwide deployment.

@Colin: There are certainly issues, but I know the GitHub guys work hard to continually improve the language regular expression matching against their repos. Like most measurements, this one’s imperfect, but it generally passes the sniff test here.

Rick Wicklin

9 February 2012 at 10:16

In addition to the new/old bias, these measures are biased towards languages that aren’t backed by a commercial company. The measures don’t really apply for commercial languages like Mathematica, SAS, and MATLAB. (Although I notice that MATLAB is on the list, probably due to its heavy use in certain academic fields.) Commercial companies usually sponsor their own Discussion Forums, so people who use these languages don’t often post to StackOverflow. Furthermore, GitHub is not used much by business users. SAS is used by most banks, insurance companies, retailers, pharmaceutical companies, and so forth, but these companies are not going to post their proprietary analyses on GitHub.

Chris Barts

9 February 2012 at 12:13

And then there are languages like COBOL, RPG, APL, AutoLisp, and any number of others that simply don’t make it into ‘the wild’ very often, if at all; they’re an invisible culture, a silent majority (or, at least, plurality) of infrastructure in business, engineering, banking, and financial companies that these metrics will likely never capture.

John Stasko

10 February 2012 at 11:18

Actually, the two are correlated probably because of which languages have the most “free” or “open source” libraries written for them.

-> free and open source code is less likely to be documented well, so there are more questions on s.o.
-> free and open source code is more likely to be used for projects which are free and open source, hence posted on github

To support these statements, I’d posit that Ada has an enormous code base, probably much bigger than Ruby, and has quite a following and active development.

The other confounder is that there is a large group of people who dislike s.o. with its childish badges, point system, and fascistic moderators. I don’t know anyone who dislikes GitHub, although I’m sure such people exist.

Roman Shapovalov

12 February 2012 at 11:00

Interesting to compare the results to TIOBE index:
http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

You people are right, there is bias against old, as well as commercial, languages (Delphi, Visual Basic; C as old), while Shell is a typical non-commercial example. It would be logical if newer languages are often asked about on SO, but they are also popular on github. Relatively new Python and Ruby are the leaders, though in the middle of the TIOBE table and SO rating.

Another interesting bias is that languages popular in academic environments (R, Matlab, F#) are popular on SO (researchers are active discussers), but under-represented on github (researchers are used to sharing archives rather than using version control, which is sad).
