Suppose a large number of people each have a slightly better than 50% chance of correctly answering a yes/no question. If they answered independently, the majority would very likely be correct.

For example, suppose there are 10,000 people, each with a 51% chance of answering a question correctly. The probability that more than 5,000 people will be right is about 98%. [1]

The key assumption here is **independence**, which is not realistic in most cases. But as people move in the direction of independence, the quality of the majority vote improves. Another assumption is that people are what machine learning calls “weak learners,” i.e. that they perform slightly better than chance. This holds more often than independence, but on some subjects people tend to do worse than chance, particularly experts.

You could call this the wisdom of crowds, but it’s closer to the wisdom of markets. As James Surowiecki points out in his book The Wisdom of Crowds, crowds (as in mobs) aren’t wise; large groups of independent decision makers are wise. Markets are wiser than crowds because they aggregate more independent opinions. Markets are subject to group-think as well, but not to the same extent as mobs.

* * *

[1] Suppose there are *N* people, each with independent probability *p* of being correct. Suppose *N* is large and *p* is near 1/2. Then the probability of a majority answering correctly is approximately

*Prob*( *Z* > (1 – 2*p*) sqrt(*N*) )

where *Z* is a standard normal random variable. You could calculate this in Python by

    from math import sqrt
    from scipy.stats import norm

    N, p = 10_000, 0.51  # values from the example above
    # sf is the survival function, P(Z > x)
    print(norm.sf((1 - 2*p) * sqrt(N)))
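As a rough check on the approximation in the footnote, the sketch below compares it with the exact binomial tail for the numbers used in the post. The function names `majority_exact` and `majority_approx` are made up for this illustration, and it uses only the standard library (`statistics.NormalDist` in place of `scipy.stats.norm`), so treat it as one possible way to check the claim rather than the original calculation.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def majority_exact(n, p):
    """Exact P(more than n/2 of n independent voters are right)."""
    log_p, log_q = log(p), log(1 - p)
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), kept in logs to avoid underflow
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += exp(log_term)
    return total


def majority_approx(n, p):
    """Normal approximation from the footnote: P(Z > (1 - 2p) sqrt(n))."""
    return 1 - NormalDist().cdf((1 - 2 * p) * sqrt(n))


# 51% voters versus 49% voters, 10,000 people each time
for p in (0.51, 0.49):
    print(p, round(majority_exact(10_000, p), 4), round(majority_approx(10_000, p), 4))
```

The two columns should agree to within a fraction of a percentage point, with the majority right about 98% of the time at *p* = 0.51 and only about 2% of the time at *p* = 0.49.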

This post is an elaboration of something I first posted on Google+.

Of course, individually-rational Bayesian updating on the count-so-far destroys independence, and can lead to information cascades.

This fits with a theme of Taleb’s, that failures at the individual level lead to robustness at the societal level, and vice versa.

I thought that the point was that markets are better than crowds because “markets” consist of actors that are taking actions (buying/selling) — and therefore having to live with their decisions — rather than a crowd that may want to voice an opinion, but may not have any expectation of responsibility for the opinion.

Richard: You are right, that is the usual reason given for the good performance of markets. I’m pointing out another reason, that markets are somewhat independent. Your actions in a market are dependent on the actions of others, but in a more deliberative way than, say, people fleeing a building.

Related to this is the discussion about the quality of your taskforce.

see http://www.behind-the-enemy-lines.com/2011/11/does-lack-of-reputation-help.html for more details

Of course, if people have only a 49% chance of answering correctly, assuming independence, the probability of at least 5,000 getting it correct is only 2%, so that little bit on either side of 50% makes all the difference!

So, we can do some boosting and have more people who looked preferentially at the cases we got wrong :)

Great article!

I have been aware of the independence principle behind the “Wisdom of Crowds” for some time. Long before Surowiecki’s book, there was Galton’s example of a crowd at a fair guessing the weight of an ox (mentioned in the book). I had read about this example many years ago, but don’t recall the source.

“The opening anecdote relates Francis Galton’s surprise that the crowd at a county fair accurately guessed the weight of an ox when their individual guesses were averaged (the average was closer to the ox’s true butchered weight than the estimates of most crowd members, and also closer than any of the separate estimates made by cattle experts).”

http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds

Further down in the same link, there’s a useful summary of the conditions for the crowd to make accurate judgments:

http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds#Four_elements_required_to_form_a_wise_crowd

So, here’s my question: Given that some level of independence is necessary for the wisdom of crowds to operate, has anyone tried to quantify how the departure from independence affects the accuracy? My guess is that the accuracy degrades according to some function of the independence, and that there’s still some accuracy present when there are only small departures from independence. Of course, we would need some metric of independence in order to try to answer this question.

Andy Gelman just posted a great article about another way the wisdom of crowds can fail:

http://andrewgelman.com/2014/03/20/candy-weighing-demonstration-unwisdom-crowds/

Independence is a strong assumption. I require evidence before I assume independence.

The wisdom of crowds is very frequently the pooled ignorance of crowds.

@fred There was a study that explores this exact question by looking at the impact of just seeing the rankings of others in artificial music markets. See: http://www.princeton.edu/~mjs3/salganik_dodds_watts06_full.pdf. The result is that exposing these rankings to the market introduces all sorts of random artifacts in the rank order of the popularity of the songs.

There is a nice example with only 5 people.

(from J. Székely: Paradoxes in probability theory and mathematical statistics)

A, B, C, D and E are members of a jury, and the decision is made by majority vote. They make the right decision with probabilities 95%, 9%, 90%, 90% and 80%, independently of each other.

So the chance that the jury is wrong is only 0.7%.

Let’s suppose that E, the weakest, gives up his own opinion and always follows what A says. The chance of being wrong would increase to 1.15%!

This example also suggests how important independence is.

@Imre Koncz: Your observation is clearly correct, but I can’t duplicate your numbers…

Using a simple enumeration of the cases, I come up with .99% for the independent case and 1.45% for the case where E follows A. (Assuming B is supposed to be 90% not 9%). Have I goofed up my math somewhere?

@IJ

Sorry, the example I wanted to show is 95% 95% 90% 90% 80%; sorry for the typo.

I have also recalculated it. For the original I got 0.7055% and 1.2000% (instead of 1.15%). You should come to the same result, since with the 95% 90% 90% 90% 80% you used, I got the same numbers as you.

Sorry again.

At least this shows that it works with other numbers as well :)
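The jury numbers in this exchange can be checked by brute-force enumeration of every right/wrong pattern. The sketch below is one way to do that; `jury_error` and its `follower`/`leader` parameters are names invented for this illustration, with jurors indexed 0 to 4 for A to E.

```python
from itertools import product


def jury_error(accuracies, follower=None, leader=None):
    """P(majority is wrong), by enumerating every right/wrong pattern.

    If follower and leader are given, juror `follower` simply copies
    juror `leader` instead of voting independently.
    """
    n = len(accuracies)
    free = [i for i in range(n) if i != follower]  # jurors who vote independently
    error = 0.0
    for outcome in product([True, False], repeat=len(free)):
        right = dict(zip(free, outcome))
        prob = 1.0
        for i in free:
            prob *= accuracies[i] if right[i] else 1 - accuracies[i]
        if follower is not None:
            right[follower] = right[leader]
        # the jury is wrong when no more than half of the votes are right
        if sum(right.values()) <= n // 2:
            error += prob
    return error


for jury in ([0.95, 0.95, 0.90, 0.90, 0.80], [0.95, 0.90, 0.90, 0.90, 0.80]):
    independent = jury_error(jury)
    e_follows_a = jury_error(jury, follower=4, leader=0)
    print(jury, f"{independent:.4%}", f"{e_follows_a:.4%}")
```

For the two juries discussed above, this gives 0.7055% and 1.2000% for 95/95/90/90/80, and 0.99% and 1.45% for 95/90/90/90/80.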

“Given that some level of independence is necessary for the wisdom of crowds to operate, has anyone tried to quantify how the departure from independence affects the accuracy?”

I have been thinking on your question for days…

I have only one idea so far:

using a Gaussian copula for uniform variables U_1, U_2, …, U_N, and defining the event X_i = “the i-th person made the right decision” as

X_i := U_i < 0.52.

The copula’s rho parameter is a kind of measure of dependence: e.g. rho = 0 is independence, and rho = 1 means everybody votes the same (so the probability that the crowd fails is 48%).

One can also easily calculate other measures from rho, e.g.

P(X_i and X_j) – P(X_i)*P(X_j), or P(X_i | X_j) – P(X_i).

I’m gonna try it. If anyone else tries it, we could compare the results…

Any other ideas?

Thanks for all the interesting responses so far on my question concerning the quantification of the effects of departure from independence.

One of the things that’s always on my mind when using models is not so much one of whether the use of the model in some situation violates one or more of its assumptions, but:

How robust is the model against the violation of the assumption(s)?

Hi All,

so I ran the Gaussian copula experiment I mentioned above.

Here are some results:

| Rho | P(A_i \| A_j) | P(good decision) |
|---|---|---|
| 0.00 | 51.00% | 97.7% |
| 0.005 | 51.16% | 63.6% |
| 0.01 | 51.31% | 59.8% |
| 0.10 | 54.13% | 53.2% |
| 0.20 | 57.28% | 52.3% |
| 0.50 | 67.34% | 51.5% |
| 0.95 | 90.06% | 51.09% |
| 1.00 | 100% | 51.00% |

We can see that independence is really a very strong assumption: a very small level of dependence (when the conditional probability of a Yes by learner i given a Yes by learner j is 51.16%) sharply cuts the probability of a good group decision, from 98% to about 64%.

IMO, perfect independence between learners is an almost unrealistic assumption.
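For anyone who wants to experiment with numbers like these, here is a minimal sketch under assumptions that the comment does not spell out: an equicorrelated (one-factor) Gaussian copula, 10,000 voters, 51% individual accuracy, and a normal approximation to the binomial count given the shared factor. The names `majority_correct`, `pair_conditional` and `_grid` are invented for this illustration, and the integral over the shared factor is a plain Riemann sum, so it is not necessarily how the table was produced.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def _grid(rho, p, points=4001, span=8.0):
    """Yield (weight, q): q is one voter's chance of being right given the shared factor."""
    z = STD.inv_cdf(p)                 # voter i is right when latent_i < z
    step = 2 * span / (points - 1)
    for k in range(points):
        m = -span + k * step           # value of the shared factor
        q = STD.cdf((z - sqrt(rho) * m) / sqrt(1 - rho))
        yield STD.pdf(m) * step, q


def majority_correct(rho, p=0.51, n=10_000):
    """Approximate P(more than half of n correlated voters are right)."""
    if rho >= 1:
        return p                       # everyone votes identically
    total = 0.0
    for weight, q in _grid(rho, p):
        sd = sqrt(n * q * (1 - q))
        # normal approximation to P(Binomial(n, q) > n/2)
        win = 1 - STD.cdf((n / 2 - n * q) / sd) if sd > 0 else float(q > 0.5)
        total += weight * win
    return total


def pair_conditional(rho, p=0.51):
    """P(voter i is right | voter j is right) under the same copula."""
    if rho >= 1:
        return 1.0
    return sum(weight * q * q for weight, q in _grid(rho, p)) / p


for rho in (0.0, 0.005, 0.01, 0.1, 0.2, 0.5):
    print(rho, f"{pair_conditional(rho):.2%}", f"{majority_correct(rho):.1%}")
```

Under these assumptions, the output should land close to the table for small rho, with most of the drop in group accuracy already happening between rho = 0 and rho = 0.01.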

Avraham Adler says:

“Of course, if people have only a 49% chance of answering correctly, assuming independence, the probability of at least 5,000 getting it correct is only 2%, so that little bit on either side of 50% makes all the difference!”

Avraham, your point seems fair if we naively plug 49% into the probability formula. In practice, however, I would say that a 49% learner is no worse than a 51% one (and much better than a 50% one), because you can invert the output of the 49%-accurate learner and get a 51% one!

In other words, the worst learner you can have is one with 50% accuracy, which is a random decision and can be replicated by a coin toss. This learner contains the least information (zero). A 49% learner carries the same amount of information as a 51% one, and you can exploit it just by inverting the output.

Imagine a classifier that is 1% accurate (e.g. an expert who is almost always wrong about the next day’s market direction). Just invert it and you get a 99% classifier (so you will make a fortune if it predicts stock price movements :))

This problem’s been very widely studied in the epidemiology literature where the analogue of the crowd is a set of diagnostic tests such as X-rays, physical exams, blood samples, each of which comes with not only an accuracy but typically a sensitivity and specificity. The models typically contain either random effects (such as a difficulty for classification or some latent size for classifying tumors) or fixed effects (correlation among crowd responses explicitly modeled). You also see a lot of chi-squared tests on marginals, such as predicted positive and negative cases, to try to reject the applicability of the model. When there are correlations in responses, it’s easy to reject the model that assumes independence.

People have recently applied these models to data annotation for machine learning, where you typically train and test your classifier (or whatever) on some “gold standard” data that’s assembled by having one or more human annotators label each item in the training or test sets. Massimo Poesio and I gave a tutorial at LREC a few years ago on modeling the annotation process: http://lingpipe.files.wordpress.com/2008/04/malta-2010-slides.pdf and there have been numerous papers in machine learning and natural language processing conferences (and presumably elsewhere).

I read that book a few years ago. The author states a few necessary conditions for the “wisdom of the crowds” to appear. If just one of them is not met, it easily devolves into mob stupidity.

The elephant in the room completely unmentioned by the author (but pretty obvious if you think about it) is that in modern democratic elections NONE of those conditions is anywhere close to being met!