Big data

What’s hard about big data? As Bradley Efron explains:

In some ways I think that scientists have misled themselves into thinking that if you collect enormous amounts of data you are bound to get the right answer. You are not bound to get the right answer unless you are enormously smart. You can narrow down your questions; but enormous data sets often consist of enormous numbers of small sets of data, none of which by themselves are enough to solve the thing you are interested in, and they fit together in some complicated way.

Emphasis added.

An enormous amount of simple data is a blessing, not a curse. If you have too much simple data, you can take a manageable sample of it and ignore the rest!
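As a rough illustration of "take a manageable sample and ignore the rest," here is a minimal sketch of reservoir sampling, one common way to draw a uniform random sample from data too large to hold in memory. The function name `reservoir_sample`, the seed, and the toy stream of integers are assumptions made for the example, not anything from the quoted interview.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream of unknown length.

    Hypothetical helper for illustration: it makes a single pass and keeps
    only k items in memory at any time.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # overwriting a uniformly chosen slot in the reservoir.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Toy "simple data": a million integers standing in for a huge table.
records = range(1_000_000)
sample = reservoir_sample(records, k=1_000, seed=0)

# The sample mean is an estimate of the full mean (499999.5 for this toy stream).
estimate = sum(sample) / len(sample)
```

This only works because each record here is directly informative about the quantity of interest. For the kind of data Efron describes, a random subset may discard exactly the pieces that have to be fitted together.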

What makes big data hard is that the data are not simple, and they contain only indirect information about what you want to know. It takes hard work to extract useful signals from the noise, but the effort can be very worthwhile.

