Whenever you remove noise, you also remove at least some signal. Ideally you can remove a large portion of the noise and a small portion of the signal, but there’s always a trade-off between the two. Averaging things makes them more average.
Statistics has a related idea, the bias-variance trade-off. An unfiltered signal has low bias but high variance; filtering reduces the variance but introduces bias.
If you have a crackly recording, you want to remove the crackling and leave the music. If you do it well, you can remove most of the crackling effect and reveal the music, but the music signal will be slightly diminished. If you filter too aggressively, you’ll get rid of more noise, but create a dull version of the music. In the extreme, you get a single hum that’s the average of the entire recording.
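Here's a minimal sketch of the trade-off in code, using NumPy (the signal, noise level, and window widths are all illustrative): a moving average smooths a noisy sine wave, and widening the window removes more of the noise but also attenuates the signal itself.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0, 2 * np.pi, 1000)
signal = np.sin(5 * t)                       # the "music"
noisy = signal + rng.normal(0, 0.5, t.size)  # add "crackle"

def moving_average(x, width):
    """Smooth x with a centered moving average of the given width."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

for width in (5, 51, 1000):
    smoothed = moving_average(noisy, width)
    error = np.std(smoothed - signal)  # residual noise plus bias
    peak = smoothed.max()              # how much signal amplitude survives
    print(f"width={width:4d}  error={error:.3f}  peak={peak:.3f}")
```

With a width of 5, most of the signal survives but so does much of the noise; at 51, the noise is largely gone and the peaks are visibly flattened; at 1000, the whole recording has been averaged into roughly one value, the "single hum" at the end of the spectrum.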
This is a metaphor for life. If you only value your own opinion, you’re an idiot in the oldest sense of the word, someone in his or her own world. Your work may have a strong signal, but it also has a lot of noise. Getting even one outside opinion greatly cuts down on the noise. But it also cuts down on the signal to some extent. If you get too many opinions, the noise may be gone and the signal with it. Trying to please too many people leads to work that is offensively bland.
Related post: The cult of average
6 thoughts on “Remove noise, remove signal”
I guess this is pretty much the gist of the Hacker News discussion from when Dropbox was introduced: https://news.ycombinator.com/item?id=8863. So, yes, it is very important to stick to your beliefs when you’re being rational about them.
Somehow, this strikes me as an excellent tribute to the genius of Lou Reed….
Yeah, averaging things makes them more average. But maybe it’s not averaging as much as it is optimization.
Like, I’m kind of partial to the muse end of the spectrum, but a little bit of collaboration and feedback goes a long way. The first data point is the most important; the tenth or twentieth doesn’t give you much. One external viewpoint can improve the output immensely.
Like a good editor (not that you need one, John – you’re a great writer). Too much of the crowd, though, and you’re trying to create by committee. Pulled this way and that by the cacophonous crowd. Now that’s insanity.
Suppose I have some sequence of symbols which is just noise – i.e., it tells you almost nothing about what it is supposed to represent. For example, the short URL doesn’t tell you where you are actually going:
Now also suppose that this sequence is actually the result of a lossless compression algorithm and by decompressing it you recover the original meaning completely:
Isn’t this like removing noise while preserving all the signal?
Encoding is not noise, just an alternate representation.
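To illustrate that reply: a lossless encoding changes the representation without discarding any information, so nothing that could be called signal is lost. A small sketch with Python's `zlib` (the URL string is hypothetical):

```python
import zlib

# Lossless compression is an alternate representation, not noise removal:
# decompressing recovers the original exactly, bit for bit.
original = b"https://example.com/some/long/descriptive/path"  # hypothetical URL
encoded = zlib.compress(original)
decoded = zlib.decompress(encoded)

assert decoded == original  # nothing lost, just re-encoded
print(len(original), len(encoded))
```

Note that for very short strings the encoded form may actually be longer than the original, since the compression format carries some fixed overhead; the point is only that the round trip is exact.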
The connection to bias-variance doesn’t seem so clear to me. In a hierarchical model, you get better predictions that make more sense if you do the partially pooled estimate. But this does not involve any removal of signal.
I guess what I’m saying is: the signal-noise tradeoff may be an example of the bias-variance tradeoff. But there are examples where a better (“biased”) estimate does not discard signal. Rather, it is by adding signal (in the form of a model, or more simply from prior information) that one moves to a better, “biased,” estimate. The problem, I think, is that the statistical term “bias” is way too specific, in particular depending on an arbitrary division of information into “data” and “prior information” and an arbitrary division of unknowns into “parameters” and “predictions.”
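The partially pooled estimate the comment mentions can be sketched with a simple normal-normal shrinkage (an empirical-Bayes flavor; the group sizes, variances, and the moment-based shrinkage formula here are all illustrative, and the exact formula varies by model):

```python
import numpy as np

# Hypothetical setup: several groups, each with a few noisy observations.
rng = np.random.default_rng(0)
true_means = rng.normal(10.0, 2.0, size=8)                # group-level truth
data = [m + rng.normal(0, 4.0, size=5) for m in true_means]

group_means = np.array([d.mean() for d in data])  # unpooled: low bias, high variance
grand_mean = group_means.mean()                   # complete pooling: one number

# Partial pooling: shrink each group mean toward the grand mean by a factor
# set by the within-group vs. between-group variance.
sigma2_within = 4.0**2 / 5                        # variance of one group mean
tau2_between = max(group_means.var(ddof=1) - sigma2_within, 1e-9)
shrink = tau2_between / (tau2_between + sigma2_within)
pooled = grand_mean + shrink * (group_means - grand_mean)

def mse(est):
    return np.mean((est - true_means) ** 2)

print(f"unpooled MSE:         {mse(group_means):.3f}")
print(f"partially pooled MSE: {mse(pooled):.3f}")
```

The pooled estimates sit between the raw group means and the grand mean, which is the sense in which the "biased" estimate here comes from adding information (the hierarchical model) rather than discarding signal.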