Interesting perspective on information theory:
To me, the subject of “information theory” is badly named. That discipline is devoted to finding ideal compression schemes for messages to be sent quickly and accurately across a noisy channel. It deliberately does not pay any attention to what the messages mean. To my mind this should be called compression theory or redundancy theory. Information is inherently meaningful—that is its purpose—any theory that is unconcerned with the meaning is not really studying information per se. The people who decide on speed limits for roads and highways may care about human health, but a study limited to deciding ideal speed limits should not be called “human health theory”.
Despite what was said above, Information theory has been extremely important in a diverse array of fields, including computer science but also in neuroscience and physics. I’m not trying to denigrate the field; I am only frustrated with its name.
From David Spivak, footnotes 13 and 14 here.
Information Theory is used to transmit and preserve information. If you want to study meaning, wouldn’t Semantic Theory be a better term?
I was going to recommend Spivak’s book on category theory to you, but I see you’ve already found it. What do you think so far?
I agree with QM above. Messages consist of information and redundancy; it seems odd to feel that the study of which is which could properly be called “redundancy theory” but not “information theory”. The two are perfectly complementary.
“Information is inherently meaningful” is an interesting sentence, from a philosophy-of-language point of view. It’s either a boring tautology, or false, depending on which school you belong to. Raw sensations (‘qualia’) are the obvious problem case; most people would agree that they are ‘information’, but not necessarily ‘meaningful’.
I get the feeling that this happens in every field, especially as meanings change over time. See, e.g., linear programming.
Here’s something that has always bothered me about information theory. Suppose I have an incompressible file. Is it incompressible because it is information-rich, all redundancy having been squeezed out, or is it random noise? It’s almost as if the definition of information wraps around itself.
Off the bat, he seems to be somewhat confused about source coding vs. channel coding. He is also getting hung up on everyday language vs. a mathematical definition. Overall, it makes his comments on this subject hard to take too seriously.
Perhaps the terms “communication theory” or “coding theory” would make him happier.
@john: Both. Random noise is information-rich. It’s probably easiest to understand that if you think in terms of Kolmogorov complexity. If you wanted to reproduce a random sequence exactly, then the shortest description of it is just the sequence itself. The same would also hold for any meaningful data where all the redundancy has been squeezed out.
That’s where information theory parts company with the common understanding of information. A common sense understanding of randomness is that it is devoid of information. Maybe this is something Spivak had in mind.
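To make the compression side of this concrete, here is a minimal Python sketch (just an illustration, with zlib standing in for an ideal compressor): random bytes barely shrink, while an obviously redundant string collapses to almost nothing.

import os
import zlib

random_bytes = os.urandom(10_000)   # incompressible noise-like data
redundant_text = b"abcd" * 2_500    # 10,000 bytes of pure redundancy

# zlib finds no redundancy to squeeze out of the random bytes,
# so the "compressed" version is about as long as the original.
print(len(zlib.compress(random_bytes)))    # roughly 10,000 bytes
print(len(zlib.compress(redundant_text)))  # a few dozen bytes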
It’s interesting that a pseudo-random sequence has tiny Kolmogorov complexity, while a truly random sequence has a large Kolmogorov complexity. And yet it may be statistically impossible to distinguish the two without knowing how they were produced. The two sequences are opposites in theory but indistinguishable in practice.
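A small Python sketch of that asymmetry (purely illustrative): the pseudo-random bits below are fully specified by a short program plus a seed, while a truly random string of the same length has no description much shorter than itself.

import random

def pseudo_random_bits(seed, n):
    # The entire sequence is determined by this short function and the seed.
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n)]

bits = pseudo_random_bits(42, 1_000_000)
print(sum(bits) / len(bits))   # close to 0.5, like a fair coin
# A million truly random bits need about a million bits to describe;
# these million bits are pinned down by a few lines of code and the number 42.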
A random sequence of numbers is not devoid of information even in the colloquial use of the word. That the random sequence is information-rich is the conclusion one reaches (inescapably, in my view) when one thinks very deeply about our everyday use of the word information. If one wishes to be careful, mathematical, and rigorous about what one means by information, then one gets a notion of information similar to (or identical to) that found in information theory. The paragraph of text is more “meaningful” to you precisely because it encodes a much smaller and therefore more manageable amount of information.
You may feel that the paragraph of text is more information rich than the random string because the paragraph may have more semantic connections to your other knowledge, but those semantic connections are not information content. They reside at least partially outside of the text and differ from brain to brain.
I take his point, but there really shouldn’t be any confusion on it, as even Claude Shannon’s original paper establishing the field of information theory clearly states right up front (in its second paragraph, no less) that: “Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem.”
The broader problem most non-specialists have is with the word information itself, which now has such a broad and nebulous definition in contrast with that of 1948, when Shannon published his original paper (viewable here for those interested: http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf). James Gleick does a reasonable job covering the multiple uses of the word in his most recent book The Information.
Those interested in the semantics portion may be better off reading N. Chomsky.
Now if someone wants to have a discussion on the poorly and multiply defined concept of entropy, that’s a matter where there may be some real meat…
Random noise is information-rich, but it’s information you usually don’t care about. (But you might care a lot if it turns out to be the crypto key of an adversary, or something.)
That’s *the* difference between what we usually think of as “noise” and what we usually think of as “information”, and it’s a difference that (at least until we understand a lot more about people and their minds than we now do) the mathematics doesn’t and shouldn’t care about.
(This is essentially the same thing Robert and Chris said above, but a different perspective may be helpful.)
If you consider noise to be information, and I can see how from a certain perspective it is, that makes things tidy, but it avoids some important issues. You could say that the noisy channel problem is a non-issue: if you’re not interested in some of the information coming down the channel, the information you call noise, that’s your problem.
No, saying that noise is information doesn’t make the noisy channel problem a non-issue. What happens with a noisy channel is that it loses information: the information you’re interested in gets mixed up with the information you aren’t in a non-information-preserving way.
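One standard way to put a number on that loss: a binary symmetric channel that flips each transmitted bit with probability p can carry at most 1 - H(p) bits of the sender’s information per use, where H is the binary entropy function. Here’s a quick Python sketch of that formula (illustrative only):

from math import log2

def binary_entropy(p):
    # Entropy (in bits) of a coin that comes up heads with probability p.
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    # Capacity of a binary symmetric channel with crossover probability p.
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(p, bsc_capacity(p))   # 1.0, ~0.92, ~0.53, 0.0 -- at p = 0.5 nothing gets through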
John wrote: “Is it incompressible because it is information-rich, all redundancy having been squeezed out, or is it random noise?”
You just “compressed” any amount of random noise between you and your readers with the two words “random noise”.
I think of it in analogy to “small” infinity and “big” infinity. Zipf’s law also suggests that “information” is a compromise, or trade-off, between the reader who wants everything explained in great detail (close to the big infinity, or random noise) and the writer who wants all the information in one small incompressible word. … kind of like space/time, eval/apply, cross-validation/regularization … one point of view’s optimum is another point of view’s random noise.
The definition of information wraps around itself like the definition of infinity does.
Hi John,
I agree with you completely.
The tools from information theory only begin to describe a stochastic process. Entropy rate is a blunt tool. And Kolmogorov complexity can hardly be called a tool at all (since it’s incomputable in general).
A simple example illustrates the weakness of entropy rate. Suppose we generate a string using a two-state Markov model with no self-transitions, and label the states 0 and 1. Then the Markov chain will always produce
…01010101010101…
If we blindly compute the (block-1) entropy of this sequence, we get 1 bit. But the entropy rate is obviously 0: no entropy is produced by this sequence since it’s completely deterministic. As pointed out by others, it also has a lower Kolmogorov complexity than, say, a random sequence. And yet we would like to think of this sequence as having *more*, not *less*, structure than a random sequence.
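Here is that calculation as a small Python sketch (plug-in entropies on a finite sample, nothing more): the single-symbol entropy of the alternating sequence comes out to 1 bit, while the conditional entropy of the next symbol given the previous one, which is what the entropy rate settles down to, is essentially 0.

from collections import Counter
from math import log2

seq = "01" * 5000   # output of the two-state Markov chain with no self-transitions

def block_entropy(s, k):
    # Shannon entropy (in bits) of the empirical distribution of length-k blocks.
    blocks = [s[i:i + k] for i in range(len(s) - k + 1)]
    counts = Counter(blocks)
    n = len(blocks)
    return -sum((c / n) * log2(c / n) for c in counts.values())

h1 = block_entropy(seq, 1)
h2 = block_entropy(seq, 2)
print(h1)        # 1.0 bit: 0 and 1 each appear half the time
print(h2 - h1)   # essentially 0: each symbol is determined by the one before it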
Jim Crutchfield, a physicist at UC Davis, has proposed an orthogonal, complementary view of (stationary, ergodic) stochastic processes that better captures our intuitions about ‘information.’ He calls the approach computational mechanics, as in ‘computation-theoretic mechanics.’ A nice, quick review article that motivates his approach can be found here:
http://csc.ucdavis.edu/~chaos/papers/Crutchfield.NaturePhysics2012.pdf
The two-course sequence that Crutchfield has taught on the subject can be found here:
http://csc.ucdavis.edu/~chaos/courses/ncaso/
Computational mechanics is an approach that deserves more attention in the information theory, stochastic processes, and statistics literatures, and yet it’s largely unknown outside of physics (with a few notable exceptions).
The distinction between random noise and a maximally compressed stream is that you have a system for decoding the compressed stream. Equating noise to a compressed message ignores the fact that information theory explicitly includes context and a model of encoding/decoding. If you’re looking for meaning, it’s present in the combination of the signal and the model your system uses. This is partially what we’re doing in linguistics when applying information theory to a problem.
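A small Python sketch of that point (with zlib playing the role of the shared encoding/decoding model, purely as an illustration): once the redundancy has been squeezed out, the stream won’t compress any further, and the original message comes back only because sender and receiver share the decoder.

import zlib

message = b" ".join(
    b"line %d: the quick brown fox jumps over the lazy dog" % i for i in range(2000)
)
compressed = zlib.compress(message, level=9)

print(len(message), len(compressed))           # most of the redundancy is gone
print(len(zlib.compress(compressed)))          # compressing again gains essentially nothing
print(zlib.decompress(compressed) == message)  # True -- but only because we share the model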