Conditional independence notation

Ten years ago I wrote a blog post that concludes with this observation:

The ideas of being relatively prime, independent, and perpendicular are all related, and so it makes sense to use a common symbol to denote each.

This post returns to that theme, particularly looking at independence of random variables.

History

Graham, Knuth, and Patashnik proposed using ⊥ for relatively prime numbers in their book Concrete Mathematics, at least by the second edition (1994). Maybe it was in their first edition (1988), but I don’t have that edition.

Philip Dawid proposed a similar symbol ⫫ for (conditionally) independent random variables in 1979 [1].

As explained here, independent random variables really are orthogonal in some sense, so it’s a good notation.
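One concrete version of that sense takes only a few lines. Assume X and Y are independent with finite variances, and use the inner product ⟨U, V⟩ = E[UV]. Then the centered variables are orthogonal (the block assumes the amsmath package for align* and \text):

    \begin{align*}
    \langle X - \mathbb{E}X,\; Y - \mathbb{E}Y \rangle
      &= \mathbb{E}\big[(X - \mathbb{E}X)(Y - \mathbb{E}Y)\big] \\
      &= \mathbb{E}[X - \mathbb{E}X]\;\mathbb{E}[Y - \mathbb{E}Y]
         \qquad \text{(by independence)} \\
      &= 0 \cdot 0 = 0.
    \end{align*}

Independence gives more than this. The same calculation works for f(X) and g(Y) for any functions f and g with finite variance. Mere orthogonality of X and Y, on the other hand, does not imply independence.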

Typography

The symbol ⫫ (Unicode U+2AEB, DOUBLE UP TACK) may or may not show up in your browser. It’s an uncommon character, and your font may not have a glyph for it.

There’s no command in basic LaTeX for the symbol. You can enter the Unicode character in XeTeX, and there are several other alternatives discussed here. A simple work-around is to use

    \perp\!\!\!\perp

This takes two perpendicular symbols and kerns them together by inserting three negative thin spaces (\!) between them.
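If you use the workaround often, you can wrap it in a macro. Here is a minimal sketch. The name \indep is a hypothetical choice, not a standard command. Wrapping the symbol in \mathrel tells TeX to space it like a relation such as = or ⊥:

    \documentclass{article}
    % Hypothetical macro name; \mathrel gives the combined symbol relation spacing.
    \newcommand{\indep}{\mathrel{\perp\!\!\!\perp}}
    \begin{document}
    $X \indep Y \mid Z$
    \end{document}

Defining the symbol once also means you can switch to a package-provided glyph later by changing a single line.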

The MnSymbol package has a command \upmodels to produce ⫫. Why “upmodels”? Because the symbol is the \models symbol ⊧ from logic, rotated 90° counterclockwise.

To put a strike through ⫫ in LaTeX to denote dependence, you can use \nupmodels from the MnSymbol package. If you’re not using that package, you could use the following.

    \not\!\perp\!\!\!\perp
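Here is a minimal sketch of the package route, for comparison. It assumes MnSymbol is installed. MnSymbol redefines a large number of math symbols, so loading it alongside packages such as amssymb may cause clashes.

    \documentclass{article}
    \usepackage{MnSymbol} % provides \upmodels and \nupmodels
    \begin{document}
    Independence: $X \upmodels Y \mid Z$

    Dependence: $X \nupmodels Y$
    \end{document}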

Graphoid axioms

As an example of where you might see the ⫫ symbol used for conditional independence, the table below gives the graphoid axioms for conditional independence. (They’re theorems, not axioms, but they’re called axioms because you could think of them as axioms for working with conditional independence at a higher level of abstraction.)
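In their usual statement, following Judea Pearl, X, Y, Z, and W are sets of random variables, and YW denotes the combination of Y and W. The axioms can be typeset as below. The \indep macro is a hypothetical shorthand for the kerned \perp\!\!\!\perp workaround given earlier.

    \newcommand{\indep}{\mathrel{\perp\!\!\!\perp}} % hypothetical shorthand
    \[
    \begin{array}{ll}
    \mbox{Symmetry}      & (X \indep Y \mid Z) \Rightarrow (Y \indep X \mid Z) \\
    \mbox{Decomposition} & (X \indep YW \mid Z) \Rightarrow (X \indep Y \mid Z) \\
    \mbox{Weak union}    & (X \indep YW \mid Z) \Rightarrow (X \indep Y \mid ZW) \\
    \mbox{Contraction}   & (X \indep Y \mid Z) \wedge (X \indep W \mid ZY)
                           \Rightarrow (X \indep YW \mid Z) \\
    \mbox{Intersection}  & (X \indep W \mid ZY) \wedge (X \indep Y \mid ZW)
                           \Rightarrow (X \indep YW \mid Z)
    \end{array}
    \]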

Note that the independence symbol ⫫ has higher precedence than the conditional symbol |. That is, X ⫫ Y | Z means X is independent of Y, once you condition on Z.

The axioms above are awfully dense, but they make sense when expanded into words. For example, the symmetry axiom says that if knowledge of Z makes Y irrelevant to X, it also makes X irrelevant to Y. The decomposition axiom says that if knowing Z makes the combination of Y and W irrelevant to X, then knowing Z makes Y alone irrelevant to X.
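To see why the first two hold, take the usual definition for discrete variables. X ⫫ Y | Z means p(x, y | z) = p(x | z) p(y | z) for every z with p(z) > 0. That definition is symmetric in x and y, which gives symmetry at once. Decomposition follows by summing out w. The block assumes amsmath, and for continuous variables the sums become integrals.

    \begin{align*}
    p(x, y \mid z) &= \sum_w p(x, y, w \mid z) \\
                   &= \sum_w p(x \mid z)\, p(y, w \mid z)
                      \qquad \text{(since } X \perp\!\!\!\perp YW \mid Z\text{)} \\
                   &= p(x \mid z) \sum_w p(y, w \mid z) \\
                   &= p(x \mid z)\, p(y \mid z).
    \end{align*}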

The intersection axiom holds only for strictly positive probability distributions, i.e., distributions that give positive probability (or positive density) to every combination of values of the variables.
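A standard counterexample shows why positivity matters. Let X, Y, and W all equal the same fair coin flip, and let Z be a constant. Given Y, X is fixed, so X is trivially independent of W. Likewise, given W, X is independent of Y. Yet X is not independent of the pair (Y, W). The block assumes amsmath for gather* and \tfrac.

    \begin{gather*}
    X \perp\!\!\!\perp W \mid ZY, \qquad X \perp\!\!\!\perp Y \mid ZW, \\
    \text{but}\quad P(X = 1 \mid Y = 1, W = 1) = 1 \neq \tfrac{1}{2} = P(X = 1).
    \end{gather*}

This distribution is not strictly positive. For instance, P(Y = 0, W = 1) = 0, which is exactly the kind of zero-probability event the positivity assumption rules out.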


[1] A. P. Dawid. Conditional Independence in Statistical Theory. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 41, No. 1 (1979), pp. 1–31.

6 thoughts on “Conditional independence notation”

1. GlennF

\perp\!\!\!\perp looks a bit wide to me. \perp\!\!\!\!\!\>\perp might be a better workaround for some (five negative thin spaces plus one positive medium space).

2. Alessandro Gentilini

Hello John,
I own a “Fourth printing, with corrections, January 1990” First Edition of Concrete Mathematics, and it uses \perp: “[m \perp n] 1 if m is relative prime to n, otherwise 0” (in “A Note on Notation” on page xi).

3. John:

Why the double perp? Why not just use regular perpendicular notation? That’s what we use in our book.

4. @Andrew: Maybe Dawid was thinking that he wanted a symbol reminiscent of perpendicular but didn’t want to overload that symbol on the grounds that different things should look different. I occasionally sympathize with that position, but I more often lean toward liberal overloading of common symbols. I’d prefer the ordinary perpendicular symbol because there’s no danger of confusion.

5. It’s also very similar to some of the code page 437 box drawing characters, namely C1 (┴) and D0 (╨). If you’ve spent much time writing DOS text-mode programs, this should be familiar.

6. Michael Turmon

@Andrew: using double-perp for independence, and reserving the single perp for L2 orthogonality (uncorrelatedness) can be a wise choice. Doing this can help clarity in some domains, like time domain filtering, where you switch back and forth between different sets of assumptions when considering different approaches.