Sometimes you have a poor quality source of randomness and you want to refine it into a better source. You might, for example, want to generate cryptographic keys from a hardware source that is more likely to produce 1’s than 0’s. Or maybe your source of bits is dependent, more likely to produce a 1 when it has just produced a 1.
Von Neumann’s method
If the bits are biased but independent, a simple algorithm by John von Neumann will produce unbiased bits. Assume each bit of input is a 1 with probability p and a 0 otherwise. As long as 0 < p < 1 and the bits are independent, it doesn’t matter what p is .
Von Neumann’s algorithm works on consecutive pairs bits. If the pair is 00 or 11, don’t output anything. When you see 10 output a 1, and when you see 01 output a 0. This will output a 0 or a 1 with equal probability.
Manuel Blum’s method
Manuel Blum built on von Neumann’s algorithm to extract independent bits from dependent sources. So von Neumann’s algorithm can remove bias, but not dependence. Blum’s algorithm can remove bias and dependence, if the dependence takes the form of a Markov chain.
Blum’s assumes each state in the Markov chain can be exited two ways, labeled 0 and 1. You could focus on a single state and use von Neumann’s algorithm, returning 1 when the state exits by the 1 path the first time and 0 the second time, and return 0 when it does the opposite. But if there are many states, this method is inefficient since it produces nothing when the chain is in all the other states.
Blum basically applies the same method to all states, not just one state. But there’s a catch! His paper shows that the naive implementation of this idea is wrong. He gives a pseudo-theorem and a pseudo-proof that the naive algorithm is correct before showing that it is not. Then he shows that the correct algorithm really is correct.
This is excellent exposition. It would have been easier to say nothing of the naive algorithm, or simply mention in passing that the most obvious approach is flawed. But instead he went to the effort of fully developing the wrong algorithm (with a warning up-front that it will be shown to be wrong).
 It matters for efficiency. The probability that a pair of input bits will produce an output bit is 2p(1 – p), so the probability is maximized when p = 0.5. If p = 0.999, for example, the method will still work correctly, but you’ll have to generate 500 pairs of input on average to produce each output bit.