During World War II, America and her allies needed to estimate the number of Panzer V tanks Germany had produced. The solution was simple: Look at the serial numbers of the captured tanks. If you assume the tanks had been sequentially numbered — as in fact they were — you could view the serial numbers of the captured tanks as random samples from the entire range. You could then use statistics to estimate the range and hence the number of tanks produced. More details available here.
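As a minimal sketch of the statistical idea, assume the tanks were numbered 1, 2, ..., N and the captured ones are a random sample drawn without replacement. Under those assumptions, the standard estimator for this problem is N̂ = m + m/k - 1, where m is the largest observed serial number and k is the number of captured tanks. The function name and the sample serial numbers below are hypothetical and chosen only for illustration.

```python
def estimate_total(serials):
    """Estimate N from serial numbers assumed drawn without replacement from 1..N.

    Uses the estimator m + m/k - 1, where m is the largest observed serial
    number and k is the number of observed (distinct) serial numbers.
    """
    if not serials:
        raise ValueError("need at least one serial number")
    k = len(serials)  # sample size
    m = max(serials)  # largest serial number seen
    return m + m / k - 1


# Hypothetical sample: four captured tanks.
print(estimate_total([19, 40, 42, 60]))  # 60 + 60/4 - 1 = 74.0
```

The m/k term roughly accounts for the average gap between observed serial numbers, which is how far the true maximum is likely to lie above the largest one seen.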

A few years later America tried to use the serial number trick to estimate the number of Soviet strategic bombers. This time the trick backfired.

In 1958, American military intelligence believed the USSR would soon have four hundred Bison and three hundred Bear bombers capable of striking the American heartland. Their evidence was the high serial number of a Bison that had flown at a May Day parade in Moscow. In fact, the Soviets knew the Americans were watching, and intentionally inflated that number. — Rocket Men, page 118.

The Panzer estimate was accurate because the Allies had hundreds of data points, enough to support the assumption that the tanks were sequentially numbered and to make a good estimate of the total number.

The Bison bomber was only one data point, but it was consistent with what intelligence services (wrongly) believed. At that time, the US had grossly over-estimated the military capabilities of the USSR. According to *Rocket Men*, Khrushchev turned down US offers to cooperate in space exploration because he feared that such cooperation would give the US a more accurate assessment of his country’s military.

**Related post**: Selection bias and bombers

Fascinating. I love bits of history like this.

Hi John,

Why does the formula (M-1)(S+1)/S work?

S is the tank sample size and M is the maximum serial number.

Thank you,

Alessandro

Alessandro: I don’t know what criteria were used to derive that estimate, but it makes sense. The M-1 simply shifts the serial numbers so that they start at 0.

If you only have one tank, you assume it’s in the middle of the range. Of course it could be anywhere in the range, but by symmetry it seems that the middle is as good a guess as any other.

As S gets larger, (S+1)/S is essentially 1, so for a large sample your estimate is almost just the largest serial number, which would be the maximum likelihood estimate.
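The two limiting cases described above can be checked numerically with a small sketch of the formula (M-1)(S+1)/S from the question. The function name and the example numbers are hypothetical; the formula itself is taken as given in the question.

```python
def question_estimate(max_serial, sample_size):
    """The estimate (M - 1)(S + 1)/S from the question.

    M - 1 shifts the serial numbers so they start at 0; (S + 1)/S scales up
    the largest observed value to allow for unseen, larger serial numbers.
    """
    if sample_size < 1:
        raise ValueError("sample size must be at least 1")
    return (max_serial - 1) * (sample_size + 1) / sample_size


# One tank: the observed (zero-based) value is treated as the midpoint
# of the range, so the estimate is twice that value.
print(question_estimate(51, 1))     # 100.0
# Large sample: (S + 1)/S is close to 1, so the estimate is close to M - 1.
print(question_estimate(51, 1000))  # 50.05
```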

I think the MLE is biased and (S+1)/S is just a bias adjustment. For a uniform [0, b] variable, the maximum of S independent observations has pdf f(x) = S * b^(-S) * x^(S-1) on [0, b]. Its expectation is then (S/(S+1)) * b, so the maximum underestimates b on average, and multiplying it by (S+1)/S gives an unbiased estimate. I believe this adjusted estimate is the MVUE.
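A quick Monte Carlo check of the derivation above, offered as a sketch: draw S values uniformly from [0, b], then compare the average sample maximum with S b/(S+1), and the average of the adjusted estimate, maximum × (S+1)/S, with b. The values of b, S, the trial count, and the seed are arbitrary choices for illustration.

```python
import random


def simulate(b=100.0, s=5, trials=100_000, seed=1):
    """Average the sample maximum and the bias-adjusted maximum over many trials.

    Each trial draws s independent Uniform[0, b] values and records their maximum.
    """
    rng = random.Random(seed)
    total_max = 0.0
    for _ in range(trials):
        total_max += max(rng.uniform(0, b) for _ in range(s))
    mean_max = total_max / trials
    # The adjustment is linear, so scaling the average equals averaging the scaled values.
    mean_adjusted = mean_max * (s + 1) / s
    return mean_max, mean_adjusted


mean_max, mean_adjusted = simulate()
print(f"average max      ~ {mean_max:.2f}  (theory: {100.0 * 5 / 6:.2f})")
print(f"average adjusted ~ {mean_adjusted:.2f}  (theory: 100.00)")
```

The raw maximum should land near 83.33 rather than 100, while the adjusted estimate should land near 100, consistent with the expectation computed above.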

c: Thanks. That sounds right.

Since the topic fits so well: the National Socialist Party started numbering its members at 501 to make the party look bigger. Hitler was nominally member no. 555.

http://en.wikipedia.org/wiki/German_tank_problem