Point estimation - German tank problem

The purpose of this activity is to investigate a real-life problem requiring point estimation and to gain experience identifying "good" estimators (using the concepts of bias and variability).

Introduction
As written in a 1947 research article: "In early 1943 the Economic Warfare Division of the American Embassy in London started to analyze markings and serial numbers obtained from captured German equipment in order to obtain estimates of German war production and strength. ... The first product to be so analyzed was tires, and after this tanks, trucks, guns, flying bombs, and rockets were studied. Aircraft markings were not studied by the Economic Warfare Division, since, by previous agreement, the British Air Ministry bore the responsibility for all estimates on aircraft production. The uses of the intelligence derived from the markings were varied. At times it helped decide the target systems of the air forces; on other occasions it gave indications of German strength in weapons such as tanks and rockets. After the war official statistics on German war production became available, so that it is now possible to evaluate the accuracy of the estimates which were made. Part II presents a summary scatter diagram of the estimates and official data along with a more detailed treatment of certain estimates."

The markings used in the analysis of tanks were serial numbers from captured equipment. These serial numbers provided a sample that was very small, but reliable. The statisticians made one assumption: that the Germans had numbered their tanks sequentially, in the order in which they were produced. (This assumption turned out to be right.) It was enough to enable the statisticians to estimate the total number of tanks that had been produced up to any given moment. The records of the Speer Ministry, which was in charge of Germany's war production, were recovered after the war. Special studies made after the war found that the British and U.S. estimates of German production were more accurate and timely than Germany's own estimates.

Student activity
Estimate for completion time: 60 minutes.

Materials needed:
 * Numbers 1 to N (representing the current population of German tanks) on slips of paper in a container
 * Spreadsheet program

Give the groups 10 minutes or so to complete the above.

As a full class:
 * Have each group describe their estimator and estimate, and tell why they chose that particular method.
 * Discuss: How do we decide the "usefulness" of an estimator? Do the concepts of bias and variability play a role?
 * Discuss: How can we evaluate an estimator for bias and variability?

Historical results
The following is adapted from the Wikipedia article German tank problem.

The method used by Allied statisticians to estimate the total number of tanks produced was: Add the average gap between the observations in the sample to the sample maximum.

Using an unadjusted sample maximum as an estimator for the population maximum produces a negative bias. (The sample maximum is never more than the population maximum, but can be less, hence it is a biased estimator: it will tend to underestimate the population maximum.) Adding the average gap tends to compensate for the negative bias. The formula for this is written:


 * $$\hat{N} = \frac{k+1}{k} m - 1 = m + \frac{m}{k} - 1.$$

where the left-hand side is the estimate of the true population size N (total number of tanks), m is the largest serial number observed (sample maximum) and k is the number of tanks observed (sample size).
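The following sketch shows why this correction removes the bias. It assumes the k serial numbers are a simple random sample drawn without replacement from 1, 2, ..., N; that sampling model is an assumption made here for illustration. Under it, the sample maximum m equals j exactly when one observation is j and the other k - 1 fall below j, so

$$P(m = j) = \frac{\binom{j-1}{k-1}}{\binom{N}{k}}, \qquad j = k, \dots, N.$$

Using the identity $$j\binom{j-1}{k-1} = k\binom{j}{k}$$ and summing over j gives

$$E[m] = \frac{k}{\binom{N}{k}} \sum_{j=k}^{N} \binom{j}{k} = \frac{k\binom{N+1}{k+1}}{\binom{N}{k}} = \frac{k(N+1)}{k+1}.$$

The expected sample maximum is therefore smaller than N whenever k < N, and rescaling it recovers N on average:

$$E\left[\frac{k+1}{k} m - 1\right] = \frac{k+1}{k} \cdot \frac{k(N+1)}{k+1} - 1 = N.$$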

To understand this equation, imagine that the samples are evenly spaced throughout the range. The average gap between samples is m/k - 1; the -1 is included so as not to count the samples themselves in computing the gap between samples. For example, if N=25, m=20, and k=5, the average gap is calculated as 20/5 - 1 = 3. This seems right: if the samples were 4, 8, 12, 16, and 20, the gap between 16 and 20 would contain 3 numbers (17, 18, and 19). Adding this gap to the observed maximum (m = 20) gives an estimate of 23 for the total number N.
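The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `estimate_total` and the sample values are hypothetical, chosen so that the maximum is 20 and the sample size is 5.

```python
def estimate_total(serials):
    """Sample maximum plus average gap: m + m/k - 1."""
    k = len(serials)              # sample size
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)              # sample maximum
    return m + m / k - 1


# Hypothetical, evenly spaced sample with m = 20 and k = 5
sample = [4, 8, 12, 16, 20]
print(estimate_total(sample))     # 23.0
```

Only the largest serial number and the number of observations affect the result; the other values in the sample do not enter the formula.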

According to conventional Allied intelligence estimates, the Germans were producing around 1,400 tanks a month between June 1940 and September 1942. Applying the above formula to the serial numbers of captured German tanks (both serviceable and destroyed), the number was calculated to be 256 a month. After the war, German production figures recovered from the Speer Ministry showed the actual number to be 255.

Estimates for some specific months are given as:

Comparing estimators
The best estimator will be unbiased and will minimize the variance of the sampling distribution. How do the different methods compare under simulated conditions (1,000 samples of size k=5 from a population with N=342)? Evaluate the following histograms for bias and variability.
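A simulation along these lines could be sketched in Python as follows. It is one possible setup and not necessarily the one used to produce the histograms: the first estimator is the formula given above, while the other three are illustrative candidates a group might propose. The function name `simulate`, the seed, and the use of sampling without replacement are all assumptions.

```python
import random
import statistics

N = 342          # population size from the text
K = 5            # sample size from the text
TRIALS = 1000    # number of simulated samples from the text

# Candidate estimators. The first is the formula above; the other
# three are illustrative choices (assumptions, not from the source).
ESTIMATORS = {
    "max + average gap": lambda s: max(s) + max(s) / len(s) - 1,
    "sample maximum": lambda s: max(s),
    "2 * mean - 1": lambda s: 2 * statistics.mean(s) - 1,
    "2 * median - 1": lambda s: 2 * statistics.median(s) - 1,
}


def simulate(n_pop=N, k=K, trials=TRIALS, seed=0):
    """Return {name: (bias, standard deviation)} over simulated samples."""
    rng = random.Random(seed)            # fixed seed for repeatable runs
    population = range(1, n_pop + 1)
    estimates = {name: [] for name in ESTIMATORS}
    for _ in range(trials):
        sample = rng.sample(population, k)   # draw without replacement
        for name, estimator in ESTIMATORS.items():
            estimates[name].append(estimator(sample))
    return {
        name: (statistics.mean(vals) - n_pop, statistics.pstdev(vals))
        for name, vals in estimates.items()
    }


if __name__ == "__main__":
    for name, (bias, sd) in simulate().items():
        print(f"{name:20s} bias = {bias:8.2f}   sd = {sd:7.2f}")
```

For each estimator, a bias near zero suggests it is roughly unbiased, and a smaller standard deviation indicates less variability from sample to sample. Changing `seed` or `trials` shows how stable those summaries are.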

Other applications
Ask the students if during this activity they thought of any other situations where this estimator might be used.

The following is adapted from the Wikipedia article German tank problem:

In 2008, a London investor known as Tommo_UK asked people to post the serial numbers of their iPhones, along with the dates they bought them, to a website. He said he wanted to determine the number of iPhones Apple had sold so far; Apple began distributing the iPhone in the US on June 29, 2007. From this information, and using the above formula, he calculated that Apple Inc. had sold 9,190,680 iPhones by the end of September 2008, which meant that the company would probably sell more than 10 million in the year.

Resources
The following resources were used for ideas and organization in the development of this activity:


 * Molesky, Jason, "The German Tank Problem", see resources under Chapter 9
 * Evans, Diane, "The German Tank Problem"
 * Smart, Joyce, "German Tank Problem"