# Point estimation - German tank problem

The purpose of this activity is to investigate a real-life problem requiring point estimation and to gain experience identifying "good" estimators (using the concepts of bias and variability).

### Introduction

As written in a 1947 research article: "In early 1943 the Economic Warfare Division of the American Embassy in London started to analyze markings and serial numbers obtained from captured German equipment in order to obtain estimates of German war production and strength. ... The first product to be so analyzed was tires, and after this tanks, trucks, guns, flying bombs, and rockets were studied. Aircraft markings were not studied by the Economic Warfare Division, since, by previous agreement, the British Air Ministry bore the responsibility for all estimates on aircraft production. The uses of the intelligence derived from the markings were varied. At times it helped decide the target systems of the air forces; on other occasions it gave indications of German strength in weapons such as tanks and rockets. After the war official statistics on German war production became available, so that it is now possible to evaluate the accuracy of the estimates which were made. Part II presents a summary scatter diagram of the estimates and official data along with a more detailed treatment of certain estimates."[1]

The markings used in the analysis of tanks were serial numbers from captured equipment. These serial numbers provided a sample that was very small, but reliable. The statisticians made one assumption: that the Germans had numbered their tanks sequentially, in the order in which they were produced. (This assumption turned out to be correct.) It was enough to enable the statisticians to estimate the total number of tanks that had been produced up to any given moment. The records of the Speer Ministry, which was in charge of Germany's war production, were recovered after the war. Special studies made after the war found that the British and U.S. estimates of German production were more accurate and timely than Germany's own estimates.

### Student activity

Estimated completion time: 60 minutes.

Materials needed:

• Numbers 1 to N (representing the current population of German tanks) on slips of paper in a container

# Student Activity

The German Tank Problem, Part 1

Challenge: Develop and implement an estimator to determine the total number of tanks.

• Form Allied Statistician Units of size 2-4.
• From the population of German tanks (slips of paper in a container), capture 5 German tanks (without replacement AND WITHOUT LOOKING IN THE CONTAINER).
• Record the numbers, and return them to the container. Share with other groups as needed, until everyone has captured 5 tanks.
• Using your 5 serial numbers, develop an estimate of the total number of German tanks. Some common suggestions:
  • Double the mean of the 5 numbers.
  • Double the median of the 5 numbers.
  • Add the average gap between the 5 numbers to the largest number.
  • Add 3 standard deviations to the mean.
  • Add the smallest and largest numbers.
  • Triple the mean or median of the 5 numbers.
  • Use the largest of the five numbers.
  • Calculate the outlier threshold (Q3 + 1.5 × IQR).
• Write down your formula for estimating the number of tanks (number of slips in the container).
• Calculate an estimate of the total number of tanks based on your 5 captured tanks and for each of the other groups' captured tanks.
• Discuss and come to consensus on why you have chosen this estimator.
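For groups that want to check their arithmetic, several of the suggested estimators can be sketched in Python. This is only an illustration: the function names and the `captured` serial numbers are made up for the example, the average gap is taken as m/k − 1 (largest number m, sample size k), and quartile conventions differ between tools, so the outlier threshold may not match a hand calculation exactly.

```python
import statistics


def double_mean(sample):
    """Twice the sample mean."""
    return 2 * statistics.mean(sample)


def double_median(sample):
    """Twice the sample median."""
    return 2 * statistics.median(sample)


def max_plus_average_gap(sample):
    """Largest number plus the average gap, taking the gap as m/k - 1."""
    m, k = max(sample), len(sample)
    return m + m / k - 1


def mean_plus_3sd(sample):
    """Sample mean plus three sample standard deviations."""
    return statistics.mean(sample) + 3 * statistics.stdev(sample)


def min_plus_max(sample):
    """Smallest number plus largest number."""
    return min(sample) + max(sample)


def outlier_threshold(sample):
    """Q3 + 1.5 * IQR, using the default quartile method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return q3 + 1.5 * (q3 - q1)


ESTIMATORS = {
    "double mean": double_mean,
    "double median": double_median,
    "max + average gap": max_plus_average_gap,
    "mean + 3 SD": mean_plus_3sd,
    "min + max": min_plus_max,
    "sample max": max,
    "outlier threshold": outlier_threshold,
}

# Hypothetical captured serial numbers, invented for this example.
captured = [14, 38, 61, 77, 103]

for name, estimator in ESTIMATORS.items():
    print(f"{name:>18}: {estimator(captured):.1f}")
```

Replacing `captured` with a group's own five numbers gives that group's estimate under each rule.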

Give the groups 10 minutes or so to complete the above.

As a full class:

• Have each group describe their estimator and estimate, and tell why they chose that particular method.
• Discuss: How do we decide the "usefulness" of an estimator? Do the concepts of bias and variability play a role?
• Discuss: How can we evaluate an estimator for bias and variability?

# Student Activity

The German Tank Problem, Part 2

Challenge: Evaluate your estimator for the total number of tanks for bias and variability.

• For evaluation purposes, assume that the actual population of tanks is 122.
• Using a spreadsheet program, simulate a large number of samples of size n=5 from a population of 122.
• For each sample, calculate your estimate of the total number of tanks.
• Evaluate your estimates for bias and variability.
• Compare the evaluation of your estimator with the results from other groups.
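The activity calls for a spreadsheet, but the same simulation can be sketched in a few lines of Python for anyone who prefers it. The sketch below is an assumption-laden illustration, not part of the original activity: the function name `evaluate`, the 10,000 trials, and the fixed seed are arbitrary choices, and bias is measured as the mean estimate minus the assumed true population of 122.

```python
import random
import statistics


def max_plus_average_gap(sample):
    """Example estimator: largest number plus the average gap (m/k - 1)."""
    m, k = max(sample), len(sample)
    return m + m / k - 1


def evaluate(estimator, population_size=122, sample_size=5, trials=10_000, seed=0):
    """Simulate repeated captures and summarize the estimator's behavior.

    Returns (bias, spread): the mean estimate minus the true population size,
    and the standard deviation of the estimates.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    serials = range(1, population_size + 1)
    estimates = [
        estimator(rng.sample(serials, sample_size))  # sampling without replacement
        for _ in range(trials)
    ]
    bias = statistics.mean(estimates) - population_size
    spread = statistics.stdev(estimates)
    return bias, spread


bias, spread = evaluate(max_plus_average_gap)
print(f"bias = {bias:.2f}, standard deviation = {spread:.2f}")
```

Any function that takes a list of five numbers and returns a single estimate can be passed in place of `max_plus_average_gap`; for instance, `evaluate(max)` evaluates the unadjusted sample maximum.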

### Historical results

The following is adapted from the Wikipedia article German tank problem.[2]

The method used by Allied statisticians to estimate the total number of tanks produced was: Add the average gap between the observations in the sample to the sample maximum.

Using an unadjusted sample maximum as an estimator for the population maximum produces a negative bias. (The sample maximum is never more than the population maximum, but can be less, hence it is a biased estimator: it will tend to underestimate the population maximum.) Adding the average gap tends to compensate for the negative bias. The formula for this is written:

$\hat{N} = \frac{k+1}{k} m - 1 = m + \frac{m}{k} - 1$

where $\hat{N}$ is the estimate of the population size N (total number of tanks), m is the largest serial number observed (sample maximum) and k is the number of tanks observed (sample size).

To understand this equation, imagine that the samples are evenly spaced throughout the range. The average gap between samples is m/k − 1; the −1 is included so as not to count the samples themselves in computing the gap between samples. For example, if N = 25, m = 20, and k = 5, the average gap is 20/5 − 1 = 3. This seems right because evenly spaced samples would be 4, 8, 12, 16, and 20, and the gap between 16 and 20 contains 3 numbers: 17, 18, and 19. Added to the observed maximum (m), the average gap gives an estimate of 20 + 3 = 23 for the total number.
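The same correction can also be reached from the expected value of the sample maximum. The following is a brief supplementary sketch, assuming every set of k serial numbers from 1, ..., N is equally likely to be captured, and writing $\hat{N}$ for the estimate. The sample maximum m then has the distribution and expected value

$P(m = j) = \binom{j-1}{k-1} \Big/ \binom{N}{k}, \quad j = k, \dots, N, \qquad E[m] = \frac{k(N+1)}{k+1}.$

Replacing E[m] with the observed maximum m and solving for N gives

$\hat{N} = \frac{k+1}{k} m - 1,$

which agrees with the formula above. For instance, with N = 122 and k = 5, E[m] = 5 × 123 / 6 = 102.5, so the unadjusted sample maximum falls short of the true total by about 19.5 on average.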

Conventional Allied intelligence estimated that the Germans were producing around 1,400 tanks a month between June 1940 and September 1942. Applying the above formula to the serial numbers of captured German tanks (both serviceable and destroyed) gave an estimate of 256 a month. After the war, German production figures recovered from the Speer Ministry showed the actual number to be 255.

Estimates for some specific months are given as:

| Month | Statistical estimate | Intelligence estimate | German records |
|---|---|---|---|
| June 1940 | 169 | 1,000 | 122 |
| June 1941 | 244 | 1,550 | 271 |
| August 1942 | 327 | 1,550 | 342 |

### Comparing estimators

The best estimator will be unbiased and will minimize the variance of the sampling distribution. How do the different methods compare under simulated conditions (1000 samples of n=5 from a population with N=342)? Evaluate the following histograms for bias and variability.
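A comparison of this kind can be approximated with a short simulation. The sketch below is one possible setup, not the procedure used to produce the original histograms: the selection of estimators, the function name `compare`, and the seed are assumptions made for illustration. It draws 1000 samples of n=5 from serial numbers 1 to 342 and reports the bias, standard deviation, and root mean squared error of each estimator.

```python
import math
import random
import statistics

# A few of the candidate estimators; each maps a sample to one estimate.
ESTIMATORS = {
    "sample max": max,
    "2 * mean": lambda s: 2 * statistics.mean(s),
    "2 * median": lambda s: 2 * statistics.median(s),
    "min + max": lambda s: min(s) + max(s),
    "max + average gap": lambda s: max(s) + max(s) / len(s) - 1,
}


def compare(population_size=342, sample_size=5, trials=1000, seed=1):
    """Apply every estimator to the same simulated samples.

    Returns {name: (bias, standard deviation, root mean squared error)}.
    """
    rng = random.Random(seed)
    serials = range(1, population_size + 1)
    samples = [rng.sample(serials, sample_size) for _ in range(trials)]
    results = {}
    for name, estimator in ESTIMATORS.items():
        estimates = [estimator(s) for s in samples]
        bias = statistics.mean(estimates) - population_size
        sd = statistics.stdev(estimates)
        rmse = math.sqrt(
            statistics.mean((e - population_size) ** 2 for e in estimates)
        )
        results[name] = (bias, sd, rmse)
    return results


print(f"{'estimator':>18} {'bias':>8} {'sd':>8} {'rmse':>8}")
for name, (bias, sd, rmse) in compare().items():
    print(f"{name:>18} {bias:8.1f} {sd:8.1f} {rmse:8.1f}")
```

Using the same simulated samples for every estimator keeps the comparison fair: differences in the table come from the estimators, not from which samples each one happened to receive.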

### Other applications

Ask the students if during this activity they thought of any other situations where this estimator might be used.

The following is adapted from the Wikipedia article, German tank problem[3]:

In 2008 a London investor, known as Tommo_UK, asked people to post the serial numbers of their iPhones and the dates they bought them to a website. He said he wanted to determine the number of iPhones Apple had sold so far; Apple began distributing the iPhone in the US on June 29, 2007. From this information, and using the above formula, he calculated that Apple Inc. had sold 9,190,680 iPhones by the end of September 2008, which meant that they would probably sell more than 10 million in the year.

### Resources

The following resources were used for ideas and organization in the development of this activity:

### References

1. Ruggles; Brodie (1947). "An empirical approach to economic intelligence in WWII", Journal of the American Statistical Association, 42 (237): 72–91.
2. Wikipedia, German tank problem sections: Exposition:Intuition, and Historical problem:Specific data, accessed 2 Mar 2010.
3. Wikipedia, German tank problem section: Historical problem:Application to iPhone production estimation, accessed 4 Mar 2010.