During the Second World War, estimates of the number of tanks deployed by Germany were critically needed. The Allies adopted a successful statistical approach to obtain them: assuming that the tanks are numbered sequentially starting from 1, if we observe $k$ tanks from an unknown total of $N$, then the best linear unbiased estimator for $N$ is $M(1+1/k)-1$, where $M$ is the maximum observed serial number. In many situations, however, the original German Tank Problem is insufficient, since there are typically $l>1$ factories, and tanks produced by different factories may have serial numbers in disjoint ranges that are often far apart. Clark, Gonye and Miller presented an unbiased estimator for $N$ when the minimum serial number is unknown. Provided one can identify which samples come from which factory, one can then estimate each factory's range; summing the sizes of these ranges yields an estimate of the rival's total productivity. We construct an efficient procedure to estimate the total productivity and prove that it is effective when $\log l/\log k$ is sufficiently small. In the final section, we show that, given information about the gaps between ranges, we can construct an estimator that performs orders of magnitude better when the number of samples is small.
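To make the two estimators mentioned above concrete, here is a minimal Python sketch. The function names are hypothetical. The unknown-minimum formula $(\max-\min)(1+2/(k-1))-1$ is an assumption derived from the standard order statistics of sampling without replacement, not a quotation from Clark, Gonye and Miller. The per-factory sum assumes that each sample has already been assigned to its factory; it does not reproduce the procedure constructed in this paper.

```python
import random


def estimate_total_known_min(serials):
    """Classical estimate M(1 + 1/k) - 1, assuming serials start at 1."""
    k = len(serials)
    if k < 1:
        raise ValueError("need at least one observed serial number")
    return max(serials) * (1 + 1 / k) - 1


def estimate_range_unknown_min(serials):
    """Estimate the size of one contiguous range whose start is unknown.

    Assumed formula: spread * (k + 1) / (k - 1) - 1, which is
    (max - min)(1 + 2/(k - 1)) - 1 written to limit rounding error.
    """
    k = len(serials)
    if k < 2:
        raise ValueError("need at least two observed serial numbers")
    spread = max(serials) - min(serials)
    return spread * (k + 1) / (k - 1) - 1


def estimate_total_by_factory(samples_by_factory):
    """Sum the per-factory range estimates (factory labels assumed known)."""
    return sum(estimate_range_unknown_min(s) for s in samples_by_factory.values())


def mean_estimate(estimator, start, n_total, k, trials, seed=0):
    """Monte Carlo average of an estimator over random samples of size k."""
    rng = random.Random(seed)
    population = range(start, start + n_total)
    total = 0.0
    for _ in range(trials):
        total += estimator(rng.sample(population, k))
    return total / trials


if __name__ == "__main__":
    print(estimate_total_known_min([19, 40, 42, 60]))
    print(estimate_range_unknown_min([119, 140, 142, 160]))
    print(mean_estimate(estimate_range_unknown_min, 5000, 100, 5, 20000))
```

The helper `mean_estimate` is included only as a sanity check: averaged over many simulated samples, both estimators should land close to the true range size, which is what unbiasedness predicts.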