Multiple systems estimation combines samples that each cover part of a population to estimate the total population size. Ideally, all available samples are used, but when some samples become available (much) later than others, one may have to rely on only the early ones. Under some regularity conditions, including sample independence, two samples are enough to obtain an asymptotically unbiased population size estimate. However, the independence assumption may be unrealistic, especially when the samples are derived from administrative sources. It can be relaxed when three or more samples are used, which is therefore generally recommended. This becomes a problem if the third sample is available much later than the first two. In this paper we therefore propose a new approach that addresses this issue by utilising older samples through the expectation-maximisation (EM) algorithm. This leads to a population size nowcast estimate that is asymptotically unbiased under more relaxed assumptions than the two-sample estimate. The resulting nowcasting model is applied to estimating the number of homeless people in the Netherlands, where it yields reasonably accurate nowcast estimates.
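To illustrate why the independence assumption matters for the two-sample case, the following is a minimal sketch of the classical two-sample (Lincoln-Petersen) estimator. It is not the nowcasting model proposed in the paper. The function name `two_sample_estimate`, the cell labels `n11`, `n10`, `n01`, and all counts are hypothetical and chosen for illustration only; the counts are expected values for an assumed population of 1000, not data.

```python
def two_sample_estimate(n11: int, n10: int, n01: int) -> float:
    """Classical two-sample (Lincoln-Petersen) population size estimate.

    n11: units observed in both samples
    n10: units observed in sample 1 only
    n01: units observed in sample 2 only
    (Hypothetical cell labels, used for illustration.)
    """
    if n11 <= 0:
        raise ValueError("the estimator is undefined without any overlap")
    size_1 = n11 + n10  # total size of sample 1
    size_2 = n11 + n01  # total size of sample 2
    return size_1 * size_2 / n11


# Illustrative expected counts for an assumed true population of 1000.
# Independent samples: inclusion in sample 2 has probability 0.5
# regardless of whether a unit is in sample 1 (probability 0.4).
independent = two_sample_estimate(n11=200, n10=200, n01=300)

# Positively dependent samples: units in sample 1 enter sample 2 with
# probability 0.6, all other units with probability 0.3.
dependent = two_sample_estimate(n11=240, n10=160, n01=180)

print(independent)  # 1000.0, which recovers the assumed population size
print(dependent)    # 700.0, which underestimates the assumed population size
```

In this sketch, positive dependence between the two samples inflates the overlap and pushes the estimate below the assumed population size. Relaxing the independence assumption by adding a third sample is what the abstract describes as generally recommended.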