If part of a population is hidden but two or more sources are available that each cover part of it, dual-system or multiple-systems estimation can be applied to estimate the population size. It is common to use a log-linear model for this, estimated by maximum likelihood. Because the model is non-linear, the maximum likelihood estimates suffer from finite-sample bias, which can be substantial for small samples or a small population. This problem was recognised by Chapman, who derived an estimator with good small-sample properties for the case of two sources. However, he did not derive an estimator for more than two sources. We propose an estimator that extends Chapman's estimator to three or more sources and compare it with other bias-reduced estimators in a simulation study. The proposed estimator performs well, and much better than the other estimators. A real data example on homelessness in the Netherlands shows that the proposed estimator can make a substantial difference.
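For orientation, the two-source case can be sketched in a few lines. The sketch assumes the standard textbook forms: the Lincoln-Petersen estimate n1 * n2 / m, which is the maximum likelihood estimate when the two sources are independent, and Chapman's correction (n1 + 1)(n2 + 1) / (m + 1) - 1. The function names and the example counts are illustrative and not taken from the paper, and the proposed extension to three or more sources is not reproduced here because the abstract does not state its form.

```python
def lincoln_petersen(n1, n2, m):
    """Two-source estimate of population size, assuming independent sources.

    n1, n2: number of individuals observed in source 1 and source 2
    m:      number of individuals observed in both sources
    """
    if m == 0:
        # With no overlap the ratio n1 * n2 / m is undefined.
        raise ValueError("no overlap between sources: estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's small-sample correction of the two-source estimate.

    Adding 1 to each count keeps the estimate finite when m == 0
    and reduces the finite-sample bias.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, chosen only to show the size of the correction.
n1, n2, m = 20, 15, 5
print(lincoln_petersen(n1, n2, m))  # 60.0
print(chapman(n1, n2, m))           # 55.0
```

With small counts like these, the two estimates differ noticeably, which is the kind of finite-sample effect the abstract refers to.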