In official statistics, dual system estimation (DSE) is a well-known tool for estimating the size of a population. Two sources are linked, and the number of units missed by both sources is estimated. DSE is often carried out separately within each level of a stratifying variable, such as region. DSE can be considered a loglinear independence model and, with a stratifying variable, a loglinear conditional independence model. The standard approach is to estimate a separate set of parameters for each level of the stratifying variable, so when the number of levels is large, the number of estimated parameters is large as well. Mixed effects loglinear models, in which sets of parameters involving the stratifying variable are replaced by a distribution parameterised by a mean and a variance, have also been proposed, and we investigate their properties through simulation. In our simulation studies the mixed effects loglinear model outperforms the fixed effects loglinear model in terms of mean squared error, although only to a small extent. We also show how mixed effects dual system estimation can be extended to multiple system estimation.
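As a minimal sketch of the standard (fixed effects) approach described above, the following Python snippet applies the independence-model estimate of the missed-by-both cell, n10 * n01 / n11, separately within each stratum. The function name, the region labels, and all counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the mixed effects variant is not shown.

```python
def dual_system_estimate(n11, n10, n01):
    """Dual system estimate for one stratum under independence of the sources.

    n11: units found in both sources
    n10: units found in source A only
    n01: units found in source B only
    Returns (estimated count missed by both, estimated population size).
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive for the estimate to exist")
    # Under independence the odds ratio is 1, so n00 = n10 * n01 / n11.
    n00_hat = n10 * n01 / n11
    return n00_hat, n11 + n10 + n01 + n00_hat


# Hypothetical linked counts per region: (n11, n10, n01).
strata = {
    "north": (80, 20, 10),
    "south": (45, 15, 30),
}

# One independent estimate per level of the stratifying variable.
estimates = {
    region: dual_system_estimate(*counts) for region, counts in strata.items()
}

# The overall population size estimate is the sum over strata.
total_population = sum(n_hat for _, n_hat in estimates.values())
```

Because every stratum is estimated from its own counts alone, a stratum with a small n11 yields an unstable estimate, which is the situation in which borrowing strength across strata through a mixed effects model may help.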