Multiple systems estimation is a standard approach to quantifying hidden populations when the data sources are lists of known cases. A typical modelling approach is to fit a Poisson loglinear model to the numbers of cases observed in each possible combination of the lists. It is necessary to decide which interaction parameters to include in the model, and information criterion approaches are often used for this model selection. In the multiple systems context, difficulties may arise from sparse or zero counts in the intersections of lists, and information criteria must be applied with care because the parameter estimates may fail to exist or the model may not be identifiable. Confidence intervals are often reported conditional on the selected model, which gives an over-optimistic impression of the accuracy of the estimate. A bootstrap approach is a natural way to account for the model selection procedure. However, because the model selection step has to be carried out for every bootstrap replication, the computational burden may be high or even prohibitive. We explore the merit of modifying the model selection procedure in the bootstrap so that it searches only among a subset of models, chosen on the basis of their information criterion scores on the original data. This provides large computational gains with little apparent effect on inference. We also investigate an alternative model selection approach: a downhill search among models, possibly from multiple starting points.
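The restricted bootstrap described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: three lists, invented counts, a candidate set consisting of all subsets of the pairwise interactions, AIC as the information criterion, a shortlist of three models, and a plain Newton-Raphson fit with no safeguards for sparse or zero counts.

```python
import itertools
import numpy as np

# Observed capture histories for three lists; (0, 0, 0) is the unobserved cell.
HISTORIES = np.array([h for h in itertools.product([0, 1], repeat=3) if any(h)])
PAIRS = [(0, 1), (0, 2), (1, 2)]
# Candidate models: every subset of the pairwise interactions (8 models).
MODELS = [m for r in range(4) for m in itertools.combinations(PAIRS, r)]


def design(model):
    """Design matrix: intercept, three main effects, chosen interactions."""
    cols = [np.ones(len(HISTORIES))] + [HISTORIES[:, j] for j in range(3)]
    cols += [HISTORIES[:, i] * HISTORIES[:, j] for i, j in model]
    return np.column_stack(cols).astype(float)


def fit(y, model, n_iter=50, tol=1e-8):
    """Newton-Raphson Poisson fit; returns (AIC up to a constant, total)."""
    X = design(model)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    loglik = np.sum(y * np.log(mu) - mu)  # the log(y!) term is model-free
    aic = 2 * X.shape[1] - 2 * loglik
    # The hidden cell has all covariates zero, so its mean is exp(intercept).
    return aic, y.sum() + np.exp(beta[0])


def select(y, models):
    """Return (model, estimated total) for the lowest-AIC model."""
    fits = [(fit(y, m), m) for m in models]
    (aic, total), model = min(fits, key=lambda f: f[0][0])
    return model, total


def restricted_bootstrap(y, top_k=3, n_boot=200, seed=0):
    """Percentile interval, reselecting only among the top_k original models."""
    shortlist = sorted(MODELS, key=lambda m: fit(y, m)[0])[:top_k]
    rng = np.random.default_rng(seed)
    n = int(y.sum())
    totals = []
    for _ in range(n_boot):
        y_star = rng.multinomial(n, y / n).astype(float)  # resample cases
        totals.append(select(y_star, shortlist)[1])
    return shortlist, np.percentile(totals, [2.5, 97.5])


# Invented counts, ordered as HISTORIES: 001, 010, 011, 100, 101, 110, 111.
counts = np.array([60, 95, 22, 120, 30, 40, 15], dtype=float)
best_model, n_hat = select(counts, MODELS)
shortlist, (lo, hi) = restricted_bootstrap(counts)
print(best_model, round(n_hat), round(lo), round(hi))
```

Each replicate fits only `top_k` models rather than the full candidate set, which is where the saving comes from; with more lists the full set grows very quickly, so the reduction would be correspondingly larger than in this three-list toy case.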