Multiple systems estimation using a Poisson loglinear model is a standard approach to quantifying hidden populations where data sources are based on lists of known cases. Information criteria are often used for selecting between the large number of possible models. Confidence intervals are often reported conditional on the model selected, providing an over-optimistic impression of estimation accuracy. A bootstrap approach is a natural way to account for the model selection. However, because the model selection step has to be carried out for every bootstrap replication, there may be a high or even prohibitive computational burden. We explore the merit of modifying the model selection procedure in the bootstrap to look only among a subset of models, chosen on the basis of their information criterion score on the original data. This provides large computational gains with little apparent effect on inference. We also incorporate rigorous and economical ways of approaching issues of the existence of estimators when applying the method to sparse data tables.
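The restricted bootstrap described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything concrete in it is an assumption made for illustration: four lists, hypothetical counts, candidate models limited to main effects plus any subset of pairwise interactions, AIC as the information criterion, a multinomial resample of the observed cases, and a percentile interval. The small ridge term and the clipping in `fit` are a crude numerical guard for sparse resampled tables, not the rigorous treatment of estimator existence that the abstract refers to.

```python
import itertools
import numpy as np

N_LISTS = 4
# Observable capture histories: every 0/1 pattern except "on no list".
HIST = np.array([h for h in itertools.product([0, 1], repeat=N_LISTS) if any(h)])
PAIRS = list(itertools.combinations(range(N_LISTS), 2))

def design(pairs):
    """Design matrix: intercept, main effects, and the chosen pairwise interactions."""
    cols = [np.ones(len(HIST))] + [HIST[:, i] for i in range(N_LISTS)]
    cols += [HIST[:, i] * HIST[:, j] for i, j in pairs]
    return np.column_stack(cols).astype(float)

def fit(X, y, n_iter=50, ridge=1e-8):
    """Poisson loglinear fit by IRLS; returns (total population estimate, AIC)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)   # guard against overflow
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working response
        XtW = X.T * mu                     # Poisson working weights equal mu
        beta = np.linalg.solve(XtW @ X + ridge * np.eye(X.shape[1]), XtW @ z)
    mu = np.exp(np.clip(X @ beta, -30, 30))
    # AIC up to an additive constant that is the same for every model.
    aic = 2 * X.shape[1] - 2 * np.sum(y * np.log(mu) - mu)
    # exp(intercept) is the fitted count for the unobserved all-zero cell.
    dark = np.exp(np.clip(beta[0], -30, 30))
    return y.sum() + dark, aic

# Hypothetical counts, one per row of HIST (illustration only).
y = np.array([120, 95, 30, 80, 25, 22, 9, 60, 18, 15, 7, 14, 6, 5, 3])

# Step 1: score every candidate model once on the original data.
all_models = [m for r in range(len(PAIRS) + 1)
              for m in itertools.combinations(PAIRS, r)]
ranked = sorted(all_models, key=lambda m: fit(design(m), y)[1])

# Step 2: keep only the best-scoring models for use inside the bootstrap.
top = ranked[:5]
top_X = [design(m) for m in top]
point_estimate = fit(top_X[0], y)[0]

# Step 3: in each replicate, redo model selection among the subset only.
rng = np.random.default_rng(0)
n = y.sum()
boot = np.empty(200)
for b in range(len(boot)):
    yb = rng.multinomial(n, y / n)
    boot[b] = min((fit(X, yb) for X in top_X), key=lambda r: r[1])[0]

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"estimate {point_estimate:.0f}, 95% interval ({lo:.0f}, {hi:.0f})")
```

In this sketch each replicate fits 5 models rather than all 64, which is where the computational saving comes from; the subset size is a tuning choice, and widening it moves the procedure back toward the full bootstrap.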