Multiple systems estimation is a standard approach to quantifying hidden populations when the data sources are lists of known cases. A typical modelling approach is to fit a Poisson loglinear model to the numbers of cases observed in each possible combination of the lists. It is necessary to decide which interaction parameters to include in the model, and information criteria are often used for this model selection. In multiple systems estimation, difficulties may arise from sparse or zero counts in the intersections of lists, and care must be taken when information criteria are used for model selection, because of issues relating to the existence of estimates and the identifiability of the model. Confidence intervals are often reported conditional on the selected model, which gives an over-optimistic impression of the accuracy of the estimate. A bootstrap approach is a natural way to account for the model selection procedure. However, because the model selection step has to be carried out for every bootstrap replication, the computational burden may be high or even prohibitive. We explore the merit of modifying the model selection procedure in the bootstrap so that it searches only among a subset of models, chosen on the basis of their information criterion scores on the original data. This provides large computational gains with little apparent effect on inference. We also investigate an alternative model selection approach: a downhill search among models, possibly from multiple starting points.
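The restricted bootstrap described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: three lists, invented cell counts, AIC as the information criterion, candidate models consisting of main effects plus any subset of the pairwise interactions, a simple multinomial resampling of the observed cases, and a shortlist of the three best-scoring models. The function names (`design`, `solve`, `fit`, `select`, `bootstrap`) are hypothetical.

```python
import itertools
import math
import random

# Capture histories for three lists; the unobserved cell (0, 0, 0) is excluded.
HISTORIES = [h for h in itertools.product((0, 1), repeat=3) if any(h)]
PAIRS = [(0, 1), (0, 2), (1, 2)]  # candidate two-list interaction terms
# Candidate models: main effects plus any subset of the pairwise interactions.
MODELS = [frozenset(s) for r in range(4) for s in itertools.combinations(PAIRS, r)]
# Invented counts, in the order of HISTORIES: 001, 010, 011, 100, 101, 110, 111.
COUNTS = [70, 90, 25, 120, 30, 40, 15]


def design(model):
    """One row per observed cell: intercept, main effects, chosen interactions."""
    pairs = sorted(model)
    return [[1.0, *map(float, h), *(float(h[i] * h[j]) for i, j in pairs)]
            for h in HISTORIES]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit(counts, model, iters=50):
    """Fit a Poisson loglinear model by IRLS; return (AIC, hidden-cell estimate)."""
    x = design(model)
    p = len(x[0])
    mu = [y + 0.5 for y in counts]  # starting values kept away from zero
    for _ in range(iters):
        z = [math.log(m) + (y - m) / m for y, m in zip(counts, mu)]
        xtwx = [[sum(m * r[i] * r[j] for r, m in zip(x, mu)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * r[i] * zi for r, m, zi in zip(x, mu, z)) for i in range(p)]
        beta = solve(xtwx, xtwz)
        new_mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in x]
        done = max(abs(u - v) for u, v in zip(new_mu, mu)) < 1e-9
        mu = new_mu
        if done:
            break
    loglik = sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(counts, mu))
    # exp(intercept) is the fitted count for the cell observed on no list.
    return 2 * p - 2 * loglik, math.exp(beta[0])


def select(counts, candidates):
    """Return (model, hidden-cell estimate) with the lowest AIC among candidates."""
    scored = [(fit(counts, m), m) for m in candidates]
    (_, estimate), model = min(scored, key=lambda t: t[0][0])
    return model, estimate


def bootstrap(counts, candidates, reps, rng):
    """Resample cases over the observed cells and redo model selection each time."""
    n, cells = sum(counts), range(len(counts))
    estimates = []
    for _ in range(reps):
        draw = rng.choices(cells, weights=counts, k=n)
        resampled = [draw.count(c) for c in cells]
        estimates.append(select(resampled, candidates)[1])
    return sorted(estimates)


def percentile_interval(estimates, level=0.95):
    """Crude percentile interval from sorted bootstrap estimates."""
    tail = int(len(estimates) * (1 - level) / 2)
    return estimates[tail], estimates[-1 - tail]


# Shortlist the best-scoring models on the original data (3 is an arbitrary choice).
top = sorted(MODELS, key=lambda m: fit(COUNTS, m)[0])[:3]
# Identical seeds give both runs the same resamples, so only the search differs.
full = bootstrap(COUNTS, MODELS, 100, random.Random(0))
restricted = bootstrap(COUNTS, top, 100, random.Random(0))
print("all models:      ", percentile_interval(full))
print("shortlisted only:", percentile_interval(restricted))
```

In this sketch the restricted run fits three models per replication instead of eight; with more lists the candidate set grows rapidly, which is where a shortlist would be expected to matter. The counts were deliberately chosen to avoid sparse or empty cells, so the sketch does not address the existence and identifiability issues that arise in that case.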