Multiple systems estimation is a standard approach to quantifying hidden populations when the data sources are lists of known cases. A typical modelling approach is to fit a Poisson loglinear model to the numbers of cases observed in each possible combination of the lists. It is necessary to decide which interaction parameters to include in the model, and information criterion approaches are often used for model selection. In multiple systems estimation, sparse or zero counts in the intersections of lists can cause difficulties, and care must be taken when using information criteria for model selection because of issues concerning the existence of estimates and the identifiability of the model. Confidence intervals are often reported conditional on the selected model, which gives an over-optimistic impression of the accuracy of the estimation. A bootstrap approach is a natural way to account for the model selection procedure. However, because model selection has to be carried out for every bootstrap replication, the computational burden may be high or even prohibitive. We explore the merit of modifying the model selection procedure in the bootstrap so that it searches only among a subset of models, chosen on the basis of their information criterion scores on the original data. This provides large computational gains with little apparent effect on inference. We also investigate an alternative model selection approach: a downhill search among models, possibly with multiple starting points.
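The restricted-subset bootstrap described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes three lists, AIC as the information criterion, and Poisson loglinear models with all main effects plus any subset of the two-way interactions (the three-way term is left out because it is not identifiable from the seven observable cells). It also assumes a nonparametric bootstrap that resamples the observed individuals and a percentile interval. The function names (`fit`, `select`, `bootstrap_ci`), the `keep` parameter giving the subset size, and the counts are all hypothetical. Only the standard library is used, so the Newton-Raphson fit and the linear solver are written by hand. A fit that diverges or fails to converge, as can happen when a bootstrap sample has zero counts, is treated as having no estimate and is skipped.

```python
import itertools
import math
import random
from collections import Counter

# History "101" = on lists A and C, not B; "000" (never listed) is the unobserved cell.
CELLS = ["".join(b) for b in itertools.product("01", repeat=3)][1:]
PAIRS = [(0, 1), (0, 2), (1, 2)]  # candidate two-way interactions AB, AC, BC
# Candidate models: all main effects plus every subset of the two-way interactions.
MODELS = [list(s) for r in range(len(PAIRS) + 1) for s in itertools.combinations(PAIRS, r)]


def design_row(cell, interactions):
    """Intercept, three main effects, then the chosen two-way interaction terms."""
    x = [int(b) for b in cell]
    return [1.0] + [float(v) for v in x] + [float(x[i] * x[j]) for i, j in interactions]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (None if singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(counts, interactions, max_iter=50):
    """Poisson loglinear fit by Newton-Raphson; returns (AIC, N_hat), or None if the
    fit diverges or does not converge (how a nonexistent MLE shows up here)."""
    X = [design_row(c, interactions) for c in CELLS]
    y = [counts[c] for c in CELLS]
    p, K = len(X[0]), len(X)
    beta = [math.log(sum(y) / K)] + [0.0] * (p - 1)
    for _ in range(max_iter):
        eta = [sum(b * v for b, v in zip(beta, row)) for row in X]
        if max(abs(e) for e in eta) > 50:
            return None
        mu = [math.exp(e) for e in eta]
        score = [sum((y[k] - mu[k]) * X[k][i] for k in range(K)) for i in range(p)]
        info = [[sum(mu[k] * X[k][i] * X[k][j] for k in range(K)) for j in range(p)]
                for i in range(p)]
        step = solve(info, score)
        if step is None:
            return None
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    else:
        return None
    mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]
    loglik = sum(yk * math.log(mk) - mk - math.lgamma(yk + 1) for yk, mk in zip(y, mu))
    # beta[0] is the log expected count of "000", so N_hat = observed + exp(intercept).
    return 2 * p - 2 * loglik, sum(y) + math.exp(beta[0])


def select(counts, models):
    """N_hat from the AIC-best model in `models`, skipping fits that fail."""
    fits = [f for f in (fit(counts, m) for m in models) if f is not None]
    return min(fits)[1] if fits else None


def bootstrap_ci(counts, keep=None, reps=200, alpha=0.05, seed=1):
    """Percentile interval for N, redoing model selection in every replicate.
    keep=k restricts each replicate to the k AIC-best models on the original data."""
    models = MODELS
    if keep is not None:
        scored = [(f[0], i) for i, f in enumerate(fit(counts, m) for m in MODELS) if f is not None]
        models = [MODELS[i] for _, i in sorted(scored)[:keep]]
    rng, n, estimates = random.Random(seed), sum(counts.values()), []
    for _ in range(reps):
        # Nonparametric bootstrap: resample the capture histories of the n observed individuals.
        boot = Counter(rng.choices(CELLS, weights=[counts[c] for c in CELLS], k=n))
        est = select(boot, models)
        if est is not None:
            estimates.append(est)
    if not estimates:
        return None
    estimates.sort()
    m = len(estimates) - 1
    return estimates[int(alpha / 2 * m)], estimates[int((1 - alpha / 2) * m)]


# Hypothetical counts for the seven observable histories (illustration only).
counts = {"100": 100, "010": 90, "001": 80, "110": 30, "101": 25, "011": 20, "111": 8}
print(select(counts, MODELS))
print(bootstrap_ci(counts))          # full search over all 8 models in every replicate
print(bootstrap_ci(counts, keep=3))  # search only the 3 AIC-best models on the original data
```

With `keep=k`, each replicate fits k models instead of all eight, which is where the computational saving in this sketch comes from. The number of candidate models grows rapidly with the number of lists (with four lists there are already 2^6 = 64 subsets of the two-way interactions alone), so the saving from restricting the search grows accordingly.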