Valid statistical inference is challenging when the sample is subject to unknown selection bias. Data integration can correct for selection bias when a parallel probability sample from the same population shares some common measurements. The challenging part of data integration is modeling and estimating the selection probability, or propensity score (PS), of a non-probability sample using an independent probability sample. We approach this problem by combining multiple candidate PS models with empirical likelihood. When the multiple PS models are incorporated into the internal bias calibration constraint of the empirical likelihood setup, the selection bias can be eliminated as long as the set of candidate models contains a true PS model. We call the bias calibration constraint under multiple PS models multiple bias calibration. The candidate PS models can include both missing-at-random and missing-not-at-random models. Asymptotic properties are discussed, and limited simulation studies compare the proposed method with existing competitors. Plasmode simulation studies using the Culture & Community in a Time of Crisis dataset demonstrate the practical use and advantages of the proposed method.