Nonresponse after probability sampling is a universal challenge in survey sampling, often necessitating adjustments to mitigate sampling and selection bias simultaneously. This study explores the removal of bias and the effective use of available information, not only under nonresponse but also in the data integration setting, where summary statistics from other data sources are accessible. We reformulate these settings within a two-step monotone missing data framework, where the first step of missingness arises from sampling and the second from nonresponse. We then derive the semiparametric efficiency bound for the target parameter and propose adaptive estimators, based on the method of moments and on empirical likelihood, that attain this lower bound. The proposed estimators are both efficient and doubly robust. However, attaining efficiency with an adaptive estimator requires the correct specification of certain working models. To strengthen robustness against misspecification of the working models, we extend double robustness to multiple robustness by proposing a two-step empirical likelihood method that effectively leverages empirical weights. A numerical study investigates the finite-sample performance of the proposed methods. We further apply our methods to data from the National Health and Nutrition Examination Survey, efficiently incorporating summary statistics from the National Health Interview Survey.