Missing data often cause bias and loss of efficiency. These problems become severe when the response mechanism is nonignorable, that is, when the probability of response depends on unobserved variables. Handling nonignorable nonresponse requires estimating the joint distribution of the unobserved variables and the response indicators. However, model misspecification and identification problems can prevent robust estimation, even when the target joint distribution is estimated carefully. In this study, we model the distribution of the observed parts and derive sufficient conditions for model identifiability, assuming a logistic regression model for the response mechanism and generalized linear models for the main outcome model of interest. More importantly, the derived sufficient conditions do not require any instrumental variables, which are often assumed in order to guarantee model identifiability but in practice cannot be determined beforehand. To analyze missing data in applications, we propose practical guidelines and a sensitivity analysis for determining the response mechanism. Furthermore, we present the performance of the proposed estimators in numerical studies and apply the proposed method to two sets of real data: exit polls from the 19th South Korean election and public data collected from the Korean Survey of Household Finances and Living Conditions.
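To make the setting concrete, the following is a minimal simulation sketch of a nonignorable logistic response mechanism. It is not the estimator proposed in this study. The Gaussian linear outcome model (one special case of a generalized linear model), the coefficient values, the sample size, and the function name `simulate` are all assumptions chosen purely for illustration. The sketch compares the mean of the full outcome with the mean of the observed outcomes only, when the response probability depends on the outcome itself.

```python
import math
import random


def simulate(n=20000, alpha0=0.0, alpha1=1.0, seed=0):
    """Simulate outcomes with a logistic response mechanism.

    Hypothetical setup (illustrative values, not taken from the paper):
      outcome model:   y = 1 + 0.5 * x + e,  with x, e ~ N(0, 1)
      response model:  P(respond | y) = 1 / (1 + exp(-(alpha0 + alpha1 * y)))
    alpha1 != 0 makes the response depend on y itself (nonignorable);
    alpha1 == 0 makes the response completely random.
    """
    rng = random.Random(seed)
    y_all = []
    y_obs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 0.5 * x + rng.gauss(0.0, 1.0)
        p_respond = 1.0 / (1.0 + math.exp(-(alpha0 + alpha1 * y)))
        y_all.append(y)
        if rng.random() < p_respond:
            y_obs.append(y)  # y is recorded only for respondents
    full_mean = sum(y_all) / n
    observed_mean = sum(y_obs) / len(y_obs)
    response_rate = len(y_obs) / n
    return full_mean, observed_mean, response_rate


if __name__ == "__main__":
    for a1 in (0.0, 1.0):
        full, obs, rate = simulate(alpha1=a1)
        print(f"alpha1={a1}: full mean={full:.3f}, "
              f"observed-only mean={obs:.3f}, response rate={rate:.2f}")
```

With `alpha1 = 0` the observed-only mean should stay close to the full mean, whereas with `alpha1 = 1` units with larger outcomes respond more often, so the observed-only mean is shifted upward. This is the kind of bias that motivates modeling the response mechanism explicitly.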