We study datasets with binary outcomes in which the responses are missing not at random (MNAR). We propose an approach based on exponential tilting that requires no knowledge of the 'nonresponse instruments' or 'shadow variables' usually needed for statistical estimation. We establish a sufficient condition for identifiability of the tilt parameters and propose an algorithm to estimate them. Using these tilt parameter estimates, we construct importance-weighted and doubly robust estimators for any mean function of interest and validate their performance on a synthetic dataset. In an experiment with the Waterbirds dataset, we use our tilt framework to perform unsupervised transfer learning when the responses are missing from the target domain of interest, and achieve prediction performance comparable to a gold standard.
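To make the idea of importance weighting under an exponential tilt concrete, here is a minimal sketch. It rests on simplifying assumptions that are not taken from the abstract: the tilt depends on the binary label only, so the weight is `exp(alpha[y])`, and the tilt parameters `alpha` are treated as known, whereas the paper estimates them. The names `tilt_weights`, `weighted_mean`, and `alpha` are hypothetical, and this is not the authors' estimator.

```python
import math

def tilt_weights(ys, alpha):
    """Unnormalized importance weights exp(alpha[y]) for binary labels y in {0, 1}.

    Assumption: the tilt depends on the label only; alpha is a hypothetical,
    already-known tilt parameter (the paper estimates its parameters instead).
    """
    return [math.exp(alpha[y]) for y in ys]

def weighted_mean(values, weights):
    """Self-normalized importance-weighted mean of `values`."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

# Toy labeled sample from the observed (respondent) population: 50% positives.
ys = [0, 0, 1, 1]

# Hypothetical tilt: positives are three times as prevalent, relative to
# negatives, in the population with missing responses.
alpha = {0: 0.0, 1: math.log(3.0)}

w = tilt_weights(ys, alpha)
target_pos_rate = weighted_mean(ys, w)
print(target_pos_rate)  # close to 0.75, up to floating-point rounding
```

Reweighting the observed labels by the tilt moves the estimated positive rate from 0.5 in the observed sample to about 0.75 in the tilted population. Passing any other per-example quantity as `values` estimates its mean in the tilted population in the same way.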