Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data, where the missingness mechanism depends on the missing values themselves even after conditioning on the observed data. Here, we consider an MNAR model that generalizes several popular prior MNAR models in two ways: first, it imposes fewer statistical independence assumptions on the underlying joint data distribution, and second, it allows every variable in the observed sample to have missing values. This MNAR model corresponds to the so-called criss-cross structure studied in the literature on graphical models of missing data, which prevents nonparametric identification of the entire missing data model. Nonetheless, part of the complete-data distribution remains nonparametrically identifiable. By exploiting this fact and considering a rich class of exponential family distributions, we establish sufficient conditions for identification of the complete-data distribution as well as the entire missingness mechanism. We then propose methods for testing the independence restrictions encoded in such models, using the odds ratio as our parameter of interest. We adopt two semiparametric approaches for estimating the odds ratio parameter and establish the corresponding asymptotic theory: one maximizes a conditional likelihood with order statistics and the other uses estimating equations. We illustrate the utility of our methods via simulation studies.
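For readers unfamiliar with why an odds ratio can serve as the parameter for testing independence, the following is a minimal sketch of the standard definition, not necessarily the exact form used in the paper. The symbols $X$, $Y$, the reference values $x_0$, $y_0$, and the joint density $p$ are assumptions introduced here for illustration, and $p$ is assumed to be strictly positive on the product of the supports of $X$ and $Y$.

```latex
% Illustrative notation only: X, Y, x_0, y_0 and p are assumed symbols,
% with p(x, y) > 0 on the product of the supports of X and Y.
\[
\mathrm{OR}(x, y)
  \;=\;
  \frac{p(x, y)\, p(x_0, y_0)}{p(x, y_0)\, p(x_0, y)},
\qquad
X \perp\!\!\!\perp Y
  \iff
  \mathrm{OR}(x, y) = 1 \ \text{for all } (x, y).
\]
```

Under this definition, a test of an independence restriction can be phrased as a test that the odds ratio function equals one everywhere. The restrictions considered in the paper may be conditional on further variables, in which case the same ratio would be formed from the corresponding conditional density.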