Imputation methods for dealing with incomplete data typically assume that the data are missing at random (MAR). These methods can also be applied to missing not at random (MNAR) situations, provided the user specifies adjustment parameters that describe the degree of departure from MAR. The effect of different pre-chosen values on the inferences is then studied. This paper proposes a novel imputation method, the Random Indicator (RI) method, which, in contrast to the current methodology, estimates these adjustment parameters from the data. For an incomplete variable $X$, the RI method assumes that the observed part of $X$ is normal and that the probability for $X$ to be missing follows a logistic function. The idea is to estimate the adjustment parameters by generating a pseudo response indicator from this logistic function. Our method iteratively draws imputations for $X$ and a realization of the response indicator $R$ for $X$, which we denote by $\dot{R}$. By cross-classifying $X$ by $R$ and $\dot{R}$, we obtain various properties of the distribution of the missing data. These properties form the basis for estimating the degree of departure from MAR. Our numerical simulations show that the RI method performs very well across a variety of situations. We show how the method can be used on a real-life data set. The RI method is automatic and opens up new ways to tackle the problem of MNAR data.
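The cross-classification of $X$ by $R$ and $\dot{R}$ can be illustrated with a small simulation. The sketch below is not the RI algorithm itself: it assumes a fully simulated $X \sim N(0, 1)$ and a logistic response model with known, hypothetical parameters `psi0` and `psi1`, whereas the RI method works with incomplete data and estimates the departure from MAR. It only shows why a pseudo indicator drawn from the same logistic function is informative: because $R$ and $\dot{R}$ are independent draws given $X$, the cells $(R=0, \dot{R}=1)$ and $(R=1, \dot{R}=0)$ have the same distribution of $X$.

```python
import math
import random


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=20000, psi0=0.0, psi1=-1.0, seed=1):
    """Draw X ~ N(0, 1) and two independent indicators from the same
    logistic response model P(R = 1 | X) = logistic(psi0 + psi1 * X).

    r is the actual response indicator (1 = observed); r_dot is a
    pseudo indicator drawn from the same function. psi0 and psi1 are
    hypothetical values chosen for illustration only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = logistic(psi0 + psi1 * x)
        r = 1 if rng.random() < p else 0
        r_dot = 1 if rng.random() < p else 0
        rows.append((x, r, r_dot))
    return rows


def cell_means(rows):
    """Mean of X in each (r, r_dot) cell of the cross-classification."""
    sums, counts = {}, {}
    for x, r, r_dot in rows:
        key = (r, r_dot)
        sums[key] = sums.get(key, 0.0) + x
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


rows = simulate()
means = cell_means(rows)
for key in sorted(means):
    print(key, round(means[key], 3))
```

With a negative `psi1`, larger values of $X$ are less likely to be observed, so the $(0, 0)$ cell has the highest mean and the $(1, 1)$ cell the lowest, while the two mixed cells agree up to sampling error. In real data the values of $X$ with $R = 0$ are unobserved, which is why the method has to alternate between imputing $X$ and drawing $\dot{R}$.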