We propose a constructive algorithm for identifying complete data distributions in graphical models of missing data. The complete data distribution is unrestricted, while the missingness mechanism is assumed to factorize according to a conditional directed acyclic graph. Our approach follows an interventionist perspective in which missingness indicators are treated as variables that can be intervened on. A central challenge in this setting is that sequences of interventions on missingness indicators may induce and propagate selection bias, so that identification can fail even when a propensity score is invariant to available interventions. To address this challenge, we introduce a tree-based identification algorithm that explicitly tracks the creation and propagation of selection bias and determines whether it can be avoided through admissible intervention strategies. The resulting tree provides both a diagnostic and a constructive characterization of identifiability under a given missingness mechanism. Building on these results, we develop recursive inverse probability weighting procedures that mirror the intervention logic of the identification algorithm, yielding valid estimating equations for both the missingness mechanism and functionals of the complete data distribution. Simulation studies and a real-data application illustrate the practical performance of the proposed methods. An accompanying R package, flexMissing, implements all proposed procedures.
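To make the inverse probability weighting idea concrete, the following is a minimal sketch of the simplest possible case: a single missingness indicator R whose propensity depends only on a fully observed binary covariate X. It is not the recursive procedure proposed here and does not use the flexMissing package. The data-generating process, the function names (`simulate`, `fit_propensity`, `ipw_mean`, `complete_case_mean`), and all numeric settings are assumptions chosen purely for illustration.

```python
import random


def simulate(n, seed=0):
    """Draw (x, y, r) triples; y is recorded as None when r == 0 (missing).

    Assumed toy model: X ~ Bernoulli(0.5), Y = 2*X + N(0, 1), so E[Y] = 1.
    Y is observed with probability 0.3 when X = 1 and 0.9 when X = 0.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        p_obs = 0.3 if x == 1 else 0.9
        r = 1 if rng.random() < p_obs else 0
        rows.append((x, y if r == 1 else None, r))
    return rows


def fit_propensity(rows):
    """Estimate P(R = 1 | X = x) by the observed fraction within each stratum."""
    counts = {0: [0, 0], 1: [0, 0]}  # x -> [number observed, number total]
    for x, _, r in rows:
        counts[x][0] += r
        counts[x][1] += 1
    return {x: observed / total for x, (observed, total) in counts.items()}


def ipw_mean(rows, propensity):
    """Estimate E[Y] by weighting each observed y by 1 / P(R = 1 | X)."""
    total = sum(y / propensity[x] for x, y, r in rows if r == 1)
    return total / len(rows)


def complete_case_mean(rows):
    """Naive estimate of E[Y] that averages only the observed outcomes."""
    observed = [y for _, y, r in rows if r == 1]
    return sum(observed) / len(observed)


rows = simulate(20000)
propensity = fit_propensity(rows)
ipw_estimate = ipw_mean(rows, propensity)
cc_estimate = complete_case_mean(rows)

print("estimated propensities:", propensity)
print("IPW estimate of E[Y]:", ipw_estimate)
print("complete-case estimate of E[Y]:", cc_estimate)
```

In this toy setting the complete-case average is pulled toward the stratum that is observed more often, while the weighted average targets the full-data mean of 1. The setting considered in this work is harder: there are several missingness indicators, and interventions on them can themselves induce selection bias, which is why the weighting has to be applied recursively along an admissible intervention sequence.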