Identification of Causal Structure in the Presence of Missing Data with Additive Noise Model

Missing data are an unavoidable complication in many causal discovery tasks. When the missingness process depends on the missing values themselves (known as self-masking missingness), the joint distribution can no longer be recovered, and detecting whether such self-masking missingness is present remains a challenging problem. Consequently, because the original distribution cannot be reconstructed and the underlying missingness mechanism cannot be discerned, simply applying existing causal discovery methods would lead to incorrect conclusions. In this work, we find that recent advances in the additive noise model (ANM) show potential for learning causal structure in the presence of self-masking missingness. Building on this observation, we investigate the identification problem of learning causal structure from missing data under an additive noise model with different missingness mechanisms, where the "no self-masking missingness" assumption can be appropriately relaxed. Specifically, we first extend the scope of identifiability of the causal skeleton to the case of weak self-masking missingness (i.e., no variable other than the missing variable itself can be a cause of its self-masking indicator). We further provide necessary and sufficient conditions for identifying the causal direction under the additive noise model, and show that the causal structure can be identified up to an IN-equivalent pattern. Finally, based on these theoretical results, we propose a practical algorithm for learning both the causal skeleton and the causal direction. Extensive experiments on synthetic and real data demonstrate the efficiency and effectiveness of the proposed algorithm.
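The following is a minimal, hypothetical sketch (not the authors' algorithm) of two ideas the abstract builds on: self-masking missingness, where whether a value is observed depends on that value itself, and the standard bivariate additive-noise-model direction check, which regresses each variable on the other and prefers the direction whose residual is independent of the regressor. The data-generating process, the hard masking threshold, the polynomial regressor, and the Gaussian-kernel HSIC with a median bandwidth are all illustrative assumptions, and only complete cases are used.

```python
import numpy as np


def rbf_gram(v):
    """Gaussian-kernel Gram matrix; bandwidth set from the median squared pairwise distance."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    d2 = (v - v.T) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))


def hsic(a, b):
    """Biased HSIC estimate tr(K H L H) / n^2: near 0 for independent a, b; larger when dependent."""
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(rbf_gram(a) @ h @ rbf_gram(b) @ h)) / n**2


def anm_score(cause, effect, degree=5):
    """Fit effect = f(cause) + residual with a polynomial f (an assumed regressor),
    then score how dependent the residual still is on the hypothesised cause."""
    coef = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coef, cause)
    return hsic(cause, residual)


rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2.0, 2.0, n)          # cause
y = x**3 + rng.uniform(-1.0, 1.0, n)   # effect: nonlinear f plus independent noise

# Self-masking on x: whether x is observed depends only on x's own value
# (hypothetical hard threshold; real mechanisms are usually probabilistic).
observed = x <= 1.0
x_obs, y_obs = x[observed], y[observed]  # complete cases only

true_mean, cc_mean = x.mean(), x_obs.mean()
forward = anm_score(x_obs, y_obs)   # hypothesis x -> y
backward = anm_score(y_obs, x_obs)  # hypothesis y -> x
direction = "x -> y" if forward < backward else "y -> x"

print(f"mean of x: full data {true_mean:.2f}, complete cases {cc_mean:.2f}")
print(f"HSIC forward {forward:.4f}, backward {backward:.4f}, chosen {direction}")
```

Because the mask here depends only on x and the noise is independent of x, the noise stays independent of x among the complete cases, so the forward residuals still pass the independence check even though the complete-case mean of x is biased. Masking y on its own value would instead select on the noise term and can break this argument.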

翻译：缺失数据是许多因果发现任务中不可避免的常见复杂情况。当缺失过程依赖于缺失值本身（即自掩蔽缺失）时，联合分布的重建变得不可行，而检测此类自掩蔽缺失的存在仍是一个棘手的难题。因此，由于无法重建原始分布且难以识别潜在缺失机制，直接应用现有因果发现方法将导致错误结论。本研究发现，近期发展的加性噪声模型具有在存在自掩蔽缺失情况下学习因果结构的潜力。基于此观察，我们旨在探究在不同缺失机制下，通过加性噪声模型从缺失数据中学习因果结构的识别问题，从而可以适当消除"无自掩蔽缺失"的假设。具体而言，我们首先优雅地将因果骨架的可识别性范围扩展至弱自掩蔽缺失情形（即除变量自身外无其他变量可作为自掩蔽指示变量的原因）。进一步，我们给出了加性噪声模型下因果方向识别的充分必要条件，并证明因果结构可被识别至IN等价模式。最终，我们基于上述理论结果提出了一种实用的因果骨架与因果方向学习算法。在合成数据与真实数据上的大量实验证明了所提算法的效率与有效性。