Machine learning can benefit from causal discovery for interpretation and from causal inference for generalization. In this line of research, several invariant learning algorithms for out-of-distribution (OOD) generalization have been proposed that use multiple training environments to find invariant relationships. Some of them focus on causal discovery, such as Invariant Causal Prediction (ICP), which finds the causal parents of a variable of interest. Others, such as Invariant Risk Minimization (IRM), directly provide a causal optimal predictor that generalizes well to OOD environments. These algorithms assume access to multiple environments that, in the causal inference context, correspond to different interventions. Such environments are usually not available when working with observational data and real-world applications. Here we propose a method to generate them efficiently. We evaluate this unsupervised learning approach by applying ICP to simulated data. We also show how ICP can be efficiently integrated with our method for causal discovery. Finally, we propose an improved version of our method, combined with ICP, for datasets with many covariates, where ICP and other causal discovery methods normally degrade in performance.
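To make the role of environments concrete, here is a minimal sketch of the ICP idea in Python with NumPy. It assumes the environment labels are already given; the environment-generation method proposed here is not shown. For every subset S of covariates, the sketch fits a pooled least-squares regression of the target on X_S and checks whether the residuals look identically distributed across environments. The estimated causal parents are the intersection of all accepted subsets. As a simplification, the invariance check is a permutation test on the spread of per-environment residual means and log-variances, not the exact tests used in the original ICP formulation. The function names (`residuals`, `invariance_stat`, `perm_pvalue`, `icp`) are hypothetical.

```python
import itertools

import numpy as np


def residuals(X, y, S):
    """Residuals of a pooled OLS fit of y on X[:, S] plus an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def invariance_stat(r, env):
    """Spread of per-environment residual means (scaled) and log-variances."""
    labels = np.unique(env)
    means = np.array([r[env == e].mean() for e in labels])
    logvars = np.array([np.log(r[env == e].var()) for e in labels])
    return means.var() / r.var() + logvars.var()


def perm_pvalue(r, env, n_perm, rng):
    """Permutation p-value for 'residual distribution does not depend on env'."""
    observed = invariance_stat(r, env)
    count = 0
    for _ in range(n_perm):
        # Shuffling the labels simulates the null of invariant residuals.
        if invariance_stat(r, rng.permutation(env)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


def icp(X, y, env, alpha=0.05, n_perm=200, seed=0):
    """Simplified ICP: intersect all covariate subsets whose residuals pass."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    accepted = []
    # Exhaustive search over all 2**d subsets, including the empty set.
    for k in range(d + 1):
        for S in itertools.combinations(range(d), k):
            if perm_pvalue(residuals(X, y, S), env, n_perm, rng) > alpha:
                accepted.append(set(S))
    if not accepted:
        return set()  # every subset rejected: no invariant model found
    return set.intersection(*accepted)
```

For example, if X1 causes Y, Y causes X2, and the mean of X1 is shifted across three environments, the empty set and {X2} are rejected because their residual means move with the shift. The search should then return {0}, the index of X1. Note that exhaustively enumerating all 2^d subsets becomes costly as the number of covariates d grows.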