We study the problem of identifying unknown intervention targets in structural causal models from heterogeneous data collected in multiple environments. The unknown intervention targets are the endogenous variables whose corresponding exogenous noises change across environments. We propose a two-phase approach: the first phase recovers the exogenous noises whose distributions have changed across environments, and the second phase matches the recovered noises with the corresponding endogenous variables. For the recovery phase, we provide sufficient conditions for learning these exogenous noises up to a component-wise invertible transformation. For the matching phase, under the causal sufficiency assumption, we show that the proposed method uniquely identifies the intervention targets. In the presence of latent confounders, the intervention targets among the observed variables cannot be determined uniquely; in that case, we provide a candidate intervention target set that is a superset of the true intervention targets. Our approach improves upon the state of the art, as the returned candidate set is always a subset of the target set returned by previous work. Moreover, we require neither restrictive assumptions such as linearity of the causal model nor invariance tests to learn whether a distribution changes across environments, which can be highly sample inefficient. Our experimental results show the effectiveness of the proposed algorithm in practice.