We study the problem of identifying unknown intervention targets in structural causal models given heterogeneous data collected from multiple environments. The unknown intervention targets are the endogenous variables whose corresponding exogenous noises change in distribution across environments. We propose a two-phase approach: the first phase recovers the exogenous noises whose distributions have changed across environments, i.e., those corresponding to the unknown intervention targets, and the second phase matches the recovered noises with their corresponding endogenous variables. For the recovery phase, we provide sufficient conditions for learning these exogenous noises up to a component-wise invertible transformation. For the matching phase, under the causal sufficiency assumption, we show that the proposed method uniquely identifies the intervention targets. In the presence of latent confounders, the intervention targets among the observed variables cannot be determined uniquely; in that case, we provide a candidate intervention target set that is a superset of the true intervention targets. Our approach improves upon the state of the art, as the returned candidate set is always a subset of the target set returned by previous work. Moreover, we require neither restrictive assumptions such as linearity of the causal model nor invariance tests to determine whether a distribution changes across environments, which can be highly sample-inefficient. Our experimental results show the effectiveness of the proposed algorithm in practice.
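To make the problem definition concrete, the following toy sketch simulates two environments of a small structural causal model in which only one exogenous noise changes. It is an illustration of the setting only, not the proposed two-phase method: the chain X1 -> X2 -> X3, its coefficients, the variance-ratio threshold, and all function names are hypothetical, and the noises are recovered with an oracle that knows the structural equations, which the approach described above does not assume.

```python
import math
import random
import statistics


def sample_environment(noise_scales, n, rng):
    """Draw n samples from a hypothetical linear SCM X1 -> X2 -> X3."""
    rows = []
    for _ in range(n):
        n1 = rng.gauss(0.0, noise_scales[0])
        n2 = rng.gauss(0.0, noise_scales[1])
        n3 = rng.gauss(0.0, noise_scales[2])
        x1 = n1
        x2 = 2.0 * x1 + n2
        x3 = -1.0 * x2 + n3
        rows.append((x1, x2, x3))
    return rows


def invert_noises(rows):
    """Oracle step (assumption): recover noises from the known toy equations."""
    return [(x1, x2 - 2.0 * x1, x3 + x2) for x1, x2, x3 in rows]


def changed_columns(rows_a, rows_b, log_ratio_threshold=0.5):
    """Return column indices whose variance differs markedly between two samples."""
    changed = []
    for j in range(len(rows_a[0])):
        var_a = statistics.pvariance([row[j] for row in rows_a])
        var_b = statistics.pvariance([row[j] for row in rows_b])
        if abs(math.log(var_b / var_a)) > log_ratio_threshold:
            changed.append(j)
    return changed


rng = random.Random(0)
# Only the noise of X2 (index 1) changes between the two environments.
env_a = sample_environment((1.0, 1.0, 1.0), 5000, rng)
env_b = sample_environment((1.0, 3.0, 1.0), 5000, rng)

# Comparing observed variables also flags X3, a descendant of the target.
naive = changed_columns(env_a, env_b)
# Comparing exogenous noises isolates the actual intervention target.
targets = changed_columns(invert_noises(env_a), invert_noises(env_b))

print("changed observed variables:", naive)
print("changed exogenous noises:", targets)
```

In this toy setup the observed distributions of both X2 and X3 shift, while only the noise of X2 does. This is why the targets are defined through the exogenous noises rather than through the observed marginals.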