We consider the problem of learning the set of direct causes of a target variable from an observational joint distribution. Learning directed acyclic graphs (DAGs) that represent causal structure is a fundamental problem in science. Several results are known for settings in which the full DAG is identifiable from the distribution, for example under a nonlinear Gaussian data-generating process. Here, we are interested only in identifying the direct causes of one target variable (the local causal structure), not the full DAG. This allows us to relax the identifiability assumptions and to develop possibly faster and more robust algorithms. In contrast to the Invariant Causal Prediction framework, we assume that we observe only one environment, without any interventions. We discuss different assumptions on the data-generating process of the target variable under which the set of direct causes is identifiable from the distribution. While doing so, we impose essentially no assumptions on the variables other than the target variable. In addition to the novel identifiability results, we provide two practical algorithms for estimating the direct causes from a finite random sample and demonstrate their effectiveness on several benchmark and real datasets.