We consider the problem of learning the set of direct causes of a target variable from an observational joint distribution. Learning directed acyclic graphs (DAGs) that represent causal structure is a fundamental problem in science. Several results are known for settings in which the full DAG is identifiable from the distribution, for example under the assumption of a nonlinear Gaussian data-generating process. Often, however, we are interested only in identifying the direct causes of a single target variable (the local causal structure), not the full DAG. In this paper, we discuss different assumptions on the data-generating process of the target variable under which the set of direct causes is identifiable from the distribution. In doing so, we impose essentially no assumptions on the variables other than the target variable. In addition to these novel identifiability results, we provide two practical algorithms for estimating the direct causes from a finite random sample and demonstrate their effectiveness on several benchmark datasets. We apply this framework to learn the direct causes of the reduction in fertility rates in different countries.