The causal revolution has stimulated interest in understanding complex relationships across many fields. Most existing methods aim to discover causal relationships among all variables in a large, complex graph. In practice, however, only a small subset of the variables in the graph is relevant to the outcomes of interest. Consequently, causal estimation with the full causal graph, particularly given limited data, can produce many falsely discovered, spurious variables that are highly correlated with the target outcome but have no causal effect on it. In this paper, we propose learning a class of necessary and sufficient causal graphs (NSCG) that contain only the variables causally relevant to an outcome of interest, which we term causal features. The key idea is to use probabilities of causation to systematically evaluate the importance of features in the causal graph, allowing us to identify a subgraph relevant to the outcome of interest. To learn NSCG from data, we develop a necessary and sufficient causal structural learning (NSCSL) algorithm by establishing theoretical properties of, and relationships between, probabilities of causation and natural causal effects of features. In empirical studies on simulated and real data, we demonstrate that NSCSL outperforms existing algorithms and can reveal crucial yeast genes for target heritable traits of interest.