The causal revolution has spurred interest in understanding complex relationships across many fields. Most existing methods aim to discover causal relationships among all variables in a large-scale, complex graph. In practice, however, only a small number of variables in the graph are relevant to the outcome of interest. As a result, causal estimation with the full causal graph, especially given limited data, can yield many falsely discovered, spurious variables that may be highly correlated with the target outcome but have no causal impact on it. In this paper, we propose to learn a class of necessary and sufficient causal graphs (NSCG) that contain only the variables causally relevant to an outcome of interest, which we term causal features. The key idea is to use probabilities of causation to systematically evaluate the importance of features in the causal graph, which allows us to identify the subgraph relevant to the outcome of interest. To learn NSCG from data, we establish theoretical relationships between probabilities of causation and the causal effects of features, and build on them to develop a score-based necessary and sufficient causal structural learning (NSCSL) algorithm. In empirical studies on simulated and real data, we show that the proposed NSCSL algorithm outperforms existing algorithms and can reveal important yeast genes for target heritable traits of interest.
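To make the idea of scoring features by probabilities of causation more concrete, here is a minimal sketch. It uses a classical result of Tian and Pearl and does not necessarily reflect the relationship derived for NSCSL. For a binary feature X and a binary outcome Y, experimental data alone bound the probability of necessity and sufficiency (PNS) as follows: max(0, P(y | do(x)) - P(y | do(x'))) <= PNS <= min(P(y | do(x)), 1 - P(y | do(x'))). Under monotonicity, meaning that setting X = 1 never prevents Y = 1, PNS equals the average causal effect. The names `InterventionalProbs`, `pns_bounds`, `pns_monotone` and `rank_features` are hypothetical, and the feature probabilities in the usage example are made-up toy values rather than data from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InterventionalProbs:
    """Hypothetical container for a binary feature X and a binary outcome Y."""

    p_y_do_x1: float  # P(Y=1 | do(X=1))
    p_y_do_x0: float  # P(Y=1 | do(X=0))

    def __post_init__(self) -> None:
        # Reject values that are not valid probabilities.
        for value in (self.p_y_do_x1, self.p_y_do_x0):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"probability out of range: {value}")


def pns_bounds(p: InterventionalProbs) -> tuple[float, float]:
    """Tian-Pearl bounds on PNS using experimental (do-) probabilities only."""
    lower = max(0.0, p.p_y_do_x1 - p.p_y_do_x0)
    upper = min(p.p_y_do_x1, 1.0 - p.p_y_do_x0)
    return lower, upper


def pns_monotone(p: InterventionalProbs) -> float:
    """Under monotonicity, PNS is point-identified as the average causal effect."""
    return p.p_y_do_x1 - p.p_y_do_x0


def rank_features(
    features: dict[str, InterventionalProbs],
) -> list[tuple[str, float, float]]:
    """Rank features by the lower PNS bound (most surely causal first).

    Ranking by the lower bound is an illustrative choice, not NSCSL's score.
    """
    rows = [(name, *pns_bounds(p)) for name, p in features.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up interventional probabilities for three hypothetical genes.
    toy = {
        "g1": InterventionalProbs(0.8, 0.3),
        "g2": InterventionalProbs(0.55, 0.5),
        "g3": InterventionalProbs(0.4, 0.6),
    }
    for name, lo, hi in rank_features(toy):
        print(f"{name}: PNS in [{lo:.2f}, {hi:.2f}]")
```

With these toy values, `g1` ranks first with PNS in [0.50, 0.70]. `g3` has a lower bound of 0, so it may not be necessary and sufficient for the outcome at all. This also illustrates why a feature that is strongly correlated with the outcome can still score poorly on a causal criterion.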