We propose a novel framework for reconstructing the chronology of genetic regulation using causal inference based on Pearl's theory of causality. The approach proceeds in three main stages: causal discovery, causal inference, and chronology construction. We apply it to the chloroplast genes ndhB and ndhD of Arabidopsis thaliana, generating four alternative maturation timeline models per gene, each derived from a different causal discovery algorithm (HC, PC, LiNGAM, or NOTEARS). We address two methodological challenges. Missing data are handled with an EM algorithm that jointly imputes missing values and estimates the Bayesian network. The $\ell_1$-regularization parameter in NOTEARS is selected with a stability selection strategy that we introduce. The resulting causal models consistently outperform reference chronologies in terms of both reliability and model fit. Moreover, by combining causal reasoning with domain expertise, the framework enables the formulation of testable hypotheses and the design of targeted experimental interventions grounded in theoretical predictions.
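To make the idea of stability selection for an $\ell_1$ penalty concrete, the following is a minimal sketch and not the strategy introduced in this work. It assumes a plain lasso regression (solved by proximal gradient descent) as a stand-in for NOTEARS, and all function names, the half-sample scheme, and the 0.8 threshold are illustrative assumptions. The sparse estimator is refitted on random half-samples for each candidate penalty, and only variables selected in a large fraction of refits are kept.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n)||y - Xw||^2 + lam*||w||_1 by proximal gradient (ISTA).
    Stand-in sparse estimator; NOT the NOTEARS objective."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the smooth least-squares term.
    step = n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        # Soft-thresholding: proximal operator of the l1 penalty.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

def selection_frequencies(X, y, lam, n_subsamples=50, seed=0):
    """Fraction of random half-samples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        w = lasso_ista(X[idx], y[idx], lam)
        counts += np.abs(w) > 1e-8
    return counts / n_subsamples

def stable_set(X, y, lambdas, threshold=0.8, **kwargs):
    """Variables whose selection frequency reaches `threshold` for at least
    one penalty in `lambdas` (hypothetical grid chosen by the user)."""
    freqs = np.array([selection_frequencies(X, y, lam, **kwargs)
                      for lam in lambdas])
    return np.flatnonzero(freqs.max(axis=0) >= threshold), freqs

if __name__ == "__main__":
    # Synthetic illustration only: two informative predictors out of eight.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 8))
    y = 2 * X[:, 0] - 3 * X[:, 3] + 0.1 * rng.standard_normal(200)
    stable, freqs = stable_set(X, y, [0.1, 0.2, 0.5], n_subsamples=20)
    print("stable variables:", stable)
```

In a structure-learning setting, the same resampling loop would wrap the graph estimator instead of the lasso, with edges taking the role of variables. The point of the design is that the result depends on a selection-frequency threshold instead of on a single, hard-to-tune penalty value.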