This article presents a novel method for causal discovery with generalized structural equation models, suited to outcomes of diverse types, including discrete, continuous, and mixed data. Unmeasured confounders often hinder the identification of causal relationships. The proposed approach addresses this issue with two peeling algorithms (bottom-up and top-down) that ascertain causal relationships and valid instruments. It first reconstructs a super-graph representing ancestral relationships between variables, using a peeling algorithm based on nodewise GLM regressions that exploit relationships between primary and instrumental variables. It then estimates parent-child effects from the ancestral relationships using a second peeling algorithm, deconfounding each child's model with information borrowed from its parents' models. A theoretical analysis establishes conditions for model identifiability and provides statistical guarantees that the peeling algorithms accurately discover parent-child relationships. Numerical experiments compare the proposed approach with state-of-the-art structure learning methods that assume no unmeasured confounders and showcase its effectiveness. Lastly, an application to Alzheimer's disease (AD) highlights the utility of the method in constructing gene-to-gene and gene-to-disease regulatory networks involving single nucleotide polymorphisms (SNPs) for healthy and AD subjects.
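To make the first stage more concrete, the following is a minimal sketch of bottom-up peeling on a toy example. It is not the article's implementation, and it rests on several simplifying assumptions: ordinary least squares stands in for the nodewise GLM regressions, a hard threshold stands in for the article's estimator of the instrument-to-variable effect matrix, and every primary variable is assumed to have an instrument that intervenes on it alone. The function names `nodewise_effects` and `peel_ancestors`, the threshold value, and the simulated chain Y0 -> Y1 -> Y2 with a hidden confounder `U` are all hypothetical. The second stage (deconfounded parent-child estimation) is not shown.

```python
import numpy as np


def nodewise_effects(X, Y, threshold):
    """Regress every primary variable on all instruments (OLS as a stand-in
    for a GLM) and zero out coefficients smaller than `threshold`."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    V, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)  # shape (q instruments, p variables)
    V[np.abs(V) < threshold] = 0.0
    return V


def peel_ancestors(V):
    """Bottom-up peeling: repeatedly find a leaf of the remaining graph, i.e. a
    variable that some instrument affects exclusively, record it, and remove it."""
    q, p = V.shape
    support = V != 0
    remaining = list(range(p))
    ancestors = set()   # pairs (a, d): variable a is an ancestor of variable d
    instruments = {}    # variable -> instruments identified for it
    while remaining:
        leaf = None
        for l in range(q):
            hits = [j for j in remaining if support[l, j]]
            if len(hits) == 1:
                leaf = hits[0]
                break
        if leaf is None:
            raise ValueError("no leaf found; the instrument assumption is violated")
        others = [k for k in remaining if k != leaf]
        instruments[leaf] = [
            l for l in range(q)
            if support[l, leaf] and not any(support[l, k] for k in others)
        ]
        # Any variable peeled earlier that an instrument of `leaf` still affects
        # must be a descendant of `leaf`.
        for l in instruments[leaf]:
            for k in range(p):
                if k != leaf and support[l, k]:
                    ancestors.add((leaf, k))
        remaining.remove(leaf)
    return ancestors, instruments


# Toy data: chain Y0 -> Y1 -> Y2, instrument X_l intervenes on Y_l only,
# and a hidden confounder U enters every primary variable.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
U = rng.normal(size=n)
Y0 = 1.0 * X[:, 0] + U + rng.normal(size=n)
Y1 = 0.8 * Y0 + 1.0 * X[:, 1] + U + rng.normal(size=n)
Y2 = 0.8 * Y1 + 1.0 * X[:, 2] + U + rng.normal(size=n)
Y = np.column_stack([Y0, Y1, Y2])

V = nodewise_effects(X, Y, threshold=0.3)
ancestors, instruments = peel_ancestors(V)
print(sorted(ancestors))
print(instruments)
```

In this sketch the recovered set contains the pair (0, 2) even though Y0 is not a parent of Y2. This mirrors the abstract's description of the first stage as producing a super-graph of ancestral relationships, which the second peeling algorithm then prunes to parent-child effects.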