Complex diseases are caused by a multitude of factors that may differ between patients even within the same diagnostic category. A few underlying root causes may nevertheless initiate the development of disease within each patient. We therefore focus on identifying patient-specific root causes of disease, which we equate to the sample-specific predictivity of the exogenous error terms in a structural equation model. We generalize from the linear setting to the heteroscedastic noise model where $Y = m(X) + \varepsilon\sigma(X)$ with non-linear functions $m(X)$ and $\sigma(X)$ representing the conditional mean and mean absolute deviation, respectively. This model preserves identifiability but introduces non-trivial challenges that require a customized algorithm called Generalized Root Causal Inference (GRCI) to extract the error terms correctly. GRCI recovers patient-specific root causes more accurately than existing alternatives.
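To make the heteroscedastic noise model above concrete, the following is a minimal sketch of simulating $Y = m(X) + \varepsilon\sigma(X)$ for a single cause $X$ and then recovering the error term as $(Y - \hat{m}(X)) / \hat{\sigma}(X)$. It is not the GRCI algorithm. The functions `simulate_hnm`, `poly_fit_predict`, and `extract_errors`, the particular choices of $m$ and $\sigma$, and the use of polynomial regression are all assumptions made for illustration. The error is scaled so that $E|\varepsilon| = 1$, which makes $\sigma(X)$ the conditional mean absolute deviation.

```python
import numpy as np


def simulate_hnm(n, rng):
    """Draw (x, y, eps) from a toy heteroscedastic noise model (illustrative choices)."""
    x = rng.uniform(-2.0, 2.0, size=n)
    # Scale Gaussian noise so that E|eps| = 1; sigma(x) is then the
    # conditional mean absolute deviation of y given x.
    eps = rng.standard_normal(n) / np.sqrt(2.0 / np.pi)
    m = np.sin(x)                 # assumed conditional mean m(x)
    sigma = 0.5 + 0.25 * x ** 2   # assumed conditional MAD sigma(x), strictly positive
    y = m + eps * sigma
    return x, y, eps


def poly_fit_predict(x, target, degree):
    """Least-squares polynomial regression of target on x, evaluated at x."""
    coefs = np.polynomial.polynomial.polyfit(x, target, degree)
    return np.polynomial.polynomial.polyval(x, coefs)


def extract_errors(x, y, degree=5):
    """Estimate eps as (y - m_hat(x)) / sigma_hat(x)."""
    m_hat = poly_fit_predict(x, y, degree)
    resid = y - m_hat
    # Regressing |residual| on x estimates the conditional mean absolute deviation.
    sigma_hat = poly_fit_predict(x, np.abs(resid), degree)
    sigma_hat = np.maximum(sigma_hat, 1e-3)  # guard against non-positive estimates
    return resid / sigma_hat


rng = np.random.default_rng(0)
x, y, eps = simulate_hnm(2000, rng)
eps_hat = extract_errors(x, y)
print("correlation between true and extracted errors:",
      np.corrcoef(eps, eps_hat)[0, 1])
```

Dividing by $\hat{\sigma}(X)$ is the step that distinguishes this from the homoscedastic case: the raw residual $Y - \hat{m}(X)$ equals $\varepsilon\sigma(X)$, so it still depends on $X$ and is not a clean estimate of the exogenous error.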