Root causes of disease intuitively correspond to root vertices that increase the likelihood of a diagnosis. This description of a root cause nevertheless lacks the rigorous mathematical formulation needed for the development of computer algorithms designed to automatically detect root causes from data. Prior work defined patient-specific root causes of disease using an interventionalist account that only climbs to the second rung of Pearl's Ladder of Causation. In this theoretical piece, we climb to the third rung by proposing a counterfactual definition matching clinical intuition based on fixed factual data alone. We then show how to assign a root causal contribution score to each variable using Shapley values from explainable artificial intelligence. The proposed counterfactual formulation of patient-specific root causes of disease accounts for noisy labels, adapts to disease prevalence and admits fast computation without the need for counterfactual simulation.
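As a rough illustration of how Shapley values can turn a set-valued quantity into one contribution score per variable, the sketch below computes exact Shapley values for a small, hypothetical value function. The variable names `E1`, `E2`, `E3` and the function `toy_value` are assumptions made for this example only; they are not the value function or notation of the proposed formulation, which the abstract does not specify.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values of the set function `value` over `players`."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(s):
    """Hypothetical value function: additive effects plus one interaction."""
    base = {"E1": 1.0, "E2": 0.5, "E3": 0.0}
    v = sum(base[p] for p in s)
    if "E1" in s and "E2" in s:
        v += 0.5  # assumed interaction between E1 and E2
    return v


scores = shapley_values(["E1", "E2", "E3"], toy_value)
print(scores)
```

In this toy setting the interaction term is split evenly between `E1` and `E2`, and the scores sum to the value of the full set minus the value of the empty set, which is the efficiency property that makes Shapley values a natural choice for per-variable attribution. Exact enumeration is exponential in the number of variables, so it is only practical for small examples like this one.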