Root causes of disease intuitively correspond to root vertices that increase the likelihood of a diagnosis. This description, however, lacks the rigorous mathematical formulation needed to develop algorithms that automatically detect root causes from data. Prior work defined patient-specific root causes of disease using an interventionalist account that climbs only to the second rung of Pearl's Ladder of Causation. In this theoretical piece, we climb to the third rung by proposing a counterfactual definition that matches clinical intuition and relies on fixed factual data alone. We then show how to assign a root causal contribution score to each variable using Shapley values from explainable artificial intelligence. The proposed counterfactual formulation of patient-specific root causes of disease accounts for noisy labels, adapts to disease prevalence, and admits fast computation without the need for counterfactual simulation.
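The abstract does not specify the cooperative game whose payoff is attributed, so the following Python sketch only illustrates the generic Shapley value computation by exhaustive enumeration of coalitions. The value function `toy_value` and the variable names `X1`, `X2`, `X3` are hypothetical stand-ins (read loosely as "probability of the diagnosis when only the variables in the coalition keep the patient's values"); this is not the authors' root causal contribution score.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (exponential in len(players))."""
    n = len(players)
    phi: Dict[str, float] = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S of size k not containing i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of player i when added to coalition S.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(s: FrozenSet[str]) -> float:
    # Hypothetical payoff: baseline probability plus main effects and one interaction.
    p = 0.1
    if "X1" in s:
        p += 0.3
    if "X2" in s:
        p += 0.1
    if {"X1", "X2"} <= s:
        p += 0.2  # interaction split equally between X1 and X2
    return p


phi = shapley_values(["X1", "X2", "X3"], toy_value)
print(phi)  # approximately {'X1': 0.4, 'X2': 0.2, 'X3': 0.0}
```

Exhaustive enumeration is only practical for a handful of variables; the scores sum to `value(all players) - value(empty set)` (the efficiency property), and a variable that never changes the payoff, like `X3` here, receives a score of zero.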