The lack of transparency and explainability hinders the clinical adoption of machine learning (ML) algorithms. Although various explainable artificial intelligence (XAI) methods have been proposed, little of the literature examines their practicality or assesses them against criteria that could foster trust in clinical environments. To address this gap, this study evaluates two popular XAI methods used to explain predictive models in healthcare. It asks whether they (i) generate domain-appropriate representations, i.e. representations coherent with the application task, (ii) affect the clinical workflow, and (iii) are consistent. To that end, explanations generated at the cohort and patient levels were analysed. The paper reports the first benchmarking of XAI methods applied to risk prediction models. The benchmark evaluates the concordance between the generated explanations and the trigger of a future clinical deterioration episode, as recorded by the data collection system. We carried out the analysis on two Electronic Medical Records (EMR) datasets sourced from major Australian hospitals. The findings underscore both the limitations of state-of-the-art XAI methods in the clinical context and their potential benefits. We discuss these limitations and contribute to the theoretical development of trustworthy XAI solutions in which clinical decision support guides the choice of intervention by suggesting the patterns or drivers of future clinical deterioration.
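The abstract does not state the exact concordance metric. A top-k hit rate is one simple way to make "concordance between explanations and the recorded trigger" concrete. The sketch below rests on two assumptions. First, each patient-level explanation is a vector of feature attributions, for example the kind of per-feature weights produced by attribution methods such as SHAP or LIME. Second, the recorded deterioration trigger maps to a single feature index. The function name `top_k_concordance` and its input layout are hypothetical illustrations, not the paper's method.

```python
from typing import Sequence

import numpy as np


def top_k_concordance(
    attributions: np.ndarray, trigger_idx: Sequence[int], k: int = 3
) -> float:
    """Hypothetical concordance score: the fraction of patients whose recorded
    deterioration trigger is among the k features with the largest absolute
    attribution in that patient's explanation.

    attributions: (n_patients, n_features) array of per-patient attributions
                  (assumed format, e.g. signed per-feature weights).
    trigger_idx:  for each patient, the index of the feature recorded as the
                  trigger of the deterioration episode (assumed single feature).
    """
    attributions = np.asarray(attributions, dtype=float)
    trigger_idx = np.asarray(trigger_idx)
    if attributions.ndim != 2 or attributions.shape[0] != trigger_idx.shape[0]:
        raise ValueError("expected one attribution row per patient")
    # Rank features by attribution magnitude (sign ignored) and keep the top k.
    top_k = np.argsort(-np.abs(attributions), axis=1)[:, :k]
    # A patient counts as a hit if the trigger appears anywhere in its top k.
    hits = (top_k == trigger_idx[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    # Toy data: 3 patients, 3 features (illustrative values only).
    attr = np.array([[0.1, -0.9, 0.2], [0.5, 0.0, -0.1], [0.05, 0.02, 0.3]])
    triggers = [1, 2, 0]
    print(top_k_concordance(attr, triggers, k=1))
    print(top_k_concordance(attr, triggers, k=2))
```

A hit rate like this captures only the "does the explanation point at the recorded cause" part of the evaluation. Consistency, criterion (iii), would need a separate comparison, for example across methods or across repeated runs.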