Exploration of the Rashomon Set Assists Trustworthy Explanations for Medical Data

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

The machine learning modeling process conventionally culminates in selecting a single model that maximizes a chosen performance metric. However, this approach forgoes deeper analysis of slightly inferior models. Particularly in medical and healthcare studies, where the objective extends beyond prediction to generating valuable insights, relying on a single model can lead to misleading or incomplete conclusions. This problem is particularly pertinent for the set of models known as the $\textit{Rashomon set}$, whose performance is close to that of the best model. Such a set can be large and may contain models that describe the data in different ways, which calls for comprehensive analysis. This paper introduces a novel process for exploring models in the Rashomon set, extending the conventional modeling approach. We propose the $\texttt{Rashomon_DETECT}$ algorithm, based on recent developments in eXplainable Artificial Intelligence (XAI), to detect models that behave differently. To quantify differences in variable effects among models, we introduce the Profile Disparity Index (PDI), based on measures from functional data analysis. To illustrate the effectiveness of our approach, we apply it to predicting survival among hemophagocytic lymphohistiocytosis (HLH) patients as a foundational case study. Additionally, we benchmark our approach on other medical data sets, demonstrating its versatility and utility in various contexts. If differently behaving models are detected in the Rashomon set, their combined analysis leads to more trustworthy conclusions, which is vital for high-stakes applications such as medicine.
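As a rough illustration of what "performance close to that of the best model" can mean in practice, the sketch below keeps every candidate model whose score lies within a tolerance of the best score. It is not the $\texttt{Rashomon_DETECT}$ algorithm and does not compute the PDI, since the abstract does not specify either. The function name `rashomon_set`, the tolerance `epsilon`, the model names, and the scores are all hypothetical and chosen only for illustration; higher scores are assumed to be better.

```python
from typing import Dict, List


def rashomon_set(scores: Dict[str, float], epsilon: float) -> List[str]:
    """Return names of models whose score is within `epsilon` of the best.

    Assumption: higher scores are better (e.g. AUC or a concordance index).
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not scores:
        return []
    best = max(scores.values())
    # Keep every model whose gap to the best score does not exceed epsilon.
    return sorted(name for name, s in scores.items() if best - s <= epsilon)


# Hypothetical validation scores, made up for illustration only.
scores = {"gbm": 0.81, "rf": 0.80, "cox": 0.79, "tree": 0.70}
selected = rashomon_set(scores, epsilon=0.03)
print(selected)
```

With these made-up numbers, three of the four models fall inside the tolerance. The models retained this way are the ones whose behavior would then be compared, rather than analyzing only the top-scoring model.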

