The machine learning modeling process conventionally culminates in selecting a single model that maximizes a chosen performance metric. However, this approach forgoes deeper analysis of models that perform only slightly worse. Particularly in medical and healthcare studies, where the objective extends beyond prediction to generating valuable insight, relying solely on performance metrics can lead to misleading or incomplete conclusions. This problem is especially pertinent for the set of models whose performance is close to the maximum, known as the $\textit{Rashomon set}$. Such a set can be large and may contain models that describe the data in different ways, which calls for comprehensive analysis. This paper introduces a novel process for exploring Rashomon set models, extending the conventional modeling approach. Its cornerstone is the identification of the most different models within the Rashomon set, facilitated by the introduced $\texttt{Rashomon_DETECT}$ algorithm. This algorithm compares profiles, generated by eXplainable Artificial Intelligence (XAI) techniques, that illustrate how predictions depend on variable values. To quantify differences in variable effects among models, we introduce the Profile Disparity Index (PDI), based on measures from functional data analysis. To illustrate the effectiveness of our approach, we showcase its application in predicting survival among hemophagocytic lymphohistiocytosis (HLH) patients, a foundational case study. Additionally, we benchmark our approach on other medical data sets, demonstrating its versatility and utility in various contexts.
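The idea of comparing dependence profiles across near-optimal models can be illustrated with a minimal sketch. The abstract does not give the definition of the PDI or the steps of $\texttt{Rashomon_DETECT}$, so everything below is an assumption made for illustration: the profile is a plain partial-dependence-style average, the disparity measure is the share of grid intervals on which two profiles move in different directions, and the helper names (`dependence_profile`, `profile_disparity`, `most_different_pair`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
Model = Callable[[Row], float]  # assumption: a model is any callable row -> prediction


def dependence_profile(model: Model, data: Sequence[Row],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """Mean prediction over the data with one feature fixed to each grid value."""
    profile = []
    for value in grid:
        total = 0.0
        for row in data:
            modified = list(row)
            modified[feature] = value
            total += model(modified)
        profile.append(total / len(data))
    return profile


def _sign(x: float, tol: float = 1e-12) -> int:
    """Return 1, -1, or 0 for values above, below, or within the tolerance of zero."""
    if x > tol:
        return 1
    if x < -tol:
        return -1
    return 0


def profile_disparity(p: Sequence[float], q: Sequence[float]) -> float:
    """Illustrative disparity (NOT necessarily the paper's PDI): fraction of
    grid intervals where the two profiles change in different directions."""
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("profiles must share a grid with at least two points")
    disagreements = sum(
        _sign(p[i + 1] - p[i]) != _sign(q[i + 1] - q[i])
        for i in range(len(p) - 1)
    )
    return disagreements / (len(p) - 1)


def most_different_pair(models: Dict[str, Model], data: Sequence[Row],
                        feature: int, grid: Sequence[float]) -> Tuple[str, str, float]:
    """Return the pair of model names whose profiles for `feature` differ most."""
    if len(models) < 2:
        raise ValueError("need at least two models to compare")
    profiles = {name: dependence_profile(m, data, feature, grid)
                for name, m in models.items()}
    names = sorted(profiles)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = profile_disparity(profiles[a], profiles[b])
            if best is None or d > best[2]:
                best = (a, b, d)
    return best


if __name__ == "__main__":
    # Toy stand-ins for models in a Rashomon set (hypothetical, not from the paper).
    toy_models = {
        "increasing": lambda r: 0.5 * r[0] + r[1],
        "decreasing": lambda r: -0.5 * r[0] + r[1],
        "hump": lambda r: -(r[0] - 2.0) ** 2 + r[1],
    }
    toy_data = [[0.0, 1.0], [1.0, 2.0], [2.0, 0.5]]
    print(most_different_pair(toy_models, toy_data, feature=0, grid=[0, 1, 2, 3, 4]))
```

A direction-based measure like this one ignores vertical offsets between profiles, so two models that agree on the shape of a variable's effect but differ in baseline would count as similar; whether that matches the PDI as defined in the paper should be checked against the full text.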