In recent years, SHapley Additive exPlanations (SHAP) has been widely adopted across research domains. This is particularly evident in applied fields, where SHAP analysis serves as a key tool for identifying biomarkers and supporting the validation of results. However, despite its frequent use, SHAP is often applied in ways that fall short of its potential contribution. A review of recent papers employing SHAP reveals that many studies subjectively select a handful of features as 'important' and interpret SHAP values by visually inspecting plots, without assessing statistical significance. Such superficial use may limit what these studies contribute to their fields. To address this, we propose a library that simplifies the interpretation of SHAP values. Given only the original data and the corresponding SHAP values, the library reports: 1) the number of important features to analyze, 2) the pattern of each feature through univariate analysis, and 3) the interactions between features. All information is extracted on the basis of statistical significance and presented in simple, readable sentences, so that users at any level of expertise can understand the interpretation. We hope this library fosters a thorough understanding of statistically valid SHAP results.
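The abstract does not state which statistical tests the library applies. The sketch below illustrates one way a univariate pattern (item 2) could be turned into a significance-tested sentence. It assumes a Spearman rank correlation between a feature's values and its SHAP values, with a permutation p-value. The function name `describe_feature_pattern`, the choice of test, and the `alpha`, `n_perm`, and `seed` defaults are all hypothetical and are not the library's actual API. It uses only the Python standard library.

```python
import math
import random
from statistics import mean


def _ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def _pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def describe_feature_pattern(name, feature_values, shap_values,
                             alpha=0.05, n_perm=2000, seed=0):
    """Hypothetical helper: test a monotonic feature/SHAP relationship.

    Computes Spearman's rho (Pearson correlation of ranks) and a two-sided
    permutation p-value, then phrases the result as a sentence.
    """
    if len(feature_values) != len(shap_values) or len(feature_values) < 3:
        raise ValueError("need two equal-length sequences with at least 3 items")
    rx, ry = _ranks(feature_values), _ranks(shap_values)
    rho = _pearson(rx, ry)

    # Permutation test: shuffle SHAP ranks to break any feature/SHAP link.
    rng = random.Random(seed)
    shuffled = list(ry)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(_pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)  # add-one correction, never 0

    if p_value >= alpha:
        sentence = (f"No significant monotonic relationship was found between "
                    f"{name} and its SHAP values (rho={rho:.2f}, p={p_value:.3f}).")
    else:
        direction = "increase" if rho > 0 else "decrease"
        sentence = (f"Higher values of {name} tend to {direction} the model "
                    f"output (rho={rho:.2f}, p={p_value:.3f}).")
    return rho, p_value, sentence


if __name__ == "__main__":
    # Synthetic toy data, for illustration only.
    age = [23, 35, 47, 52, 61, 29, 44, 58, 66, 38]
    shap_age = [-0.40, -0.15, 0.05, 0.12, 0.30, -0.28, 0.01, 0.22, 0.41, -0.07]
    print(describe_feature_pattern("age", age, shap_age)[2])
```

Spearman's rho only detects monotonic patterns, so a U-shaped effect would need a different test. When many features are tested, the resulting p-values would also need a multiple-comparison correction.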