In an age of rapidly growing data and technology, large black-box models are becoming the norm because they can handle vast amounts of data and learn highly complex patterns. Their main deficiency, however, is that they cannot explain how a prediction was reached, which makes them hard to trust and precarious to use in high-stakes situations. SHapley Additive exPlanations (SHAP) analysis is an explainable AI method that is growing in popularity for its ability to explain model predictions in terms of the original features. For each sample and feature in the data set, an associated SHAP value quantifies the contribution of that feature to the prediction for that sample. Analysis of these SHAP values provides valuable insight into the model's decision-making process, which can be leveraged to create data-driven solutions. The interpretation of SHAP values, however, is model-dependent, so no universal analysis procedure exists. To aid in these efforts, we present a detailed investigation of SHAP analysis across various machine learning models and data sets. By uncovering the details and nuances of SHAP analysis, we hope to empower analysts in this less-explored territory. We also present a novel generalization of the waterfall plot to the multiclass classification problem.
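To make the notion of a per-sample, per-feature SHAP value concrete, the following is a minimal sketch of an exact Shapley computation for a single sample. It is not the paper's procedure and does not use the `shap` library. The names `toy_model`, `background`, `coalition_value`, and `exact_shap_values` are hypothetical, and the sketch assumes that "absent" features are filled in from a small background data set, which is one common choice of value function among several. Enumerating all feature subsets is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def coalition_value(predict, x, background, subset):
    """Average prediction when features in `subset` take their values from x
    and all other features take their values from each background row."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += predict(mixed)
    return total / len(background)


def exact_shap_values(predict, x, background):
    """Exact Shapley values for one sample by enumerating every coalition.
    Cost grows exponentially with the number of features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(predict, x, background, s | {i})
                        - coalition_value(predict, x, background, s))
                phi[i] += weight * gain
    return phi


def toy_model(features):
    # Hypothetical model: one linear term plus one interaction term.
    return 2.0 * features[0] + features[1] * features[2]


# Hypothetical background data and sample, chosen only for illustration.
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
x = [3.0, 2.0, 3.0]

phi = exact_shap_values(toy_model, x, background)
base_value = coalition_value(toy_model, x, background, set())
prediction = toy_model(x)

print("base value:", base_value)
print("SHAP values:", phi)
print("base + sum(SHAP):", base_value + sum(phi), "prediction:", prediction)
```

The last line illustrates the additivity that a waterfall plot visualizes: starting from the base value, the per-feature contributions accumulate to the model's prediction for that sample.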