In explainable machine learning, global feature importance methods aim to determine how much each individual feature contributes to predicting the target variable, resulting in one importance score per feature. But often, predicting the target variable requires interactions between several features (as in the XOR function), and features may have complex statistical dependencies that allow one feature to be partially replaced by another. In commonly used feature importance scores, these cooperative effects are conflated with the features' individual contributions, making the scores prone to misinterpretation. In this work, we derive DIP, a new mathematical decomposition of individual feature importance scores that disentangles three components: the standalone contribution and the contributions stemming from interactions and dependencies. We prove that the DIP decomposition is unique and show how it can be estimated in practice. Based on these results, we propose a new visualization of feature importance scores that clearly illustrates the different contributions.
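To make the conflation concrete, consider a minimal sketch of the XOR example mentioned above (an illustration, not the paper's DIP estimator; it assumes scikit-learn's RandomForestClassifier and permutation_importance). Neither feature alone carries any predictive signal, yet a standard global importance score rates both features as highly important, because the interaction contribution is folded into each individual score:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Hypothetical XOR data: the target depends only on the
# interaction between the two features, not on either alone.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 2)).astype(float)
y = np.logical_xor(X[:, 0] > 0.5, X[:, 1] > 0.5).astype(int)

# Each feature by itself is uninformative: training accuracy stays near 0.5.
for j in range(2):
    acc = RandomForestClassifier(random_state=0).fit(X[:, [j]], y).score(X[:, [j]], y)
    print(f"accuracy using only feature {j}: {acc:.2f}")

# Jointly, the features determine the target, and standard permutation
# importance assigns both of them a high individual score.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print("permutation importances:", result.importances_mean)
```

In a decomposition like DIP, such scores would be split so that the interaction contribution is reported separately from the (here, zero) standalone contributions.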