The rapid advancement and widespread adoption of machine-learning-driven technologies have underscored the practical and ethical need for interpretable artificial intelligence systems. Feature importance is a method that assigns scores quantifying the contribution of individual features to prediction outcomes. It seeks to meet this need by enhancing human comprehension of these systems. Feature importance explains predictions in diverse contexts: it can provide a global interpretation of a phenomenon across the entire dataset, or a local explanation of the outcome for a specific data point. Furthermore, feature importance is used both to explain models and to identify plausible causal relations in the data, independently of any model. However, these contexts have traditionally been studied in isolation and rest on limited theoretical foundations. This paper presents an axiomatic framework designed to establish coherent relationships among the different contexts of feature importance scores. Notably, our work reveals a surprising conclusion: when the proposed properties are combined with those previously outlined in the literature, the resulting set of properties is inconsistent. This inconsistency shows that certain essential properties of feature importance scores cannot coexist within a single framework.