Feature-based explanation methods aim to quantify how features influence a model's behavior, either locally or globally, but different methods often disagree and produce conflicting explanations. This disagreement arises primarily from two sources: how feature interactions are handled and how feature dependencies are incorporated. We propose GRANITE, a generalized regional explanation framework that partitions the feature space into regions where the influences of feature interactions and of the feature distribution are minimized. Within such regions, different explanation methods align, yielding more consistent and interpretable explanations. GRANITE unifies existing regional approaches, extends them to feature groups, and introduces a recursive partitioning algorithm to estimate these regions. We demonstrate its effectiveness on real-world datasets, providing a practical tool for feature-based explanation.
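To make the regional idea concrete, the following is a minimal sketch and not the GRANITE algorithm itself. It assumes a hypothetical toy model whose effect of `x1` flips sign with `x2`, measures interaction-driven heterogeneity as the spread of centered ICE curves around their average, and performs a single greedy split on `x2` as a stand-in for one step of a recursive partitioning. All function names, the heterogeneity measure, and the threshold grid are illustrative assumptions.

```python
import random
from statistics import mean

def model(x1, x2):
    # Hypothetical toy model: a pure interaction, the effect of x1 flips with the sign of x2.
    return x1 if x2 > 0 else -x1

def centered(curve):
    # Subtract the curve's own mean so only its shape matters.
    m = mean(curve)
    return [v - m for v in curve]

def heterogeneity(points, grid):
    # Mean squared deviation of centered ICE curves (for x1) from their average curve.
    # A value of zero means every observation in the region shows the same x1 effect.
    if not points:
        return 0.0
    curves = [centered([model(g, x2) for g in grid]) for (_, x2) in points]
    avg = [mean(col) for col in zip(*curves)]
    return mean((v - a) ** 2 for c in curves for v, a in zip(c, avg))

def split_cost(points, grid, t):
    # Size-weighted heterogeneity of the two regions x2 <= t and x2 > t.
    left = [p for p in points if p[1] <= t]
    right = [p for p in points if p[1] > t]
    total = len(left) * heterogeneity(left, grid) + len(right) * heterogeneity(right, grid)
    return total / len(points)

def best_split(points, grid, thresholds):
    # One greedy step: choose the x2 threshold with the lowest regional heterogeneity.
    return min(thresholds, key=lambda t: split_cost(points, grid, t))

rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
grid = [i / 10 for i in range(-10, 11)]      # x1 values at which ICE curves are evaluated
thresholds = [i / 4 for i in range(-3, 4)]   # candidate split points on x2

global_h = heterogeneity(points, grid)
t = best_split(points, grid, thresholds)
regional_h = split_cost(points, grid, t)
print(f"global heterogeneity: {global_h:.3f}")
print(f"best split on x2 at {t}, regional heterogeneity: {regional_h:.3f}")
```

In this toy setting the global average effect of `x1` is nearly flat because the two opposite slopes cancel, while within each region the effect is a clean line. This is the kind of situation in which global explanation methods can disagree and regional ones need not. A recursive version would apply the same split search again inside each resulting region until the heterogeneity stops improving.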