Feature attribution aims to explain the reasoning behind a black-box model's prediction by quantifying the contribution of each feature to that prediction. Recent work has extended feature attribution to interactions among multiple features. However, the lack of a unified framework has led to a proliferation of methods that are often not directly comparable. This paper introduces a parameterized attribution framework, the Weighted Möbius Score, and (i) shows that many existing attribution methods for both individual features and feature interactions are special cases of it and (ii) identifies several new methods. Because the framework treats attribution methods as elements of a vector space, it can draw on standard linear algebra tools, and it admits interpretations in several fields, including cooperative game theory and causal mediation analysis. We empirically demonstrate the framework's versatility and effectiveness by applying these attribution methods to feature interactions in sentiment analysis and chain-of-thought prompting.
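The abstract does not give the Weighted Möbius Score's exact definition. The sketch below only illustrates the general idea that attribution scores can be written as weighted sums of Möbius coefficients (Harsanyi dividends) of a set function, so that choosing different weights recovers different classical methods. The helper `weighted_mobius_score`, its `weight(t, s)` signature, and the toy game `v` are hypothetical illustrations, not the paper's notation or parameterization. The two weightings reproduce standard cooperative-game identities: the Shapley interaction index is the sum over S ⊇ T of m(S)/(|S|-|T|+1), and the Banzhaf interaction index is the sum over S ⊇ T of m(S)/2^(|S|-|T|). With |T| = 1 these reduce to the Shapley and Banzhaf values.

```python
from itertools import combinations


def powerset(players):
    """Yield every subset of `players` as a frozenset, smallest subsets first."""
    players = list(players)
    for k in range(len(players) + 1):
        for combo in combinations(players, k):
            yield frozenset(combo)


def mobius_transform(v, players):
    """Möbius transform (Harsanyi dividends): m(S) = sum_{R ⊆ S} (-1)^{|S|-|R|} v(R)."""
    return {
        S: sum((-1) ** (len(S) - len(R)) * v(R) for R in powerset(S))
        for S in powerset(players)
    }


def weighted_mobius_score(m, T, weight):
    """Hypothetical helper: score(T) = sum over S ⊇ T of weight(|T|, |S|) * m(S)."""
    T = frozenset(T)
    return sum(weight(len(T), len(S)) * m_S for S, m_S in m.items() if T <= S)


def shapley_weight(t, s):
    # Shapley interaction index; with t = 1 this gives the Shapley value.
    return 1.0 / (s - t + 1)


def banzhaf_weight(t, s):
    # Banzhaf interaction index; with t = 1 this gives the Banzhaf value.
    return 1.0 / 2 ** (s - t)


# Toy 3-feature game, made up for illustration: each feature adds 1,
# and only the full coalition earns an extra bonus of 3.
players = ["a", "b", "c"]


def v(S):
    return len(S) + (3 if len(S) == 3 else 0)


m = mobius_transform(v, players)
shapley = {i: weighted_mobius_score(m, {i}, shapley_weight) for i in players}
banzhaf = {i: weighted_mobius_score(m, {i}, banzhaf_weight) for i in players}
pair_ab = weighted_mobius_score(m, {"a", "b"}, shapley_weight)

print(shapley)  # {'a': 2.0, 'b': 2.0, 'c': 2.0}
print(banzhaf)  # {'a': 1.75, 'b': 1.75, 'c': 1.75}
print(pair_ab)  # 1.5
```

In this toy game the three-way bonus is a single Möbius coefficient m({a, b, c}) = 3. The Shapley weighting splits it evenly (1 per feature), so the Shapley values sum to v({a, b, c}) = 6. The Banzhaf weighting assigns each feature only 3/4 of it. Only the weight function differs between the two methods, which is the kind of parameterization the abstract describes.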