The minimal feature removal problem in post-hoc explanation aims to identify the minimal feature set (MFS). Prior studies compute the minimal feature set with a greedy algorithm, which relies on a monotonicity assumption that does not hold in general and leaves feature interactions unexplored. To address these limitations, we propose a Cooperative Integrated Dynamic Refining method (CIDR) to efficiently discover minimal feature sets. Specifically, we design Cooperative Integrated Gradients (CIG) to detect interactions between features. By combining CIG with the characteristics of the minimal feature set, we transform the minimal feature removal problem into a knapsack problem. Additionally, we devise an auxiliary Minimal Feature Refinement algorithm to determine the minimal feature set from numerous candidate sets. To the best of our knowledge, our work is the first to address the minimal feature removal problem in natural language processing. Extensive experiments demonstrate that CIDR traces representative minimal feature sets with improved interpretability across various models and datasets.
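To make the limitation of greedy search concrete, the following toy sketch contrasts a greedy removal order with an exhaustive search. It is not CIDR, CIG, or the Minimal Feature Refinement algorithm. It assumes, purely for illustration, that a minimal feature set is the smallest set of features whose removal flips the prediction; the scorer, its weights, the interaction bonus, and the names `score`, `flips`, `greedy_removal`, and `exhaustive_minimum` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy scorer (not from the paper): additive weights plus a
# bonus that survives as long as feature 2 OR feature 3 is still present.
WEIGHTS = [3.0, 2.0, 1.0, 1.0]
BONUS = 4.0
THRESHOLD = 5.5  # the prediction is "positive" while score > THRESHOLD


def score(removed):
    """Score of the toy model after deleting the features in `removed`."""
    kept = [i for i in range(len(WEIGHTS)) if i not in removed]
    total = sum(WEIGHTS[i] for i in kept)
    if 2 in kept or 3 in kept:  # interaction between features 2 and 3
        total += BONUS
    return total


def flips(removed):
    """True if removing these features changes the toy prediction."""
    return score(removed) <= THRESHOLD


def greedy_removal():
    """Remove features in order of individual score drop until the flip."""
    n = len(WEIGHTS)
    base = score(set())
    order = sorted(range(n), key=lambda i: base - score({i}), reverse=True)
    removed = set()
    for i in order:
        removed.add(i)
        if flips(removed):
            return removed
    return None  # no flip even after removing every feature


def exhaustive_minimum():
    """Smallest flipping set by brute force (exponential; toy sizes only)."""
    n = len(WEIGHTS)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if flips(set(subset)):
                return set(subset)
    return None


print(greedy_removal())      # {0, 1, 2}
print(exhaustive_minimum())  # {2, 3}
```

In this toy setting the greedy order ranks features 0 and 1 first because their individual effects are largest, and it returns a three-feature set. The exhaustive search finds the two-feature set {2, 3}, which only matters jointly. This is the kind of interaction that a per-feature ranking cannot see.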