The minimal feature removal problem in post-hoc explanation aims to identify the minimal feature set (MFS). Prior studies compute the MFS with a greedy algorithm, which relies on a monotonicity assumption that does not hold in general and leaves feature interactions unexplored. To address these limitations, we propose Cooperative Integrated Dynamic Refining (CIDR), a method for efficiently discovering minimal feature sets. Specifically, we design Cooperative Integrated Gradients (CIG) to detect interactions between features. By combining CIG with the characteristics of the minimal feature set, we transform the minimal feature removal problem into a knapsack problem. We also devise an auxiliary Minimal Feature Refinement algorithm that determines the minimal feature set from numerous candidate sets. To the best of our knowledge, our work is the first to address the minimal feature removal problem in natural language processing. Extensive experiments demonstrate that CIDR identifies representative minimal feature sets with improved interpretability across various models and datasets.
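For readers unfamiliar with the attribution technique that CIG takes its name from, the following is a minimal sketch of standard Integrated Gradients, not of CIG itself, whose definition is not given in this abstract. The toy model, its analytic gradient, the all-zeros baseline, and every function name here are illustrative assumptions rather than part of CIDR.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Standard Integrated Gradients along the straight path baseline -> x.

    grad_fn is a hypothetical callable returning the model gradient at a point.
    The path integral is approximated with a midpoint Riemann sum.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points along the interpolation path, shape (steps, n_features).
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_fn(p) for p in path])
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)

# Toy model (an assumption for illustration): one interaction term, one linear term.
def toy_model(x):
    return x[0] * x[1] + 2.0 * x[2]

def toy_grad(x):
    return np.array([x[1], x[0], 2.0])

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
attributions = integrated_gradients(toy_grad, x, baseline)
print(attributions, attributions.sum(), toy_model(x) - toy_model(baseline))
```

In this toy case the attributions sum to the difference between the model output at the input and at the baseline (the completeness property of Integrated Gradients), and the contribution of the product term is split evenly between the two interacting features, so per-feature scores alone do not show that the two features act jointly.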