Prototype-based interpretability methods provide intuitive explanations of model predictions by comparing samples, in terms of similarity, to a reference set of memorized exemplars or typical representatives. In sequential data modeling, the similarity to prototypes is usually computed on encoded representation vectors. However, because these encodings are produced by highly recursive functions, there is usually a non-negligible disparity between the prototype-based explanations and the original input. In this work, we propose a Self-Explaining Selective Model (SESM) that uses a linear combination of prototypical concepts to explain its own predictions. The model follows the idea of case-based reasoning: it selects the sub-sequences of the input that most strongly activate different concepts as prototypical parts, which users can compare to sub-sequences selected from other example inputs to understand the model's decisions. For better interpretability, we design multiple constraints, including diversity, stability, and locality, as training objectives. Extensive experiments in different domains demonstrate that our method exhibits promising interpretability and competitive accuracy.
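The two ideas in the abstract, selecting the sub-sequence that most strongly activates each concept and forming the prediction as a linear combination of concept activations, can be illustrated with a minimal sketch. This is not the SESM architecture, which the abstract does not specify. The dot-product scoring, the fixed-width contiguous window, the mean activation, and all names (`select_subsequence`, `explain_prediction`, `concept_vecs`, `concept_weights`, `width`) are assumptions made purely for illustration.

```python
import numpy as np


def select_subsequence(scores, width):
    """Return (start, end) of the contiguous window with the highest total score.

    Assumption: a fixed-width contiguous window stands in for whatever
    selection mechanism the actual model uses.
    """
    window_sums = np.convolve(scores, np.ones(width), mode="valid")
    start = int(np.argmax(window_sums))
    return start, start + width


def explain_prediction(token_vecs, concept_vecs, concept_weights, width=2):
    """Toy self-explaining prediction (hypothetical, not the paper's model).

    token_vecs:      (T, d) per-position input representations
    concept_vecs:    (K, d) one vector per concept
    concept_weights: (K,)   linear weights combining concept activations
    """
    # Per-position activation of every concept (assumed: plain dot product).
    scores = token_vecs @ concept_vecs.T  # shape (T, K)

    spans, activations = [], []
    for k in range(concept_vecs.shape[0]):
        start, end = select_subsequence(scores[:, k], width)
        spans.append((start, end))
        # Concept activation = mean score over the selected sub-sequence.
        activations.append(scores[start:end, k].mean())

    # The prediction is a linear combination of concept activations, so each
    # term is directly attributable to one concept and one input span.
    contributions = concept_weights * np.array(activations)
    return spans, contributions, float(contributions.sum())


# Usage: six positions, two concepts, each concept fires on a different span.
token_vecs = np.array(
    [[0, 0], [1, 0], [1, 0], [0, 0], [0, 1], [0, 1]], dtype=float
)
concept_vecs = np.array([[1, 0], [0, 1]], dtype=float)
concept_weights = np.array([0.5, -1.0])

spans, contributions, logit = explain_prediction(
    token_vecs, concept_vecs, concept_weights, width=2
)
print(spans, contributions, logit)
```

In this toy setting the explanation for a prediction is the list of selected spans together with their signed contributions, which a user could compare against spans selected from other inputs for the same concept. The diversity, stability, and locality constraints mentioned above are training objectives and are not modeled here.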