Counterfactual examples explain a prediction by highlighting the changes to an instance that flip the outcome of a classifier. This paper proposes TIGTEC, an efficient and modular method for generating sparse, plausible and diverse counterfactual explanations for textual data. TIGTEC is a text-editing heuristic that uses local feature importance to target and modify the words that contribute most to the prediction. A new attention-based local feature importance measure is proposed. Counterfactual candidates are generated and assessed with a cost function that integrates semantic distance, and the solution space is explored efficiently with a beam search. The experiments show the relevance of TIGTEC in terms of success rate, sparsity, diversity and plausibility. The method can be used in either a model-specific or a model-agnostic way, which makes it convenient for generating counterfactual explanations.
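The loop described in the abstract (rank tokens by local importance, edit the most important ones, score candidates with a cost function, keep the best few in a beam) can be illustrated with a minimal sketch. It is not the TIGTEC implementation, and every component is an assumed stand-in: a toy lexicon classifier replaces the real model, leave-one-out importance replaces the attention-based measure, a hypothetical `REPLACEMENTS` table replaces a masked language model, and the fraction of changed tokens replaces semantic distance. All names (`LEXICON`, `REPLACEMENTS`, `importance`, `cost`, `counterfactuals`) and the parameters `alpha`, `beam_width`, `max_steps` and `n_wanted` are invented for illustration.

```python
import math

# Toy sentiment lexicon standing in for a real classifier (assumption).
LEXICON = {"great": 2.0, "good": 1.0, "fine": 0.5,
           "bad": -1.0, "boring": -1.5, "awful": -2.0}
# Hypothetical replacement table standing in for a masked language model.
REPLACEMENTS = {"great": ["awful", "fine"], "good": ["bad", "fine"],
                "bad": ["good"], "awful": ["great"], "boring": ["good"]}


def prob_positive(tokens):
    """Probability of the positive class under the toy classifier."""
    score = sum(LEXICON.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def predict(tokens):
    return 1 if prob_positive(tokens) >= 0.5 else 0


def prob_of(tokens, label):
    p = prob_positive(tokens)
    return p if label == 1 else 1.0 - p


def importance(tokens, label):
    """Leave-one-out importance: drop in P(label) when a token is removed.
    A model-agnostic stand-in, not the paper's attention-based measure."""
    base = prob_of(tokens, label)
    return [base - prob_of(tokens[:i] + tokens[i + 1:], label)
            for i in range(len(tokens))]


def cost(candidate, original, target, alpha=0.3):
    """Lower is better: reward P(target), penalise distance to the original.
    Distance here is the fraction of changed tokens (assumed proxy)."""
    changed = sum(a != b for a, b in zip(candidate, original)) / len(original)
    return -(prob_of(candidate, target) - alpha * changed)


def counterfactuals(text, beam_width=2, max_steps=3, n_wanted=1):
    original = text.lower().split()
    label = predict(original)
    target = 1 - label
    beam, found, seen = [original], [], {tuple(original)}
    for _ in range(max_steps):
        candidates = []
        for state in beam:
            scores = importance(state, label)
            order = sorted(range(len(state)), key=lambda i: scores[i],
                           reverse=True)
            # Edit only the most important tokens of each beam state.
            for i in order[:beam_width]:
                for rep in REPLACEMENTS.get(state[i], []):
                    cand = state[:i] + [rep] + state[i + 1:]
                    if tuple(cand) not in seen:
                        seen.add(tuple(cand))
                        candidates.append(cand)
        candidates.sort(key=lambda c: cost(c, original, target))
        beam = []
        for cand in candidates:
            if predict(cand) == target:
                found.append(" ".join(cand))   # label flipped
            elif len(beam) < beam_width:
                beam.append(cand)              # keep exploring from here
        if len(found) >= n_wanted or not beam:
            break
    return found[:n_wanted]


result = counterfactuals("the movie was great and the acting was good")
print(result)
```

In this sketch a candidate leaves the beam as soon as it flips the predicted label, which favours sparse edits. Diversity among the returned counterfactuals is not enforced here and would need an extra term or filter.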