Selective rationales and counterfactual examples have emerged as two effective, complementary classes of interpretability methods for analyzing and training NLP models. However, prior work has not explored how these methods can be integrated to combine their complementary advantages. We overcome this limitation by introducing CREST (ContRastive Edits with Sparse raTionalization), a joint framework for selective rationalization and counterfactual text generation, and show that this framework leads to improvements in counterfactual quality, model robustness, and interpretability. First, CREST generates valid counterfactuals that are more natural than those produced by previous methods, and subsequently can be used for data augmentation at scale, reducing the need for human-generated examples. Second, we introduce a new loss function that leverages CREST counterfactuals to regularize selective rationales and show that this regularization improves both model robustness and rationale quality, compared to methods that do not leverage CREST counterfactuals. Our results demonstrate that CREST successfully bridges the gap between selective rationales and counterfactual examples, addressing the limitations of existing methods and providing a more comprehensive view of a model's predictions.