Explainable NLP techniques primarily explain a prediction by answering "Which tokens in the input are responsible for this prediction?" We argue that for NLP models that make predictions by comparing two input texts, it is more useful to answer "What differences between the two inputs explain this prediction?" We introduce a technique that generates contrastive highlights explaining the predictions of a semantic divergence model via phrase-alignment-guided erasure. We show that the resulting highlights match human rationales of cross-lingual semantic differences better than popular post-hoc saliency techniques, and that they help people detect fine-grained meaning differences in human translations and critical machine translation errors.
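To make the idea of phrase-alignment-guided erasure concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the divergence model, the phrase aligner, or how erasure is performed, so everything here is an assumption for illustration: `toy_divergence` is a hypothetical lexicon-based stand-in for a learned semantic divergence model, the aligned phrase pairs are given by hand rather than produced by an aligner, and erasure is modeled as simply deleting the tokens of a span. The sketch erases one aligned phrase pair at a time and ranks pairs by how much the divergence score drops.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)

# Hypothetical toy lexicon; a real system would use a learned model instead.
LEXICON = {"the": "la", "red": "rouge", "car": "voiture"}


def toy_divergence(src: Sequence[str], tgt: Sequence[str]) -> int:
    """Stand-in scorer (assumption): count tokens with no counterpart."""
    translations = {LEXICON.get(tok) for tok in src}
    unmatched_src = sum(1 for tok in src if LEXICON.get(tok) not in tgt)
    unmatched_tgt = sum(1 for tok in tgt if tok not in translations)
    return unmatched_src + unmatched_tgt


def erase(tokens: Sequence[str], span: Span) -> List[str]:
    """Erase a span by deleting its tokens (one possible erasure choice)."""
    start, end = span
    return list(tokens[:start]) + list(tokens[end:])


def contrastive_highlights(
    src: Sequence[str],
    tgt: Sequence[str],
    aligned_phrases: Sequence[Tuple[Span, Span]],
    score: Callable[[Sequence[str], Sequence[str]], float],
    top_k: int = 1,
) -> List[Tuple[float, Span, Span]]:
    """Rank aligned phrase pairs by the score drop caused by erasing them."""
    base = score(src, tgt)
    ranked = []
    for src_span, tgt_span in aligned_phrases:
        reduced = score(erase(src, src_span), erase(tgt, tgt_span))
        ranked.append((base - reduced, src_span, tgt_span))
    # Largest drop first: erasing that pair removes the most divergence.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked[:top_k]


src = ["the", "red", "car"]
tgt = ["la", "voiture", "bleue"]
# Hand-written alignment (assumption): the-la, red-bleue, car-voiture.
alignment = [((0, 1), (0, 1)), ((1, 2), (2, 3)), ((2, 3), (1, 2))]

highlights = contrastive_highlights(src, tgt, alignment, toy_divergence)
drop, src_span, tgt_span = highlights[0]
print(src[src_span[0]:src_span[1]], tgt[tgt_span[0]:tgt_span[1]], drop)
```

In this toy setting the top-ranked pair is the one covering "red" and "bleue", the only pair whose removal makes the two sentences look equivalent to the scorer. Highlighting a pair of spans, one per input, is what distinguishes this contrastive view from token-level saliency over a single input.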