Contrastive explanations, which explain one decision in contrast to another, are widely assumed to be closer to how humans explain decisions than non-contrastive explanations, which do not necessarily refer to an alternative. This claim has never been empirically validated. We analyze four English text-classification datasets (SST2, DynaSent, BIOS, and DBpedia-Animals). We fine-tune three different models (RoBERTa, GPT-2, and T5), each in three different sizes, and extract explanations from them using three post-hoc explainability methods (LRP, Gradient×Input, GradNorm). We furthermore collect and release human rationale annotations for a subset of 100 samples from the BIOS dataset in both contrastive and non-contrastive settings. A cross-comparison between model-based rationales and human annotations, both in contrastive and non-contrastive settings, yields high agreement between the two settings for models as well as for humans. Moreover, model-based explanations computed in both settings align equally well with human rationales. Thus, we empirically find that humans do not necessarily explain in a contrastive manner.

9 pages, long paper in the ACL 2022 proceedings.