By explaining how humans would solve a given task, human rationales can provide a strong learning signal for neural language models (LMs). Explanation regularization (ER) aims to improve LM generalization by pushing the LM's machine rationales (Which input tokens did the LM focus on?) to align with human rationales (Which input tokens would humans focus on?). Prior work primarily studies ER via in-distribution (ID) evaluation. However, out-of-distribution (OOD) generalization is often more critical in real-world scenarios, and ER's effect on OOD generalization remains underexplored. In this paper, we introduce ER-Test, a framework for evaluating ER models' OOD generalization along three dimensions: unseen dataset tests, contrast set tests, and functional tests. Using ER-Test, we extensively analyze how ER models' OOD generalization varies with different ER design choices. Across two tasks and six datasets, ER-Test shows that ER has little impact on ID performance but can yield large OOD performance gains. We also find that ER can improve OOD performance even with limited rationale supervision. ER-Test's results help demonstrate ER's utility and establish best practices for using ER effectively.
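To make the idea of aligning machine rationales with human rationales concrete, the following is a minimal sketch of an ER-style training objective: a task loss plus a weighted alignment term. It is an illustration under stated assumptions, not the paper's implementation. The function names, the `er_weight` parameter, the mean-squared-error alignment criterion, and the representation of machine rationales as per-token importance scores in [0, 1] are all hypothetical choices; the abstract does not specify which attribution method or alignment criterion is used.

```python
import math


def cross_entropy(probs, label):
    """Task loss: negative log-probability assigned to the gold label."""
    return -math.log(probs[label])


def rationale_alignment_loss(machine_scores, human_mask):
    """Mean squared error between machine and human rationales.

    machine_scores: per-token importance scores in [0, 1] (assumed format).
    human_mask: per-token binary labels, 1 if humans marked the token as important.
    MSE is an assumed criterion chosen for simplicity.
    """
    if len(machine_scores) != len(human_mask):
        raise ValueError("rationales must cover the same tokens")
    squared_errors = [(s - h) ** 2 for s, h in zip(machine_scores, human_mask)]
    return sum(squared_errors) / len(squared_errors)


def er_objective(probs, label, machine_scores, human_mask, er_weight=1.0):
    """Combined objective: task loss + er_weight * alignment loss.

    er_weight is a hypothetical hyperparameter controlling regularization strength.
    """
    task_loss = cross_entropy(probs, label)
    align_loss = rationale_alignment_loss(machine_scores, human_mask)
    return task_loss + er_weight * align_loss


# Toy example: three input tokens, humans highlighted the first and third.
human_mask = [1, 0, 1]
probs = [0.2, 0.8]  # predicted class probabilities; gold label is class 1

aligned = er_objective(probs, 1, [0.9, 0.1, 0.8], human_mask)
misaligned = er_objective(probs, 1, [0.1, 0.9, 0.2], human_mask)
```

In this toy setup the prediction is identical in both cases, so any difference between `aligned` and `misaligned` comes only from how closely the machine rationale matches the human one. In actual training, the alignment term would need to be differentiable with respect to the model parameters.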