TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models

Factual consistency evaluation is often conducted using Natural Language Inference (NLI) models, yet these models exhibit limited success in evaluating summaries. Previous work improved such models with synthetic training data. However, the data is typically based on perturbed human-written summaries, which often differ in their characteristics from real model-generated summaries and have limited coverage of possible factual errors. Alternatively, large language models (LLMs) have recently shown promising results in directly evaluating generative tasks, but are too computationally expensive for practical use. Motivated by these limitations, we introduce TrueTeacher, a method for generating synthetic data by annotating diverse model-generated summaries using an LLM. Unlike prior work, TrueTeacher does not rely on human-written summaries and is multilingual by nature. Experiments on the TRUE benchmark show that a student model trained using our data substantially outperforms both the state-of-the-art model with similar capacity and the LLM teacher. In a systematic study, we compare TrueTeacher to existing synthetic data generation methods and demonstrate its superiority and robustness to domain shift. Using the mFACE dataset, we also show that our method generalizes to multilingual scenarios. Finally, we release a large-scale synthetic dataset with 1.4M examples generated using TrueTeacher.
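The data-generation idea described in the abstract (diverse models write summaries, an LLM teacher labels them, a smaller student is trained on the labels) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the summarization models, the teacher prompt, or the label format, so the `Summarizer`/`Teacher` interfaces, `generate_synthetic_data`, and the toy summarizers and teacher are all hypothetical. In a real pipeline the summarizers would be trained summarization models and the teacher would be an LLM; the toy teacher here only checks for verbatim support.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical interfaces (not from the paper): a summarizer maps a document
# to a summary; the teacher maps (document, summary) to a consistency label.
Summarizer = Callable[[str], str]
Teacher = Callable[[str, str], bool]


@dataclass(frozen=True)
class Example:
    document: str
    summary: str
    consistent: bool  # label assigned by the teacher, not by a human annotator


def generate_synthetic_data(
    documents: Iterable[str],
    summarizers: List[Summarizer],
    teacher: Teacher,
) -> List[Example]:
    """Pair each document with every model's summary and let the teacher label it."""
    examples: List[Example] = []
    for doc in documents:
        for summarize in summarizers:
            summary = summarize(doc)
            examples.append(Example(doc, summary, teacher(doc, summary)))
    return examples


# --- Toy stand-ins, for illustration only (hypothetical) ---
def lead_sentence(doc: str) -> str:
    """Extractive 'summarizer': returns the first sentence of the document."""
    return doc.split(". ")[0].rstrip(".") + "."


def hallucinating_summarizer(doc: str) -> str:
    """Deliberately unfaithful 'summarizer' that swaps in an unsupported entity."""
    return lead_sentence(doc).replace("cat", "dog")


def toy_teacher(doc: str, summary: str) -> bool:
    """Stand-in for the LLM teacher: every summary sentence must appear verbatim."""
    sentences = [s.strip() for s in summary.split(".") if s.strip()]
    return all(s in doc for s in sentences)


if __name__ == "__main__":
    docs = ["The cat sat on the mat. It was sunny."]
    data = generate_synthetic_data(docs, [lead_sentence, hallucinating_summarizer], toy_teacher)
    for ex in data:
        print(ex.consistent, "|", ex.summary)
    # True | The cat sat on the mat.
    # False | The dog sat on the mat.
```

Because every summary comes from a model rather than from perturbing a human-written reference, the errors that end up in the labeled data are the ones summarization models actually make; the resulting (document, summary, label) triples would then serve as training data for the student model.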
