TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models

Factual consistency evaluation is often conducted using Natural Language Inference (NLI) models, yet these models exhibit limited success in evaluating summaries. Previous work improved such models with synthetic training data. However, the data is typically based on perturbed human-written summaries, which often differ in their characteristics from real model-generated summaries and have limited coverage of possible factual errors. Alternatively, large language models (LLMs) have recently shown promising results in directly evaluating generative tasks, but are too computationally expensive for practical use. Motivated by these limitations, we introduce TrueTeacher, a method for generating synthetic data by annotating diverse model-generated summaries using an LLM. Unlike prior work, TrueTeacher does not rely on human-written summaries and is multilingual by nature. Experiments on the TRUE benchmark show that a student model trained using our data substantially outperforms both the state-of-the-art model with similar capacity and the LLM teacher. In a systematic study, we compare TrueTeacher to existing synthetic data generation methods and demonstrate its superiority and robustness to domain shift. We also show that our method generalizes to multilingual scenarios. Lastly, we release our large-scale synthetic dataset (1.4M examples), generated using TrueTeacher, and a checkpoint trained on this data.
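The data-generation idea described above (summarize documents with several models, then let an LLM teacher label each summary for factual consistency) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_synthetic_dataset`, `LabeledExample`, and the toy summarizers and teacher are all hypothetical stand-ins, and a real pipeline would plug in actual summarization models and a prompted LLM.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class LabeledExample:
    document: str
    summary: str
    label: int  # 1 = factually consistent, 0 = inconsistent


def build_synthetic_dataset(
    documents: Iterable[str],
    summarizers: Iterable[Callable[[str], str]],
    teacher: Callable[[str, str], bool],
) -> List[LabeledExample]:
    """Label model-generated summaries with a teacher (hypothetical helper)."""
    summarizers = list(summarizers)
    examples = []
    for doc in documents:
        for summarize in summarizers:
            summary = summarize(doc)  # model-generated, not human-written
            label = 1 if teacher(doc, summary) else 0
            examples.append(LabeledExample(doc, summary, label))
    return examples


# Toy stand-ins (assumptions for illustration only).
def first_sentence_summarizer(doc: str) -> str:
    # Pretend "summarization model": returns the first sentence.
    return doc.split(".")[0].strip() + "."


def noisy_summarizer(doc: str) -> str:
    # Pretend model that introduces a factual error.
    return first_sentence_summarizer(doc).replace("Paris", "Rome")


def toy_teacher(doc: str, summary: str) -> bool:
    # Pretend "LLM teacher": consistent only if the summary text appears in the document.
    return summary.rstrip(".") in doc


docs = ["The meeting was held in Paris. It lasted two hours."]
dataset = build_synthetic_dataset(
    docs, [first_sentence_summarizer, noisy_summarizer], toy_teacher
)
for ex in dataset:
    print(ex.label, ex.summary)
```

The resulting (document, summary, label) triples would then serve as training data for a smaller student classifier; the student training step itself is omitted here.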
