Recognizing whether LLM output can be grounded in evidence is central to many tasks in NLP: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of fact-checking verify each piece of a model generation against potential evidence using an LLM. However, this process can be very computationally expensive, requiring many model calls to check a single response. In this work, we show how to build small fact-checking models that achieve GPT-4-level performance at 400x lower cost. We do this by constructing synthetic training data with GPT-4, creating realistic yet challenging instances of factual errors via a structured generation procedure. Training on this data teaches models to check each fact in a claim and to recognize synthesis of information across sentences. For evaluation, we unify datasets from recent work on fact-checking and grounding LLM generations into a new benchmark, LLM-AggreFact. Our best system, MiniCheck-FT5 (770M parameters), outperforms all systems of comparable size and matches GPT-4 accuracy. We release LLM-AggreFact, code for data synthesis, and models.