Recognizing whether LLM output can be grounded in evidence is central to many tasks in NLP: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of "fact-checking" verify each piece of a model generation against potential evidence using an LLM. However, this process can be very computationally expensive, requiring many LLM calls to check a single response. In this work, we show how to build small models with GPT-4-level performance at 400x lower cost. We do this by constructing synthetic training data with GPT-4, creating realistic yet challenging instances of factual errors via a structured generation procedure. Training on this data teaches models to check each fact in the claim and recognize synthesis of information across sentences. For evaluation, we unify pre-existing datasets into a benchmark, LLM-AggreFact, collected from recent work on fact-checking and grounding LLM generations. Our best system, MiniCheck-FT5 (770M parameters), outperforms all systems of comparable size and reaches GPT-4 accuracy. We release LLM-AggreFact, code for data synthesis, and models.
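The checking procedure described above — verifying each sentence of a response against retrieved evidence — can be sketched as follows. This is a minimal illustration, not the released MiniCheck API: the `entails` function below is a toy lexical-overlap stand-in for a trained checker such as MiniCheck-FT5 or an LLM judge.

```python
# Sketch of sentence-level fact-checking: each sentence of a model
# response is checked against every evidence chunk; a sentence counts
# as grounded if at least one chunk supports it.

def entails(evidence: str, claim: str) -> bool:
    # Toy lexical-overlap scorer standing in for a real entailment
    # model (hypothetical; a trained checker would replace this).
    ev = {w.strip(".,").lower() for w in evidence.split()}
    cl = {w.strip(".,").lower() for w in claim.split()}
    return len(ev & cl) / max(len(cl), 1) > 0.6

def check_response(response_sentences, evidence_chunks):
    """Label each sentence as grounded (True) or unsupported (False)."""
    return [
        any(entails(chunk, sent) for chunk in evidence_chunks)
        for sent in response_sentences
    ]

evidence = ["The Eiffel Tower is in Paris and was completed in 1889."]
sentences = [
    "The Eiffel Tower is in Paris.",
    "It was completed in 1925.",
]
print(check_response(sentences, evidence))  # → [True, False]
```

With an LLM as the checker, the inner loop above is what makes verification expensive: the number of calls grows with both the sentence count and the evidence count, which is exactly the cost a single small checker model amortizes.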