Large language models (LLMs), such as ChatGPT, can generate fluent, human-like responses for many downstream tasks, e.g., task-oriented dialog and question answering. However, applying LLMs to real-world, mission-critical applications remains challenging, mainly because they tend to hallucinate and cannot use external knowledge. This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules. Our system makes the LLM generate responses grounded in consolidated external knowledge, e.g., knowledge stored in task-specific databases. It also iteratively revises LLM prompts to improve model responses, using feedback generated by utility functions, e.g., the factuality score of an LLM-generated response. The effectiveness of LLM-Augmenter is empirically validated on two types of mission-critical scenarios: task-oriented dialog and open-domain question answering. LLM-Augmenter significantly reduces ChatGPT's hallucinations without sacrificing the fluency and informativeness of its responses. We make the source code and models publicly available.
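The loop described in the abstract (ground the prompt in retrieved knowledge, score the response with a utility function, feed the critique back into the prompt) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `augment`, `build_prompt`, `overlap_utility`, and `toy_llm`, the word-overlap score, the 0.8 threshold, and the three-round limit are all assumptions made for illustration, and the black-box LLM is replaced by a stub.

```python
from typing import Callable, List, Tuple


def build_prompt(query: str, evidence: List[str], feedback: str) -> str:
    """Assemble a prompt from evidence, the query, and optional feedback."""
    parts = ["Knowledge:"] + [f"- {e}" for e in evidence]
    parts.append(f"Question: {query}")
    if feedback:
        parts.append(f"Feedback on previous answer: {feedback}")
    parts.append("Answer using only the knowledge above.")
    return "\n".join(parts)


def overlap_utility(response: str, evidence: List[str]) -> Tuple[float, str]:
    """Toy factuality proxy (an assumption, not the paper's metric):
    the fraction of response words that also occur in the evidence."""
    evidence_words = set(" ".join(evidence).lower().split())
    words = response.lower().split()
    if not words:
        return 0.0, "The answer was empty."
    score = sum(w in evidence_words for w in words) / len(words)
    feedback = f"Only {score:.0%} of the answer is supported by the knowledge."
    return score, feedback


def augment(
    query: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    utility: Callable[[str, List[str]], Tuple[float, str]],
    threshold: float = 0.8,  # hypothetical acceptance threshold
    max_rounds: int = 3,     # hypothetical cap on revisions
) -> str:
    """Query a black-box LLM, revising the prompt until the utility
    score reaches the threshold or the round limit is hit."""
    evidence = retrieve(query)
    feedback = ""
    best_score, best_response = float("-inf"), ""
    for _ in range(max_rounds):
        response = llm(build_prompt(query, evidence, feedback))
        score, feedback = utility(response, evidence)
        if score > best_score:
            best_score, best_response = score, response
        if score >= threshold:
            break
    return best_response


def toy_llm(prompt: str) -> str:
    """Stub standing in for the black-box LLM: it gives an unsupported
    answer first and a grounded one once the prompt carries feedback."""
    if "Feedback on previous answer" in prompt:
        return "the hotel has free parking"
    return "the hotel has a rooftop pool"


def toy_retrieve(query: str) -> List[str]:
    """Stub standing in for a task-specific database lookup."""
    return ["The hotel has free parking and wifi"]


answer = augment("What amenities does the hotel have?",
                 toy_llm, toy_retrieve, overlap_utility)
print(answer)
```

In this toy run the first response scores below the threshold, so its critique is appended to the prompt and the stub's second response is accepted. Treating the LLM, retriever, and utility function as plain callables mirrors the plug-and-play framing: any of them can be swapped without touching the loop.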