Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering. However, applying LLMs to real-world, mission-critical applications remains challenging, mainly due to their tendency to generate hallucinations and their inability to use external knowledge. This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules. Our system makes the LLM generate responses grounded in consolidated external knowledge, e.g., knowledge stored in task-specific databases. It also iteratively revises LLM prompts to improve model responses using feedback generated by utility functions, e.g., the factuality score of an LLM-generated response. The effectiveness of LLM-Augmenter is empirically validated on two types of mission-critical scenarios: task-oriented dialog and open-domain question answering. LLM-Augmenter significantly reduces ChatGPT's hallucinations without sacrificing the fluency and informativeness of its responses. We make the source code and models publicly available.
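The loop described in the abstract (ground the prompt in retrieved knowledge, score the response with a utility function, and revise the prompt with feedback) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the released implementation: the function names `retrieve_evidence`, `utility_score`, and `augmented_respond`, the token-overlap proxy for factuality, the feedback wording, and the `threshold` and `max_iters` parameters are all hypothetical. The black-box LLM is represented by any callable that maps a prompt string to a response string.

```python
from typing import Callable, Dict, List, Tuple


def retrieve_evidence(query: str, database: Dict[str, str]) -> List[str]:
    """Toy retriever (assumption): return entries whose key occurs in the query."""
    q = query.lower()
    return [fact for key, fact in database.items() if key in q]


def utility_score(response: str, evidence: List[str]) -> float:
    """Toy factuality proxy (assumption): share of response tokens found in the evidence."""
    tokens = response.lower().split()
    if not tokens:
        return 0.0
    evidence_tokens = set(" ".join(evidence).lower().split())
    return sum(t in evidence_tokens for t in tokens) / len(tokens)


def augmented_respond(
    query: str,
    llm: Callable[[str], str],   # black-box LLM: prompt in, text out
    database: Dict[str, str],
    threshold: float = 0.8,      # hypothetical acceptance threshold
    max_iters: int = 3,          # hypothetical cap on prompt revisions
) -> Tuple[str, float]:
    """Query the LLM with grounded prompts, revising the prompt until the utility is met."""
    evidence = retrieve_evidence(query, database)
    prompt = f"Evidence: {' '.join(evidence)}\nQuestion: {query}"
    best_response, best_score = "", -1.0
    for _ in range(max_iters):
        response = llm(prompt)
        score = utility_score(response, evidence)
        if score > best_score:
            best_response, best_score = response, score
        if score >= threshold:
            break
        # Append feedback to the prompt so the next attempt can correct itself.
        prompt += (
            f"\nFeedback: the previous answer '{response}' scored {score:.2f}; "
            "answer using only the evidence."
        )
    return best_response, best_score
```

The LLM is passed in as a plain callable so that it stays a black box: the sketch only reads its output and changes its input, which mirrors the plug-and-play framing in the abstract. A real utility function would need a far stronger factuality signal than token overlap.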