Reformulating Domain Adaptation of Large Language Models as Adapt-Retrieve-Revise

While large language models (LLMs) like GPT-4 have recently demonstrated astonishing zero-shot capabilities in general domain tasks, they often generate content with hallucinations in specific domains such as Chinese law, hindering their application in these areas. This is typically due to the absence of training data that covers such a specific domain, which prevents GPT-4 from acquiring in-domain knowledge. A pressing challenge is that it is not feasible to continue training LLMs of such scale on in-domain data. This paper introduces a simple and effective domain adaptation framework for GPT-4 by reformulating generation as an \textbf{adapt-retrieve-revise} process. The initial step is to \textbf{adapt} an affordable 7B LLM to the target domain by continued learning on in-domain data. When solving a task, we leverage the adapted LLM to generate a draft answer given a task query. Then, the draft answer is used to \textbf{retrieve} supporting evidence candidates from an external in-domain knowledge base. Finally, the draft answer and retrieved evidence are concatenated into a single prompt so that GPT-4 can assess the evidence and \textbf{revise} the draft answer to generate the final answer. Our proposal combines the efficiency of adapting a smaller 7B model with the evidence-assessing capability of GPT-4, and effectively prevents GPT-4 from generating hallucinatory content. In the zero-shot setting of four Chinese legal tasks, our method improves accuracy by 33.3\% compared to direct generation by GPT-4. When compared to two stronger retrieval-based baselines, our method outperforms them by 15.4\% and 23.9\%. Our code will be released.
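The three stages described in the abstract can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the function names, the prompt wording, and the toy word-overlap retriever are all assumptions, and the adapted 7B model and GPT-4 are represented by plain callables that the caller supplies.

```python
from typing import Callable, List


def overlap_score(query: str, passage: str) -> float:
    """Toy lexical similarity (an assumption, not the paper's retriever):
    the fraction of query tokens that also appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def retrieve(draft: str, knowledge_base: List[str], k: int = 2) -> List[str]:
    """Rank knowledge-base passages against the draft answer (not the raw query)."""
    ranked = sorted(knowledge_base, key=lambda p: overlap_score(draft, p), reverse=True)
    return ranked[:k]


def build_revision_prompt(query: str, draft: str, evidence: List[str]) -> str:
    """Concatenate query, draft, and evidence into one prompt (hypothetical wording)."""
    evidence_block = "\n".join(f"- {e}" for e in evidence)
    return (
        f"Question: {query}\n"
        f"Draft answer: {draft}\n"
        f"Evidence candidates:\n{evidence_block}\n"
        "Assess the evidence and revise the draft answer."
    )


def adapt_retrieve_revise(
    query: str,
    draft_model: Callable[[str], str],  # stands in for the adapted 7B LLM
    reviser: Callable[[str], str],      # stands in for GPT-4
    knowledge_base: List[str],
    k: int = 2,
) -> str:
    draft = draft_model(query)                              # draft from the adapted model
    evidence = retrieve(draft, knowledge_base, k)           # retrieve using the draft
    prompt = build_revision_prompt(query, draft, evidence)  # single combined prompt
    return reviser(prompt)                                  # revise into the final answer
```

The point of the design is that retrieval is keyed on the draft answer, which already carries in-domain wording from the adapted model, rather than on the task query alone. A real system would presumably replace `overlap_score` with a proper retriever and the two callables with actual model calls.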

