Prompt Generate Train (PGT): Few-shot Domain Adaption of Retrieval Augmented Generation Models for Open Book Question-Answering

We propose a framework - Prompt, Generate, Train (PGT) - to efficiently develop a generative question-answering model for open-book question-answering over a proprietary collection of text documents. The framework adapts a retriever augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning with synthetic feedback in a few-shot setting. This, we hypothesize, will yield an aligned, uncertainty calibrated model that is competitive with GPT-4 based in-context retrieval augmented generation in generating relevant answers at lower serving costs. The framework's synthetic generation pipeline will generate synthetic training data comprising <passage, question, answer> tuples using an open-source LLM and a novel consistency filtering scheme. The pipeline will be designed to generate both abstractive and extractive questions that span the entire corpus. The framework proposes to fine-tune a smaller RAG model comprising a dense retriever (ColBERTv2) and a smaller sized LLM on the synthetic dataset. In parallel, the framework will train a Reward model to score domain grounded answers higher than hallucinated answers using an a priori relevance ordering of synthetically assembled samples. In the next phase, the framework will align the RAG model with the target domain using reinforcement learning (Proximal Policy Optimization). This step may improve the RAG model's ability to generate grounded answers and ignore out of domain questions. In the final phase, the framework will calibrate the model's uncertainty for extractive question-answers.

翻译：我们提出一个框架——提示、生成、训练（PGT）——用于高效开发针对专有文本文档集合的生成式开放书籍问答模型。该框架通过有监督微调和基于合成反馈的强化学习，在少样本设置下将检索增强生成（RAG）模型适配至目标领域。我们假设，这将产生一个对齐的、不确定性校准的模型，其生成相关答案的能力可与基于GPT-4的上下文检索增强生成相媲美，且服务成本更低。框架的合成生成流水线将利用开源大语言模型和新型一致性过滤方案，生成由<段落、问题、答案>三元组构成的合成训练数据。该流水线被设计为可生成覆盖整个语料库的抽象型与抽取型问题。框架提出对包含密集检索器（ColBERTv2）和小型LLM的轻量级RAG模型进行合成数据集微调。同时，框架将训练一个奖励模型，利用预先生成的合成样本先验相关性排序，对基于领域的有效答案给予高于幻觉答案的评分。下一阶段，框架将使用强化学习（近端策略优化）使RAG模型与目标领域对齐。此步骤可提升RAG模型生成基于领域答案的能力，同时忽略领域外问题。最终阶段，框架将对模型的抽取型问答进行不确定性校准。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日