We propose Prompt, Generate, Train (PGT), a framework for efficiently developing a generative model for open-book question answering over a proprietary collection of text documents. The framework adapts a retrieval-augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning with synthetic feedback in a few-shot setting. We hypothesize that this will yield an aligned, uncertainty-calibrated model that is competitive with GPT-4-based in-context retrieval-augmented generation in generating relevant answers, at lower serving cost. The framework's synthetic generation pipeline will produce training data comprising <passage, question, answer> tuples using an open-source LLM and a novel consistency filtering scheme. The pipeline will be designed to generate both abstractive and extractive questions that span the entire corpus. We propose to fine-tune a smaller RAG model, comprising a dense retriever (ColBERTv2) and a smaller LLM, on the synthetic dataset. In parallel, the framework will train a reward model to score domain-grounded answers higher than hallucinated answers, using an a priori relevance ordering of synthetically assembled samples. In the next phase, the framework will align the RAG model with the target domain using reinforcement learning (Proximal Policy Optimization). This step may improve the RAG model's ability to generate grounded answers and to ignore out-of-domain questions. In the final phase, the framework will calibrate the model's uncertainty on extractive question-answers.
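To make the reward-model step concrete, the sketch below shows one common way to turn an a priori relevance ordering into a training signal: a pairwise logistic (Bradley-Terry style) ranking loss. The abstract does not specify the loss, so this choice, the function name `pairwise_ranking_loss`, and the example scores are illustrative assumptions rather than the framework's actual objective.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(scores):
    """Mean pairwise logistic loss over an ordered list of reward scores.

    `scores` holds hypothetical reward-model outputs for candidate answers
    to one question, listed from most relevant (e.g. grounded) to least
    relevant (e.g. hallucinated). This loss is an assumption for
    illustration; the source does not name its training objective.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scored answers to form a pair")

    pairs = list(combinations(range(len(scores)), 2))
    total = 0.0
    for i, j in pairs:
        # Answer i is ranked above answer j, so we want scores[i] > scores[j].
        margin = scores[i] - scores[j]
        # -log(sigmoid(margin)) == softplus(-margin), computed stably.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(pairs)


# Scores that respect the ordering incur a lower loss than scores that invert it.
well_ordered = pairwise_ranking_loss([2.0, 0.5, -1.0])
inverted = pairwise_ranking_loss([-1.0, 0.5, 2.0])
print(well_ordered < inverted)
```

Minimizing this quantity over many questions pushes the reward model to score higher-ranked answers above lower-ranked ones, which is the behavior the abstract asks of it. In a real setup the scores would come from a trainable model and the loss would be computed in an autodiff framework.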