In a real-world RAG system, the current query often contains colloquial ellipses and ambiguous references drawn from the dialogue context, necessitating query rewriting to better describe the user's information needs. However, traditional context-based rewriting yields minimal improvement in downstream generation because of the long pipeline from query rewriting to response generation. Some researchers try to leverage reinforcement learning with generation feedback to assist the rewriter, but these sparse rewards provide little guidance in most cases, leading to unstable training and generation results. We find that the user's needs are also reflected in the gold document, the retrieved documents, and the ground truth. Therefore, feeding these multi-aspect dense rewards back to query rewriting yields more stable and satisfactory responses. In this paper, we propose a novel query rewriting method, MaFeRw, which improves RAG performance by integrating multi-aspect feedback from both the retrieval process and the generated results. Specifically, we first use manual data to train a T5 model to initialize the rewriter. Next, we design three metrics as reinforcement learning feedback: the similarity between the rewritten query and the gold document, the ranking metric of the retrieved documents, and ROUGE between the generation and the ground truth. Inspired by RLAIF, we train three kinds of reward models for these metrics to achieve more efficient training. Finally, we combine the scores of these reward models as feedback and use the PPO algorithm to explore the optimal query rewriting strategy. Experimental results on two conversational RAG datasets demonstrate that MaFeRw achieves superior generation metrics and more stable training compared to baselines.
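The multi-aspect feedback described above can be sketched as a weighted combination of the three per-aspect reward-model scores into a single scalar reward for PPO. The function name, the equal default weights, and the example scores below are illustrative assumptions, not the paper's actual implementation or values:

```python
def combined_reward(sim_score: float, rank_score: float, rouge_score: float,
                    weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Combine the three reward-model scores into one scalar PPO reward.

    sim_score   -- similarity between the rewritten query and the gold document
    rank_score  -- ranking metric of the retrieved documents
    rouge_score -- ROUGE between the generation and the ground truth
    weights     -- per-aspect weights (equal weighting assumed here)
    """
    w_sim, w_rank, w_rouge = weights
    return w_sim * sim_score + w_rank * rank_score + w_rouge * rouge_score


# Illustrative call with made-up scores; in training, each score would come
# from its dedicated reward model scoring a (rewritten query, output) pair.
reward = combined_reward(0.8, 0.6, 0.5)
```

A dense scalar of this form gives the PPO policy a gradient signal at every rollout, in contrast to the sparse generation-only rewards the abstract identifies as a source of unstable training.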