Large Language Models (LLMs) have shown exceptional capabilities in many natural language understanding and generation tasks. However, personalization remains a much-coveted property, especially when a dialogue system must draw on multiple knowledge sources. To better plan and incorporate the use of multiple sources when generating personalized responses, we first decompose the problem into three sub-tasks: Knowledge Source Selection, Knowledge Retrieval, and Response Generation. We then propose a novel Unified Multi-Source Retrieval-Augmented Generation system (UniMS-RAG). Specifically, we unify these three sub-tasks, which have different formulations, into the same sequence-to-sequence paradigm during training, so that the model can adaptively retrieve evidence and evaluate its relevance on demand using special tokens, called acting tokens and evaluation tokens. Enabling language models to generate acting tokens facilitates interaction with various knowledge sources, allowing them to adapt their behavior to diverse task requirements. Meanwhile, evaluation tokens gauge the relevance score between the dialogue context and the retrieved evidence. In addition, we carefully design a self-refinement mechanism that iteratively refines the generated response by considering 1) the consistency scores between the generated response and the retrieved evidence; and 2) the relevance scores. Experiments on two personalized datasets (DuLeMon and KBP) show that UniMS-RAG achieves state-of-the-art performance on the knowledge source selection and response generation tasks while serving as its own retriever in a unified manner. Extensive analyses and discussions are provided to offer new perspectives on personalized dialogue systems.
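The three sub-tasks and the self-refinement loop described above can be illustrated with a small, purely hypothetical Python sketch. It is not the UniMS-RAG implementation: the word-overlap function `overlap` stands in for the learned relevance and consistency scores, the rule in `select_sources` stands in for acting tokens, and the names `respond`, `toy_generate`, `SOURCES`, `keep`, and `max_rounds` are assumptions introduced only for this example. The refinement rule (re-rank evidence by relevance plus consistency, keep the best, regenerate) is likewise one plausible reading of the description, not a confirmed detail.

```python
from typing import Callable, Dict, List, Tuple

Evidence = Tuple[float, str]  # (relevance score, evidence text)


def overlap(a: str, b: str) -> float:
    """Toy stand-in for a learned score: Jaccard overlap of lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def select_sources(context: str, sources: Dict[str, List[str]]) -> List[str]:
    """Sub-task 1 (stand-in for acting tokens): keep sources that share words with the context."""
    return [name for name, docs in sources.items()
            if any(overlap(context, d) > 0.0 for d in docs)]


def retrieve(context: str, docs: List[str], top_k: int = 2) -> List[Evidence]:
    """Sub-task 2 (stand-in for evaluation tokens): rank documents by relevance to the context."""
    scored = [(overlap(context, d), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def respond(context: str,
            sources: Dict[str, List[str]],
            generate: Callable[[str, List[str]], str],
            max_rounds: int = 2,
            keep: int = 1) -> Tuple[str, List[str]]:
    """Sub-task 3 plus an assumed refinement loop over the retrieved evidence."""
    evidence: List[Evidence] = []
    for name in select_sources(context, sources):
        evidence.extend(retrieve(context, sources[name]))
    response = generate(context, [text for _, text in evidence])
    for _ in range(max_rounds):
        if len(evidence) <= keep:
            break  # nothing left to prune
        # Re-rank by relevance plus consistency with the current response.
        evidence.sort(key=lambda ev: ev[0] + overlap(response, ev[1]), reverse=True)
        evidence = evidence[:keep]
        response = generate(context, [text for _, text in evidence])
    return response, [text for _, text in evidence]


def toy_generate(context: str, evidence: List[str]) -> str:
    """Hypothetical generator: echoes the evidence instead of calling a language model."""
    return "I know that: " + "; ".join(evidence) if evidence else "Tell me more."


# Hypothetical knowledge sources, invented for illustration.
SOURCES = {
    "user_persona": ["I love hiking in the mountains", "I have a cat"],
    "bot_persona": ["I am a chef"],
}

response, used = respond("do you love hiking", SOURCES, toy_generate)
```

In this toy run only the `user_persona` source is selected, and the refinement step prunes the evidence down to the single entry that is both relevant to the context and consistent with the draft response. In a real system each stand-in would be replaced by model-generated tokens and scores.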