Multi-document grounded dialogue systems (DGDS) are conversational agents that answer user requests by finding supporting knowledge in a collection of documents. Most previous studies aim either to improve the knowledge retrieval model or to propose more effective ways of incorporating external knowledge into a parametric generation model. These methods, however, retrieve knowledge at a single granularity (e.g., passages, sentences, or spans in documents), which is insufficient to capture precise knowledge in long documents both effectively and efficiently. This paper proposes Re3G, which aims to optimize both coarse-grained knowledge retrieval and fine-grained knowledge extraction in a unified framework. Specifically, the former efficiently finds relevant passages through a retrieval-and-reranking process, whereas the latter extracts finer-grained spans within those passages and incorporates them into a parametric answer generation model (BART, T5). Experiments on the DialDoc Shared Task demonstrate the effectiveness of our method.
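The coarse-to-fine flow described in the abstract (retrieve, rerank, extract spans, then build the generator input) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the token-overlap scorer, the length-normalized reranker, the sentence-level span extractor, and the function names are toy stand-ins, not the models or interfaces used in Re3G, and the final string is only a hypothetical input format for a generator such as BART or T5.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, text):
    """Toy relevance score: count of query tokens that also occur in text."""
    q, t = Counter(tokenize(query)), Counter(tokenize(text))
    return sum(min(q[w], t[w]) for w in q)


def retrieve(query, passages, k=2):
    """Coarse stage 1: keep the k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def rerank(query, candidates, k=1):
    """Coarse stage 2: re-score candidates with a length-normalized score.

    A real system would use a stronger (e.g., cross-encoder) model here;
    this normalization is only a placeholder for a second scorer.
    """
    def score(p):
        return overlap_score(query, p) / (1 + len(tokenize(p)))
    return sorted(candidates, key=score, reverse=True)[:k]


def extract_span(query, passage):
    """Fine stage: pick the single sentence that best matches the query."""
    parts = re.split(r"(?<=[.!?])\s+", passage)
    sentences = [s.strip() for s in parts if s.strip()]
    return max(sentences, key=lambda s: overlap_score(query, s))


def build_generator_input(query, passages, k_retrieve=2, k_rerank=1):
    """Chain the stages and format a (hypothetical) generator input string."""
    top = rerank(query, retrieve(query, passages, k_retrieve), k_rerank)
    spans = [extract_span(query, p) for p in top]
    return "question: " + query + " knowledge: " + " | ".join(spans)


# Toy data, invented for this example only.
passages = [
    "You can renew a license online. Renewal requires a valid email address.",
    "Our offices are closed on public holidays. Parking is free for visitors.",
    "A vision test is required every ten years. Bring your glasses to the test.",
]
query = "How do I renew my license online?"
print(build_generator_input(query, passages))
```

The point of the sketch is the division of labor: the two coarse stages narrow the collection to a few passages cheaply, and only those passages are read at the finer span level before anything is handed to the generator.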