Combining retrieved passages with large language models (LLMs), such as ChatGPT, has significantly improved open-domain question answering. However, the best way to incorporate retrieved passages into answer generation remains underexplored. This paper aims to fill this gap by investigating different methods of combining retrieved passages with LLMs to enhance answer generation. We begin by examining the limitations of the commonly used concatenation approach. Surprisingly, this approach often produces "unknown" outputs, even when the correct document is among the top-k retrieved passages. To address this issue, we explore four alternative strategies for integrating retrieved passages with LLMs: two single-round methods that use chain-of-thought reasoning and two multi-round strategies that incorporate feedback loops. Through comprehensive analyses and experiments, we offer observations on how to effectively leverage retrieved passages to improve the answer generation capability of LLMs.
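As a rough illustration of the concatenation baseline mentioned above, the sketch below joins the top-k passages in front of the question and lets the model abstain. The prompt wording, the helper names `build_concat_prompt` and `is_unknown`, and the instruction to reply "unknown" are assumptions made for illustration; the abstract does not specify the actual template.

```python
from typing import List


def build_concat_prompt(question: str, passages: List[str], k: int = 5) -> str:
    """Concatenate the top-k passages ahead of the question (hypothetical template)."""
    top_k = passages[:k]
    # Number each passage so the model can tell them apart.
    context = "\n".join(f"Passage {i}: {p}" for i, p in enumerate(top_k, start=1))
    return (
        f"{context}\n\n"
        "Answer the question using the passages above. "
        'If the passages do not contain the answer, reply "unknown".\n'
        f"Question: {question}\nAnswer:"
    )


def is_unknown(answer: str) -> bool:
    """Flag an abstaining output, the failure case the abstract describes."""
    # Ignore surrounding whitespace, quotes, and a trailing period.
    return answer.strip().strip('".').lower() == "unknown"
```

The returned prompt would be sent to whichever LLM is under study; `is_unknown` can then count how often the model abstains even though a relevant passage was among the top-k.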