Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models (LLMs) through external knowledge retrieval. Despite widespread attention, existing academic research predominantly focuses on single-turn RAG, leaving a significant gap in addressing the complexities of multi-turn conversations found in real-world applications. To bridge this gap, we introduce CORAL, a large-scale benchmark designed to assess RAG systems in realistic multi-turn conversational settings. CORAL includes diverse information-seeking conversations automatically derived from Wikipedia and tackles key challenges such as open-domain coverage, knowledge intensity, free-form responses, and topic shifts. It supports three core tasks of conversational RAG: passage retrieval, response generation, and citation labeling. We propose a unified framework to standardize various conversational RAG methods and conduct a comprehensive evaluation of these methods on CORAL, demonstrating substantial opportunities for improving existing approaches.