Retrieval-augmented generation (RAG) can substantially enhance the performance of LLMs on knowledge-intensive tasks. Various RAG paradigms, including vanilla, planning-based, and iterative RAG, all depend on a robust retriever, yet existing retrievers rely heavily on public knowledge and often falter on domain-specific queries. To address these limitations, we introduce DRAGON, a framework that combines a data-construction modeling approach with a scalable synthetic data-generation pipeline, designed to optimize domain-specific retrieval performance and bolster retriever robustness. To evaluate domain-specific RAG performance, we propose DRAGONBench, a benchmark spanning 8 domain-specific document collections across 4 distinct fields and covering a wide spectrum of query complexities, answerability conditions, and hop counts. Leveraging DRAGON, we generate a large-scale synthetic dataset, encompassing both single-hop and multi-hop queries, to enrich retriever training. Extensive experiments demonstrate that retrievers trained on this data achieve significant performance gains and exhibit strong cross-domain generalization. Moreover, when our optimized retrievers are integrated into vanilla, planning-based, and iterative RAG paradigms, we observe consistent end-to-end improvements in system accuracy.
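To make the single-hop versus multi-hop distinction concrete, the sketch below outlines, in Python, one plausible shape for a synthetic query-construction pipeline of this kind. It is purely illustrative and does not reflect DRAGON's actual implementation: the `llm_generate` stub (a stand-in for any LLM completion call), the prompt wording, the sampling strategy, and the `TrainingExample` schema are all hypothetical assumptions.

```python
# Illustrative sketch of synthetic query construction for retriever
# training. All names and prompts here are hypothetical placeholders,
# not DRAGON's actual pipeline.
import random
from dataclasses import dataclass


@dataclass
class TrainingExample:
    query: str
    positive_passages: list[str]  # passages the retriever should rank highly
    hops: int                     # 1 for single-hop, >1 for multi-hop


def llm_generate(prompt: str) -> str:
    """Placeholder for an LLM call; a real pipeline would query a model here."""
    return "placeholder question derived from: " + prompt[:40]


def make_single_hop(passage: str) -> TrainingExample:
    # Single-hop: the query should be answerable from one passage alone.
    query = llm_generate(
        f"Write a domain-specific question answerable only from this passage:\n{passage}"
    )
    return TrainingExample(query=query, positive_passages=[passage], hops=1)


def make_multi_hop(corpus: list[str], n_hops: int = 2) -> TrainingExample:
    # Multi-hop: the query should require composing evidence across passages.
    passages = random.sample(corpus, n_hops)
    joined = "\n---\n".join(passages)
    query = llm_generate(
        f"Write one question whose answer requires combining ALL of these passages:\n{joined}"
    )
    return TrainingExample(query=query, positive_passages=passages, hops=n_hops)


if __name__ == "__main__":
    corpus = [
        "Passage about protocol X.",
        "Passage about device Y.",
        "Passage about standard Z.",
    ]
    print(make_single_hop(corpus[0]))
    print(make_multi_hop(corpus, n_hops=2))
```

Under these assumptions, each generated query is paired with the passages it was derived from, which serve as positive examples for contrastive retriever training; the hop count is recorded so the training mix over query complexity can be controlled.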