Retrieval-Augmented Generation (RAG) systems typically rely on a single fixed retriever, despite growing evidence that no single retriever performs optimally across all query types. In this paper, we explore a query routing approach that dynamically selects from a pool of retrievers based on the query, using both training-free heuristics and learned routing models. We frame routing as a learning-to-rank problem and introduce LTRR, a framework that Learns To Rank Retrievers according to their expected contribution to downstream RAG performance. Through experiments on diverse question-answering benchmarks with controlled variations in query types, we demonstrate that routing-based RAG consistently surpasses the strongest single-retriever baselines. The gains are particularly substantial when training with the Answer Correctness (AC) objective and when using pairwise ranking methods, with XGBoost yielding the best results. Additionally, our approach exhibits stronger generalization to out-of-distribution queries. Overall, our results underscore the critical role of both the training strategy and the choice of optimization metric in effective query routing for RAG systems.
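To make the routing-as-ranking idea concrete, the following is a minimal sketch of pairwise learning to rank over a retriever pool. It is not the LTRR implementation. The retriever names (`sparse`, `dense`), the single toy query feature (`keyword_like`), the feature crossing, and the utility values are all hypothetical. A small linear model trained with a pairwise logistic loss stands in for the XGBoost ranker mentioned above, so that the example has no dependencies. The utilities play the role of a per-query downstream metric such as Answer Correctness.

```python
import math

RETRIEVERS = ["sparse", "dense"]  # hypothetical retriever pool


def features(keyword_like, retriever):
    """Cross a toy query feature with the retriever identity (illustrative only)."""
    kw = 1.0 if keyword_like else 0.0
    is_sparse = 1.0 if retriever == "sparse" else 0.0
    return [
        is_sparse * kw,
        is_sparse * (1.0 - kw),
        (1.0 - is_sparse) * kw,
        (1.0 - is_sparse) * (1.0 - kw),
    ]


def score(w, x):
    """Linear ranking score for one (query, retriever) feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(examples, epochs=50, lr=0.5):
    """Fit weights so that higher-utility retrievers score higher per query.

    examples: list of (keyword_like, {retriever: utility}) pairs, where
    utility is an assumed downstream quality score for that retriever.
    """
    w = [0.0] * 4
    for _ in range(epochs):
        for keyword_like, utility in examples:
            for a in RETRIEVERS:
                for b in RETRIEVERS:
                    if utility[a] <= utility[b]:
                        continue  # only use pairs where a should outrank b
                    xa = features(keyword_like, a)
                    xb = features(keyword_like, b)
                    margin = score(w, xa) - score(w, xb)
                    # Derivative of log(1 + exp(-margin)) with respect to margin.
                    g = -1.0 / (1.0 + math.exp(margin))
                    w = [wi - lr * g * (xai - xbi)
                         for wi, xai, xbi in zip(w, xa, xb)]
    return w


def route(w, keyword_like):
    """Pick the top-ranked retriever for a query."""
    return max(RETRIEVERS, key=lambda r: score(w, features(keyword_like, r)))


# Toy training data: sparse helps keyword-like queries, dense helps the rest.
TRAIN = [
    (True, {"sparse": 1.0, "dense": 0.0}),
    (True, {"sparse": 1.0, "dense": 1.0}),  # tie: contributes no preference pair
    (False, {"sparse": 0.0, "dense": 1.0}),
]
W = train_pairwise(TRAIN)
```

Calling `route(W, True)` or `route(W, False)` then returns the retriever the toy ranker prefers for that kind of query. Ties in utility produce no training pair, which is one simple way to avoid teaching the router a preference that the downstream metric does not support.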