Large language models (LLMs) achieve remarkable performance across domains but remain prone to hallucinations and inconsistencies. Retrieval-augmented generation (RAG) mitigates these issues by augmenting model inputs with relevant documents retrieved from external sources. In many real-world scenarios, relevant knowledge is fragmented across organizations or institutions, which motivates federated search mechanisms that aggregate results from heterogeneous data sources without centralizing the data. We introduce RAGRoute, a lightweight routing mechanism for federated search in RAG systems that uses a neural classifier to select relevant data sources dynamically at query time instead of querying every source indiscriminately. This selective routing reduces communication overhead and end-to-end latency while preserving retrieval quality: across three benchmarks, it achieves reductions of up to 80.65% in communication volume and 52.50% in latency while matching the accuracy of querying all sources.
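To make the routing idea concrete, the following is a minimal sketch of query-time source selection, not the RAGRoute implementation. The abstract does not specify the classifier architecture, its input features, or a decision threshold, so everything here is an assumption for illustration: the one-hidden-layer network, the hand-set weights, the use of the raw query embedding as the only feature, the 0.5 threshold, and the names `SourceRelevanceClassifier` and `route`.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class SourceRelevanceClassifier:
    """Hypothetical one-hidden-layer network scoring a query for ONE data source.

    In practice the weights would be learned; here they are set by hand
    purely so the example is self-contained.
    """

    def __init__(
        self,
        w_hidden: Sequence[Sequence[float]],
        b_hidden: Sequence[float],
        w_out: Sequence[float],
        b_out: float,
    ) -> None:
        self.w_hidden = w_hidden
        self.b_hidden = b_hidden
        self.w_out = w_out
        self.b_out = b_out

    def score(self, features: Sequence[float]) -> float:
        # Hidden layer with ReLU activation.
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(self.w_hidden, self.b_hidden)
        ]
        # Single output unit squashed to a relevance probability.
        logit = sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out
        return sigmoid(logit)


def route(
    query_embedding: Sequence[float],
    classifiers: Dict[str, SourceRelevanceClassifier],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Return only the sources whose predicted relevance meets the threshold."""
    return [
        name
        for name, clf in classifiers.items()
        if clf.score(query_embedding) >= threshold
    ]


# Two hypothetical sources, each sensitive to one embedding dimension.
classifiers = {
    "medical": SourceRelevanceClassifier([[2.0, 0.0]], [0.0], [3.0], -1.0),
    "legal": SourceRelevanceClassifier([[0.0, 2.0]], [0.0], [3.0], -1.0),
}

selected = route([1.0, 0.0], classifiers)
print(selected)  # only the selected sources would then be queried
```

Under these assumptions, a query is forwarded only to the sources returned by `route`, which is where the savings in communication and latency would come from. Raising the threshold trades recall for fewer contacted sources.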