Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding generation in external knowledge to improve factuality and reduce hallucinations. Yet most deployments assume a centralized corpus, which is infeasible in privacy-aware domains where knowledge remains siloed. This motivates federated RAG (FedRAG), where a central LLM server collaborates with distributed silos without sharing raw documents. In-context RAG violates this requirement by transmitting verbatim documents, whereas parametric RAG encodes documents into lightweight adapters that merge with a frozen LLM at inference, avoiding raw-text exchange. We adopt the parametric approach but face two challenges unique to FedRAG: high storage and communication costs from per-document adapters, and destructive aggregation caused by indiscriminately merging multiple adapters. We present FedMosaic, the first federated RAG framework built on parametric adapters. FedMosaic clusters semantically related documents into multi-document adapters with document-specific masks to reduce overhead while preserving specificity, and performs selective adapter aggregation to combine only relevance-aligned, non-conflicting adapters. Experiments show that FedMosaic achieves on average 10.9% higher accuracy than state-of-the-art methods across four categories, while lowering storage costs by 78.8% to 86.3% and communication costs by 91.4%, and never sharing raw documents.
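The selective adapter aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the cosine-agreement test against the most relevant adapter, the similarity threshold, and the plain averaging rule are all assumptions introduced here, with adapter weight deltas flattened into plain vectors.

```python
import math

def cosine(u, v):
    # Cosine similarity between two flattened adapter weight deltas.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def selective_merge(adapters, relevance, sim_threshold=0.2):
    """Illustrative selective aggregation: keep only adapters whose
    update direction agrees with the most retrieval-relevant one,
    then average the survivors (averaging rule is an assumption)."""
    # Anchor on the adapter with the highest retrieval-relevance score.
    anchor = max(range(len(adapters)), key=lambda i: relevance[i])
    kept = [d for d in adapters
            if cosine(d, adapters[anchor]) >= sim_threshold]
    # Element-wise average of the kept deltas; conflicting adapters
    # (negative or near-zero cosine) never enter the merge.
    return [sum(col) / len(kept) for col in zip(*kept)]
```

Under this sketch, an adapter pointing opposite to the relevance-anchored one is excluded, which is the behavior the abstract attributes to avoiding destructive aggregation.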