Retrieval-augmented generation (RAG) has emerged as a paradigm for grounding large language models in external knowledge, yet most existing RAG systems assume centralized knowledge access and ample computation. These assumptions break down in edge environments, where knowledge is fragmented across devices, raw data cannot be shared, and repeated LLM calls are prohibitively expensive. We propose FD-RAG, a federated dual-system RAG framework that decouples lightweight memory access from on-demand LLM reasoning for decentralized deployment. Specifically, FD-RAG learns semantic-aware adaptive hypergraphs over local corpora and distills them into compact QA memories. At inference time, it answers well-covered queries via direct memory matching and invokes LLM-based reasoning only when necessary, while tracing retrieved memories to hypergraph-grounded evidence. To mitigate cross-device knowledge fragmentation, FD-RAG aggregates anonymized memories across devices without exposing raw documents. Experiments on QA benchmarks show that FD-RAG improves accuracy by up to 7.8\% while reducing latency by 8.4$\times$ compared with strong local and federated baselines. We also provide theoretical analysis establishing an $\mathcal{O}(1/\epsilon^{2})$ convergence rate for the proposed hypergraph learning, supporting its tractable deployment in edge settings.
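
The dual-system inference described above (direct memory matching first, LLM reasoning only when needed) can be illustrated with a minimal sketch. It is not the FD-RAG implementation: the bag-of-words cosine matcher, the `threshold` value, the `QAMemory` fields, and the `slow_path` callable standing in for the LLM are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter
from typing import Callable, List, NamedTuple, Optional, Tuple


class QAMemory(NamedTuple):
    question: str
    answer: str
    evidence_id: str  # hypothetical pointer back to hypergraph-grounded evidence


def _bow(text: str) -> Counter:
    # Assumed matcher: lowercase bag-of-words counts (a real system would
    # likely use learned embeddings instead).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def answer(
    query: str,
    memories: List[QAMemory],
    slow_path: Callable[[str], str],
    threshold: float = 0.8,  # assumed cutoff for a "well-covered" query
) -> Tuple[str, str, Optional[str]]:
    """Return (answer, route, evidence_id); route is 'memory' or 'llm'."""
    q = _bow(query)
    best, best_score = None, 0.0
    for m in memories:
        score = cosine(q, _bow(m.question))
        if score > best_score:
            best, best_score = m, score
    if best is not None and best_score >= threshold:
        # Fast path: answer from memory and keep the evidence trace.
        return best.answer, "memory", best.evidence_id
    # Slow path: no sufficiently similar memory, so call the LLM stand-in.
    return slow_path(query), "llm", None
```

In this sketch, a query that closely matches a stored question is answered without calling `slow_path` at all, which is where a latency saving of the kind reported above would come from. The threshold trades coverage of the fast path against the risk of returning a mismatched memory.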