Retrieval-Augmented Generation (RAG) improves factuality by grounding LLMs in external knowledge, yet conventional centralized RAG requires aggregating distributed data, raising privacy risks and incurring high retrieval latency and cost. We present DGRAG, a distributed graph-driven RAG framework for edge-cloud collaborative systems. Each edge device organizes local documents into a knowledge graph and periodically uploads subgraph-level summaries to the cloud for lightweight global indexing without exposing raw data. At inference time, queries are first answered on the edge; a gate mechanism assesses the confidence and consistency of multiple local generations to decide whether to return a local answer or escalate the query. For escalated queries, the cloud performs summary-based matching to identify relevant edges, retrieves supporting evidence from them, and generates the final response with a cloud LLM. Experiments on distributed question answering show that DGRAG consistently outperforms decentralized baselines while substantially reducing cloud overhead.
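The edge-side gate described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual method: the function name, the majority-vote consistency score, and the threshold value are all assumptions introduced for illustration.

```python
# Hypothetical sketch of a gate mechanism in the spirit of DGRAG (all names
# and thresholds are illustrative assumptions, not taken from the paper):
# sample several local generations, score their mutual consistency, and
# escalate the query to the cloud when agreement is too low.
from collections import Counter


def gate_decision(local_answers, consistency_threshold=0.6):
    """Return ("local", answer) if edge generations agree enough,
    otherwise ("escalate", None) to hand the query to the cloud."""
    # Consistency here is the fraction of generations that match the
    # majority answer after light normalization.
    counts = Counter(a.strip().lower() for a in local_answers)
    majority_answer, majority_count = counts.most_common(1)[0]
    consistency = majority_count / len(local_answers)
    if consistency >= consistency_threshold:
        return "local", majority_answer
    return "escalate", None


# Three consistent local generations: answer on the edge.
decision, answer = gate_decision(["Paris", "paris", "Paris"])
# Three disagreeing generations: escalate to the cloud.
decision2, _ = gate_decision(["Paris", "London", "Berlin"])
```

In a full system, the consistency signal would typically be combined with a model-confidence score (e.g., average token log-probability) rather than exact string matching alone.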