Subgraph Reconstruction Attacks on Graph RAG Deployments with Practical Defenses

Graph-based retrieval-augmented generation (Graph RAG) is increasingly deployed to support LLM applications by augmenting user queries with structured knowledge retrieved from a knowledge graph. While Graph RAG improves relational reasoning, it introduces a largely understudied threat: adversaries can reconstruct subgraphs from a target RAG system's knowledge graph, enabling privacy inference and replication of curated knowledge assets. We show that existing attacks are largely ineffective against Graph RAG once even simple prompt-based safeguards are in place, because these attacks expose explicit exfiltration intent that lightweight safety prompts easily suppress. We identify three technical challenges for practical Graph RAG extraction under realistic safeguards and introduce GRASP, a closed-box, multi-turn subgraph reconstruction attack. GRASP (i) reframes extraction as a context-processing task, (ii) enforces format-compliant, instance-grounded outputs via per-record identifiers to reduce hallucinations and preserve relational details, and (iii) diversifies goal-driven attack queries using a momentum-aware scheduler to operate within strict query budgets. Across two real-world knowledge graphs, four safety-aligned LLMs, and multiple Graph RAG frameworks, GRASP attains the strongest type-faithful reconstruction where prior methods fail, reaching up to 82.9 F1. We further evaluate defenses and propose two lightweight mitigations that substantially reduce reconstruction fidelity without utility loss.
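
To make the reported metric concrete, the sketch below shows one plausible way to score type-faithful reconstruction as an F1 over typed triples. The abstract does not define the metric, so the triple layout `(head, head_type, relation, tail, tail_type)`, the exact-match rule, and the function name `triple_f1` are all assumptions for illustration, not the authors' evaluation code.

```python
from typing import Iterable, Tuple

# Hypothetical typed-triple layout: (head, head_type, relation, tail, tail_type).
TypedTriple = Tuple[str, str, str, str, str]


def triple_f1(predicted: Iterable[TypedTriple],
              reference: Iterable[TypedTriple]) -> float:
    """F1 between reconstructed and ground-truth triples (assumed exact match).

    A predicted triple counts as correct only if every field, including both
    entity types, matches a reference triple. This is the assumed reading of
    "type-faithful" used in this sketch.
    """
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case.
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = [
    ("Alice", "Person", "works_at", "Acme", "Company"),
    ("Acme", "Company", "located_in", "Berlin", "City"),
    ("Alice", "Person", "lives_in", "Berlin", "City"),
]
predicted = [
    ("Alice", "Person", "works_at", "Acme", "Company"),
    # Wrong tail type, so it does not count under the exact-match rule.
    ("Acme", "Company", "located_in", "Berlin", "Country"),
]
score = triple_f1(predicted, reference)
```

Under this reading, a defense that lowers reconstruction fidelity would show up as a lower score for the same query budget. The function returns a fraction in [0, 1]; multiplying by 100 gives the scale on which figures such as 82.9 are conventionally reported.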
