Graph-based retrieval-augmented generation (GraphRAG) systems construct knowledge graphs over document collections to support multi-hop reasoning. While prior work shows that GraphRAG responses may leak retrieved subgraphs, whether the hidden graph structure can be reconstructed query-efficiently under realistic query budgets remains unexplored. We study a budget-constrained black-box setting in which an adversary adaptively queries the system to steal its latent entity-relation graph. We propose AGEA (Agentic Graph Extraction Attack), a framework that combines a novelty-guided exploration-exploitation strategy, external graph memory modules, and a two-stage graph extraction pipeline that pairs lightweight discovery with LLM-based filtering. We evaluate AGEA on medical, agricultural, and literary datasets across the Microsoft-GraphRAG and LightRAG systems. Under identical query budgets, AGEA significantly outperforms prior attack baselines, recovering up to 90% of entities and relationships while maintaining high precision. These results demonstrate that modern GraphRAG systems are highly vulnerable to structured, agentic extraction attacks, even under strict query limits.
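The recovery and precision figures quoted in the abstract can be read as recall and precision over entity and relationship sets. The following is a minimal sketch of that reading, not the authors' evaluation code: the exact-match rule after lowercasing and whitespace normalization, the helper names (`normalize_triples`, `entities`, `precision_recall`), and the toy triples are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a relationship is a (head, relation, tail) triple.
Triple = Tuple[str, str, str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def normalize_triples(triples: Iterable[Triple]) -> Set[Triple]:
    """Normalize every field of every triple and deduplicate."""
    return {(normalize(h), normalize(r), normalize(t)) for h, r, t in triples}


def entities(triples: Set[Triple]) -> Set[str]:
    """Collect the entities that appear as head or tail of any triple."""
    return {e for h, _, t in triples for e in (h, t)}


def precision_recall(extracted: Set, truth: Set) -> Tuple[float, float]:
    """Precision: share of extracted items that are real.
    Recall: share of real items that were recovered."""
    hits = len(extracted & truth)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Toy data (assumed): the hidden graph and what an attacker reconstructed.
hidden = normalize_triples([
    ("Aspirin", "treats", "Headache"),
    ("Aspirin", "interacts with", "Warfarin"),
    ("Warfarin", "treats", "Thrombosis"),
    ("Ibuprofen", "treats", "Headache"),
])
stolen = normalize_triples([
    ("aspirin", "treats", "headache"),
    ("Aspirin ", "interacts  with", "warfarin"),
    ("aspirin", "treats", "thrombosis"),  # a spurious relationship
])

edge_p, edge_r = precision_recall(stolen, hidden)
node_p, node_r = precision_recall(entities(stolen), entities(hidden))
print(f"relationships: precision={edge_p:.2f} recall={edge_r:.2f}")
print(f"entities:      precision={node_p:.2f} recall={node_r:.2f}")
```

Exact matching is the strictest possible choice. An evaluation that tolerates paraphrased entity names or relation labels would need a fuzzier matching rule, and would report different numbers on the same extraction.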