Retrieval-augmented generation (RAG) is now standard for knowledge-intensive LLM tasks, but most systems still treat every query as fresh, repeatedly re-retrieving long passages and re-reasoning from scratch, inflating tokens, latency, and cost. We present AutoPrunedRetriever, a graph-style RAG system that persists the minimal reasoning subgraph built for earlier questions and incrementally extends it for later ones. AutoPrunedRetriever stores entities and relations in a compact, ID-indexed codebook and represents questions, facts, and answers as edge sequences, enabling retrieval and prompting over symbolic structure instead of raw text. To keep the graph compact, we apply a two-layer consolidation policy (fast ANN/KNN alias detection plus selective $k$-means once a memory threshold is reached) and prune low-value structure, while prompts retain only overlap representatives and genuinely new evidence. We instantiate two front ends: AutoPrunedRetriever-REBEL, which uses REBEL as a triplet parser, and AutoPrunedRetriever-llm, which swaps in an LLM extractor. On GraphRAG-Benchmark (Medical and Novel), both variants achieve state-of-the-art complex reasoning accuracy, improving over HippoRAG2 by roughly 9--11 points, and remain competitive on contextual summarization and generation. On our harder STEM and TV benchmarks, AutoPrunedRetriever again ranks first, while using up to two orders of magnitude fewer tokens than graph-heavy baselines, making it a practical substrate for long-running sessions, evolving corpora, and multi-agent pipelines.
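The two-layer consolidation policy can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the `Codebook` class, its thresholds, and the brute-force nearest-neighbour scan (standing in for a real ANN index such as FAISS or HNSW) are all hypothetical names chosen for illustration. Layer 1 merges an incoming entity into an existing one when their embeddings are near-duplicates (alias detection); layer 2 runs a small $k$-means pass once the codebook exceeds a memory threshold, keeping one representative per cluster and recording merges in an alias table.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

class Codebook:
    """ID-indexed entity codebook with a two-layer consolidation policy
    (illustrative sketch; class name, thresholds, and deterministic
    k-means init are assumptions, not the paper's implementation)."""

    def __init__(self, alias_threshold=0.99, memory_limit=3, n_clusters=2):
        self.vectors = {}                  # entity id -> embedding
        self.alias = {}                    # merged id -> canonical id
        self.alias_threshold = alias_threshold
        self.memory_limit = memory_limit
        self.n_clusters = n_clusters
        self._next_id = 0

    def add(self, vec):
        # Layer 1: fast alias detection. Brute-force scan stands in
        # for an ANN/KNN index over the entity embeddings.
        for eid, existing in self.vectors.items():
            if cosine(vec, existing) >= self.alias_threshold:
                return eid                 # near-duplicate: reuse existing id
        eid = self._next_id
        self._next_id += 1
        self.vectors[eid] = vec
        # Layer 2: selective k-means once the memory threshold is reached.
        if len(self.vectors) > self.memory_limit:
            self._kmeans_consolidate()
        return eid

    def _kmeans_consolidate(self, iters=5):
        ids = list(self.vectors)
        dim = len(self.vectors[ids[0]])
        # Deterministic init: first n_clusters stored vectors as centers.
        centers = [list(self.vectors[i]) for i in ids[:self.n_clusters]]
        groups = [[] for _ in centers]
        for _ in range(iters):
            groups = [[] for _ in centers]
            for eid in ids:
                best = max(range(len(centers)),
                           key=lambda c: cosine(self.vectors[eid], centers[c]))
                groups[best].append(eid)
            for c, g in enumerate(groups):
                if g:  # recompute each center as the mean of its cluster
                    centers[c] = [sum(self.vectors[e][d] for e in g) / len(g)
                                  for d in range(dim)]
        # Keep one representative per cluster; remap merged ids via aliases.
        for g in groups:
            for eid in g[1:]:
                self.alias[eid] = g[0]
                del self.vectors[eid]
```

In this sketch the alias table lets edge sequences written against an old entity ID still resolve after consolidation, which is the property a persistent reasoning subgraph needs when later questions extend earlier structure.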