Retrieval-augmented generation (RAG) has become a powerful framework for enhancing large language models in knowledge-intensive and reasoning tasks. However, as reasoning chains deepen or search trees expand, RAG systems often face two persistent failures: evidence forgetting, where retrieved knowledge is not effectively used, and inefficiency, caused by uncontrolled query expansions and redundant retrieval. These issues reveal a critical gap between retrieval and evidence utilization in current RAG architectures. We propose PruneRAG, a confidence-guided query decomposition framework that builds a structured query decomposition tree to perform stable and efficient reasoning. PruneRAG introduces three key mechanisms: adaptive node expansion that regulates tree width and depth, confidence-guided decisions that accept reliable answers and prune uncertain branches, and fine-grained retrieval that extracts entity-level anchors to improve retrieval precision. Together, these components preserve salient evidence throughout multi-hop reasoning while significantly reducing retrieval overhead. To better analyze evidence misuse, we define the Evidence Forgetting Rate as a metric to quantify cases where golden evidence is retrieved but not correctly used. Extensive experiments across various multi-hop QA benchmarks show that PruneRAG achieves superior accuracy and efficiency over state-of-the-art baselines.
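The abstract defines the Evidence Forgetting Rate as the proportion of cases where golden evidence is retrieved but not correctly used. The paper's exact formula is not given here; the following is a minimal sketch of one plausible computation, where `gold_retrieved` and `answer_correct` are illustrative per-question flags, not names from the paper.

```python
def evidence_forgetting_rate(records):
    """Sketch of the Evidence Forgetting Rate (EFR).

    records: iterable of dicts, each with two illustrative boolean fields:
      - "gold_retrieved": golden evidence appeared in the retrieved context
      - "answer_correct": the model's final answer was judged correct

    EFR is computed here as: among questions where golden evidence was
    retrieved, the fraction answered incorrectly (evidence present but
    not used). This is an assumed formalization of the abstract's prose.
    """
    retrieved = [r for r in records if r["gold_retrieved"]]
    if not retrieved:
        return 0.0  # no retrieved-gold cases to measure forgetting on
    forgotten = sum(1 for r in retrieved if not r["answer_correct"])
    return forgotten / len(retrieved)
```

Under this reading, a lower EFR means the system more reliably converts retrieved golden evidence into correct answers, which is the behavior PruneRAG's confidence-guided pruning aims to improve.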