Agentic Retrieval-Augmented Generation (Agentic RAG) has become a widely adopted paradigm for multi-hop question answering and complex knowledge reasoning, where retrieval and reasoning are interleaved at inference time. As reasoning trajectories grow longer, failures become increasingly common. Existing approaches either stop at diagnostic analysis without correcting the failure, or rerun the entire retrieval-reasoning pipeline, which incurs substantial computational overhead and redundant reasoning. In this paper, we propose Doctor-RAG (DR-RAG), a unified diagnose-and-repair framework that corrects failures in Agentic RAG through explicit error localization and prefix reuse, keeping the cost of intervention minimal. DR-RAG decomposes failure handling into two consecutive stages: (i) trajectory-level failure diagnosis and localization, which attributes errors to a coverage-gated taxonomy and identifies the earliest failure point in the reasoning trajectory; and (ii) tool-conditioned local repair, which intervenes only at the diagnosed failure point while maximally reusing validated reasoning prefixes and retrieved evidence. By explicitly separating error attribution from correction, DR-RAG localizes errors precisely and replaces expensive full-pipeline reruns with targeted repair. We evaluate DR-RAG on three multi-hop question answering benchmarks, with multiple Agentic RAG baselines and different backbone models. Experimental results show that DR-RAG substantially improves answer accuracy while consuming significantly fewer reasoning tokens than rerun-based repair strategies.
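The two-stage idea of locating the earliest failure point and then repairing only from that point while reusing the validated prefix can be illustrated with a minimal Python sketch. This is not the DR-RAG implementation: the `Step` record, the `is_valid` check, and the `continue_from` callback are hypothetical names introduced only for illustration, and the coverage-gated taxonomy and tool-conditioned repair selection are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One retrieval-reasoning step (hypothetical structure, for illustration)."""
    query: str      # retrieval query issued at this step
    evidence: str   # text returned by the retriever
    claim: str      # intermediate conclusion drawn by the agent


def locate_first_failure(
    trajectory: List[Step], is_valid: Callable[[Step], bool]
) -> Optional[int]:
    """Return the index of the earliest step that fails validation, or None."""
    for i, step in enumerate(trajectory):
        if not is_valid(step):
            return i
    return None


def repair(
    trajectory: List[Step],
    is_valid: Callable[[Step], bool],
    continue_from: Callable[[List[Step]], List[Step]],
) -> Tuple[List[Step], int]:
    """Repair locally; return (new trajectory, number of reused steps)."""
    k = locate_first_failure(trajectory, is_valid)
    if k is None:
        return trajectory, len(trajectory)  # nothing to repair
    prefix = trajectory[:k]                 # validated steps are kept unchanged
    suffix = continue_from(prefix)          # regenerate only from the failure point
    return prefix + suffix, k
```

In this sketch, a full rerun would correspond to calling `continue_from([])`, whereas local repair passes the validated prefix, so earlier retrieval and reasoning steps are not recomputed.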