Answering complex, real-world queries often requires synthesizing facts scattered across vast document corpora. In these settings, standard retrieval-augmented generation (RAG) pipelines suffer from incomplete evidence coverage, while long-context large language models (LLMs) struggle to reason reliably over massive inputs. We introduce SPD-RAG, a hierarchical multi-agent framework for exhaustive cross-document question answering that decomposes the problem along the document axis. Each document is processed by a dedicated document-level agent operating only on its own content, enabling focused retrieval, while a coordinator dispatches tasks to relevant agents and aggregates their partial answers. A token-bounded synthesis layer then merges the agents' partial answers, applying recursive map-reduce when a corpus is too large to synthesize in one pass. This document-level specialization with centralized fusion improves scalability and answer quality in heterogeneous multi-document settings while yielding a modular, extensible retrieval pipeline. On the LOONG benchmark (EMNLP 2024) for long-context multi-document QA, SPD-RAG achieves an Avg Score of 58.1 (GPT-5 evaluation), outperforming Normal RAG (33.0) and Agentic RAG (32.8) while using only 38% of the API cost of a full-context baseline (Avg Score 68.0).
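The token-bounded synthesis layer can be sketched as a recursive map-reduce over partial answers. The sketch below is hypothetical (the paper does not publish code): `count_tokens` is a crude whitespace stand-in for a real tokenizer, and `combine` is a placeholder for the coordinator's LLM synthesis call.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Whitespace stand-in for a real tokenizer (assumption, not the paper's).
    return len(text.split())


def synthesize(partials: List[str],
               combine: Callable[[List[str]], str],
               token_budget: int = 64) -> str:
    """Token-bounded recursive map-reduce over per-document partial answers.

    If all partial answers fit the budget, combine them in one pass;
    otherwise greedily pack them into budget-sized chunks, reduce each
    chunk, and recurse on the intermediate answers.
    """
    if len(partials) == 1 or sum(count_tokens(p) for p in partials) <= token_budget:
        return combine(partials)

    # Greedily pack partials into chunks whose token totals fit the budget.
    chunks, current, used = [], [], 0
    for p in partials:
        t = count_tokens(p)
        if current and used + t > token_budget:
            chunks.append(current)
            current, used = [], 0
        current.append(p)
        used += t
    if current:
        chunks.append(current)

    # Map: reduce each chunk; Reduce: recurse on the intermediate answers.
    return synthesize([combine(c) for c in chunks], combine, token_budget)


def combine(parts: List[str]) -> str:
    # Placeholder for an LLM synthesis call: here, keep the first 8
    # tokens of each part so every reduction step shrinks its input.
    return " ".join(" ".join(p.split()[:8]) for p in parts)
```

Termination relies on `combine` shrinking its input, which holds for any real summarization call with a capped output length; the greedy packing keeps every synthesis prompt within the token budget regardless of corpus size.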