Long-document question answering (QA) requires large language models (LLMs) to reason over evidence scattered across lengthy documents, where answers often depend on event order, section-level context, and connections between evidence in different parts of the document. Although retrieval-augmented generation (RAG) shortens the input context by retrieving only relevant evidence, existing structured RAG methods still face three limitations: costly query-agnostic knowledge organization, insufficient use of the original document structure, and no reuse of historical reasoning experience. To address these limitations, we propose DocTrace, a multi-agent RAG framework for long-document QA that supports query-triggered knowledge organization and document-structure-aware, experience-guided reasoning. DocTrace preserves the document hierarchy with a lightweight document structural tree index, constructs an agent-shared, hypergraph-structured working memory on demand during reasoning, and stores successful reasoning plans in a graph-structured experience memory for future reuse, enabling adaptive exploration across related long-document questions. Experiments on four long-document QA datasets show that DocTrace achieves the best performance on three of them, surpassing the strongest baseline, ComoRAG, by up to 8.85% in F1 and 4.40% in EM, while reducing the overall computational cost by 53.32%.
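To make the three components named in the abstract more concrete, the following is a minimal Python sketch of a document structural tree, a hypergraph-style working memory, and an experience memory. It is an illustration under stated assumptions, not DocTrace's implementation: every class, method, and field name (`SectionNode`, `WorkingMemory`, `ExperienceMemory`, `find`, `about`, `recall`) is hypothetical, keyword matching stands in for retrieval, and the experience memory is simplified to a dictionary rather than a graph.

```python
from dataclasses import dataclass, field


@dataclass
class SectionNode:
    """One node of a document structural tree (hypothetical layout)."""
    title: str
    text: str = ""
    children: list = field(default_factory=list)

    def find(self, keyword, path=()):
        """Return the heading paths of all nodes whose text mentions keyword."""
        here = path + (self.title,)
        hits = [here] if keyword.lower() in self.text.lower() else []
        for child in self.children:
            hits.extend(child.find(keyword, here))
        return hits


@dataclass
class WorkingMemory:
    """Hypergraph-style memory: one hyperedge links any number of entities."""
    hyperedges: list = field(default_factory=list)

    def add(self, entities, evidence):
        # A hyperedge is stored as (set of entities, supporting evidence).
        self.hyperedges.append((frozenset(entities), evidence))

    def about(self, entity):
        """Return the evidence of every hyperedge that contains entity."""
        return [ev for ents, ev in self.hyperedges if entity in ents]


@dataclass
class ExperienceMemory:
    """Assumption: plans are keyed by a question type (a dict, not a graph)."""
    plans: dict = field(default_factory=dict)

    def store(self, question_type, plan):
        self.plans[question_type] = list(plan)

    def recall(self, question_type):
        return self.plans.get(question_type, [])


# Build a tiny two-section document and look up where a term occurs.
doc = SectionNode("Report", children=[
    SectionNode("1 Background", "The merger was announced in March."),
    SectionNode("2 Outcome", "Regulators approved the merger in June."),
])
paths = doc.find("merger")

# Record evidence on demand, then save the plan that worked for later reuse.
memory = WorkingMemory()
memory.add({"merger", "regulators"}, "Regulators approved the merger in June.")

experience = ExperienceMemory()
experience.store("temporal", ["locate sections", "order events"])
```

In this sketch the tree keeps each hit tied to its heading path, so section-level context is still available once the evidence is retrieved, and a later question of the same type can start from the stored plan instead of exploring from scratch.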