With the rapid advancement of tool-use capabilities in Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) is shifting from static, one-shot retrieval toward autonomous, multi-turn evidence acquisition. However, existing agentic search frameworks typically treat long documents as flat collections of unstructured chunks, disregarding the native hierarchical organization and sequential logic that human readers rely on. To bridge this gap, we introduce \textbf{DeepRead}, a structure-aware document reasoning agent that turns document-native structural priors into actionable reasoning capabilities. Leveraging the structural fidelity of modern OCR, DeepRead constructs a paragraph-level, coordinate-based navigation system and equips the LLM with two complementary tools: \textsf{Retrieve} for scanning-aware localization, and \textsf{ReadSection} for contiguous, order-preserving reading within specific hierarchical scopes. This design elicits a human-like ``locate-then-read'' reasoning paradigm that mitigates the context fragmentation inherent in traditional chunk-based retrieval. Extensive evaluations across four benchmarks spanning diverse document types demonstrate that DeepRead outperforms Search-o1-style agentic search baselines by an average of 10.3\%. Fine-grained behavioral analysis further confirms that DeepRead autonomously adopts human-aligned reading strategies, validating the critical role of structural awareness in precise document reasoning. Our code is available at https://github.com/Zhanli-Li/DeepRead.
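The ``locate-then-read'' pattern described above can be illustrated with a minimal sketch. This is not the DeepRead implementation: the \texttt{Paragraph} record, the tuple-based section path, the word-overlap scoring, and the function names \texttt{retrieve} and \texttt{read\_section} are all assumptions introduced here for illustration, standing in for whatever retriever and coordinate scheme the actual system uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Paragraph:
    # Hypothetical coordinate scheme: a section path plus a reading-order index.
    section: Tuple[int, ...]  # e.g. (2, 1) for Section 2.1
    index: int                # position in document reading order
    text: str


def retrieve(doc: List[Paragraph], query: str, top_k: int = 2):
    """Locate step: rank paragraphs by naive word overlap (an assumed
    stand-in for a real retriever) and return their coordinates."""
    terms = set(query.lower().split())
    scored = []
    for p in doc:
        overlap = len(terms & set(p.text.lower().split()))
        if overlap:
            scored.append((overlap, p))
    # Highest overlap first; ties broken by reading order.
    scored.sort(key=lambda pair: (-pair[0], pair[1].index))
    return [(p.section, p.index) for _, p in scored[:top_k]]


def read_section(doc: List[Paragraph], section: Tuple[int, ...]):
    """Read step: return every paragraph under `section` (subsections
    included) as contiguous text in original reading order."""
    scope = [p for p in doc if p.section[:len(section)] == section]
    return [p.text for p in sorted(scope, key=lambda p: p.index)]


# Toy document with made-up content.
doc = [
    Paragraph((1,), 0, "Introduction to the annual report."),
    Paragraph((2,), 1, "Financial results overview."),
    Paragraph((2, 1), 2, "Revenue grew in the third quarter."),
    Paragraph((2, 1), 3, "The growth came mainly from cloud services."),
    Paragraph((2, 2), 4, "Operating costs remained flat."),
]

hits = retrieve(doc, "revenue growth", top_k=1)  # locate
section, _ = hits[0]
context = read_section(doc, section)             # then read the whole scope
```

In this toy example the single retrieved hit points into Section 2.1, and reading that scope returns both of its paragraphs in order, including the one that a top-1 chunk retriever alone would have missed.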