DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search

With the rapid progress of tool-using and agentic large language models (LLMs), Retrieval-Augmented Generation (RAG) is evolving from one-shot, passive retrieval into multi-turn, decision-driven evidence acquisition. Despite strong results in open-domain settings, existing agentic search frameworks commonly treat long documents as flat collections of chunks, underutilizing document-native priors such as hierarchical organization and sequential discourse structure. We introduce DeepRead, a structure-aware, multi-turn document reasoning agent that explicitly operationalizes these priors for long-document question answering. DeepRead uses an LLM-based OCR model to convert PDFs into structured Markdown that preserves headings and paragraph boundaries. It then indexes documents at the paragraph level and assigns each paragraph a coordinate-style metadata key encoding its section identity and in-section order. Building on this representation, DeepRead equips the LLM with two complementary tools: a Retrieve tool that localizes relevant paragraphs while exposing their structural coordinates (with lightweight scanning context), and a ReadSection tool that enables contiguous, order-preserving reading within a specified section and paragraph range. Our experiments demonstrate that DeepRead achieves significant improvements over Search-o1-style agentic search in document question answering. They also validate the synergy between the retrieval and reading tools. Our fine-grained behavioral analysis reveals a reading and reasoning paradigm that resembles the human "locate then read" pattern.
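
The paragraph-level coordinates and the two tools described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than DeepRead's actual implementation: the names `Paragraph`, `index_markdown`, `retrieve`, and `read_section` are hypothetical, the input is assumed to be Markdown with blank lines separating headings and paragraphs, a simple word-overlap score stands in for whatever retriever the system uses, and the "lightweight scanning context" returned by Retrieve is not modeled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Paragraph:
    section_id: int  # heading index in document order (0 = text before any heading)
    para_idx: int    # position of the paragraph inside its section
    heading: str
    text: str


def index_markdown(markdown: str) -> List[Paragraph]:
    """Split Markdown into paragraphs tagged with (section_id, para_idx).

    Assumes headings and paragraphs are separated by blank lines.
    """
    paragraphs: List[Paragraph] = []
    section_id, para_idx, heading = 0, 0, ""
    for block in re.split(r"\n\s*\n", markdown.strip()):
        block = block.strip()
        if not block:
            continue
        match = re.match(r"#{1,6}\s+(.*)", block)
        if match:  # a heading opens a new section and resets the paragraph counter
            section_id += 1
            para_idx = 0
            heading = match.group(1).strip()
            continue
        paragraphs.append(Paragraph(section_id, para_idx, heading, block))
        para_idx += 1
    return paragraphs


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(index: List[Paragraph], query: str, top_k: int = 3) -> List[Paragraph]:
    """Hypothetical Retrieve tool: rank paragraphs by word overlap with the query.

    The returned paragraphs carry their coordinates, so a caller can follow up
    with read_section on the same section.
    """
    query_tokens = _tokens(query)
    scored = [(len(query_tokens & _tokens(p.text)), p) for p in index]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep document order
    return [p for _, p in scored[:top_k]]


def read_section(index: List[Paragraph], section_id: int,
                 start: int = 0, end: Optional[int] = None) -> List[Paragraph]:
    """Hypothetical ReadSection tool: paragraphs start..end (inclusive) of one
    section, in their original order."""
    return [
        p for p in index
        if p.section_id == section_id
        and p.para_idx >= start
        and (end is None or p.para_idx <= end)
    ]


DOC = """# Introduction

Agentic search treats documents as flat chunks.

Structure is often ignored.

# Method

Each paragraph gets a coordinate key.

The agent can read a section in order.

The range is chosen by the agent.
"""

if __name__ == "__main__":
    index = index_markdown(DOC)
    hit = retrieve(index, "coordinate key", top_k=1)[0]
    print("located:", (hit.section_id, hit.para_idx), hit.heading)
    for p in read_section(index, hit.section_id, start=hit.para_idx, end=hit.para_idx + 1):
        print("read:", (p.section_id, p.para_idx), p.text)
```

In this toy setting, a "locate then read" step amounts to calling `retrieve` once to obtain a coordinate and then `read_section` on the surrounding range, instead of issuing further independent chunk queries.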
