GrepSeek: Training Search Agents for Direct Corpus Interaction

Large Language Model (LLM) search agents have shown strong promise for knowledge-intensive language tasks through multiple rounds of reasoning and information retrieval. Most existing systems access information through a retriever that takes a keyword or natural language query and returns a ranked list of documents using an index of pre-computed document representations. In this work, we explore a complementary perspective in which the search agent treats the corpus itself as the search environment and finds evidence by issuing executable shell commands. We introduce GrepSeek, a compact search agent trained for direct corpus interaction (DCI): it learns to find, filter, and compose evidence from large text corpora. To address the instability of learning this search behavior directly with reinforcement learning on large corpora, we propose a two-stage training pipeline. First, we construct a cold-start dataset using an answer-aware Tutor and an answer-blind Planner to generate verified, causally grounded search trajectories. Second, we refine the initialized policy with Group Relative Policy Optimization (GRPO), allowing the agent to improve its task-oriented search behavior through direct interaction with the corpus. To make DCI practical at scale, we further use a semantics-preserving sharded-parallel execution engine that accelerates shell-based retrieval by up to $7.6\times$ while preserving byte-exact equivalence with sequential execution of the shell command. Experiments across seven open-domain question answering benchmarks show that GrepSeek achieves the strongest overall token-level $F_1$ and Exact Match. Our analysis also shows that purely lexical interaction is limited on queries with substantial surface-form variation. Overall, the results suggest that DCI is a practical and competitive method for search agents that can complement existing retrieval paradigms in real-world settings.
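
To make the "byte-exact equivalence" property concrete, the following is a minimal sketch of sharded execution, not the engine described in this work. It assumes a line-local filter (a simplified, pure-Python stand-in for a grep-style command) and shards that are cut only at newline boundaries. The names `split_on_line_boundaries`, `grep_lines`, and `run_sharded` are hypothetical and chosen for illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_on_line_boundaries(data: bytes, n_shards: int) -> List[bytes]:
    """Cut data into roughly n_shards contiguous pieces without splitting any line."""
    shards, start = [], 0
    target = max(1, len(data) // n_shards)
    while start < len(data):
        end = min(len(data), start + target)
        # Move the cut to just past the next newline so every line stays whole.
        nl = data.find(b"\n", end - 1)
        end = len(data) if nl == -1 else nl + 1
        shards.append(data[start:end])
        start = end
    return shards


def grep_lines(data: bytes, pattern: bytes) -> bytes:
    """Line-local filter (assumed stand-in for a grep-style command): keep matching lines."""
    rx = re.compile(pattern)
    return b"".join(
        line for line in data.splitlines(keepends=True) if rx.search(line)
    )


def run_sharded(data: bytes, pattern: bytes, n_shards: int = 4) -> bytes:
    """Apply the filter to each shard in parallel and merge results in shard order."""
    shards = split_on_line_boundaries(data, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        # map() returns results in input order, so concatenation keeps corpus order.
        return b"".join(pool.map(lambda s: grep_lines(s, pattern), shards))


# Toy corpus: the sharded result should equal the sequential result byte for byte.
corpus = b"".join(
    b"doc %d: %s\n" % (i, b"grep rocks" if i % 3 == 0 else b"nothing here")
    for i in range(1000)
)
assert run_sharded(corpus, rb"grep") == grep_lines(corpus, rb"grep")
```

The equivalence here relies on the filter deciding each line independently and on merging shard outputs in their original order. Commands whose output depends on global state (for example sorting, counting, or taking the first few matches) would need a different merge step. The thread pool in this sketch illustrates the structure only and is not meant to reproduce the reported speedup.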
