Deep search has become a crucial capability for frontier multimodal agents, enabling models to answer complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely because open high-quality training data, transparent trajectory synthesis pipelines, and detailed training recipes are all missing. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we build a dedicated pipeline that constructs high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. Using this pipeline, we curate two training datasets: SearchVL-SFT-36k for supervised fine-tuning (SFT) and SearchVL-RL-8k for reinforcement learning (RL). Second, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL improves by more than 10 points on average across seven benchmarks and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.
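The abstract describes the fatal-aware GRPO step only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each rollout in a group has a scalar reward, that a fatal tool failure is recorded as the index of the first affected token (`-1` if none), and that "one-sided clamping" means flooring the advantage of failed rollouts at zero so their pre-failure tokens are never penalized. The function name `fatal_aware_advantages` and all of its parameters are hypothetical.

```python
import numpy as np


def fatal_aware_advantages(rewards, lengths, fatal_index, eps=1e-6):
    """Per-token advantages and loss mask for one GRPO group (illustrative only).

    rewards:     (G,) scalar reward per rollout
    lengths:     (G,) number of generated tokens per rollout
    fatal_index: (G,) index of the first post-failure token, or -1 if no failure
    Returns (adv, mask), both of shape (G, T) with T = max(lengths).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    lengths = np.asarray(lengths)
    fatal_index = np.asarray(fatal_index)
    max_len = int(lengths.max())

    # Standard GRPO group-relative advantage: normalize rewards within the group.
    base = (rewards - rewards.mean()) / (rewards.std() + eps)

    # Token positions, broadcast against per-rollout lengths and cutoffs.
    pos = np.arange(max_len)[None, :]
    failed = fatal_index >= 0
    cutoff = np.where(failed, fatal_index, lengths)

    # Mask out padding and every token at or after the fatal failure.
    mask = (pos < lengths[:, None]) & (pos < cutoff[:, None])

    # Assumed one-sided clamp: failed rollouts keep only non-negative advantage,
    # so reasoning produced before the failure is not pushed down.
    per_rollout = np.where(failed, np.maximum(base, 0.0), base)

    adv = per_rollout[:, None] * mask
    return adv, mask.astype(np.float64)
```

Under these assumptions, a failed rollout with a below-average reward contributes zero gradient, while one that still scored above the group mean reinforces only its pre-failure tokens. Whether the clamp in OpenSearch-VL applies per rollout or per token, and in which direction, is not stated in the abstract.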