Despite the strong performance of reinforcement-learning-trained information-seeking agents, learning in open-ended web environments remains severely constrained by low signal-to-noise feedback. Text-based parsers often discard layout semantics and introduce unstructured noise, while long-horizon training typically relies on sparse outcome rewards that obscure which retrieval actions actually matter. We propose a visual-native search framework that represents webpages as visual snapshots, allowing agents to leverage layout cues to quickly localize salient evidence and suppress distractors. To learn effectively from these high-dimensional observations, we introduce Information-Aware Credit Assignment (ICA), a post-hoc method that estimates each retrieved snapshot's contribution to the final outcome via posterior analysis and propagates dense learning signals back to key search turns. Integrated with a GRPO-based training pipeline, our approach consistently outperforms text-based baselines on diverse information-seeking benchmarks, providing evidence that visual snapshot grounding with information-level credit assignment alleviates the credit-assignment bottleneck in open-ended web environments. The code and datasets will be released at https://github.com/pc-inno/ICA_MM_deepsearch.git.
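The core idea of combining posterior contribution estimates with group-relative advantages can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the assumption that per-turn contribution scores are already available (e.g. from a posterior analysis of which snapshots the final answer depends on) are hypothetical.

```python
import numpy as np

def ica_turn_rewards(outcome_reward, contributions):
    """Redistribute a sparse trajectory-level outcome reward across
    search turns in proportion to each turn's estimated information
    contribution (hypothetical posterior scores in [0, 1])."""
    c = np.asarray(contributions, dtype=float)
    if c.sum() == 0.0:
        # No turn is credited: fall back to uniform distribution.
        return np.full(len(c), outcome_reward / len(c))
    return outcome_reward * c / c.sum()

def grpo_advantages(group_rewards):
    """GRPO-style advantage: normalize each rollout's reward by the
    mean and std of its sampled group (no learned critic)."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```

In this sketch, a successful rollout's reward is concentrated on the turns whose retrieved snapshots the posterior analysis marks as decisive, so gradient updates favor those retrieval actions instead of spreading credit uniformly over the whole horizon.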