Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching, which limits access to the richer information available through real browsing. Full browser interaction could unlock deeper capabilities, but its fine-grained control and the verbose page content it returns introduce substantial complexity for ReAct-style function-calling agents. To bridge this gap, we propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal yet complete browser-action framework that decouples interaction control from page exploration through a nested structure. This design simplifies agentic reasoning while enabling effective deep-web information acquisition. Empirical results on challenging deep IS benchmarks demonstrate that NestBrowse offers clear benefits in practice, and further in-depth analyses underscore its efficiency and flexibility.
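The abstract does not specify the action set or how the inner exploration is carried out, so the following is only a minimal sketch of what decoupling interaction control from page exploration could look like. It assumes an outer loop that issues a single `visit` action with a goal, and an inner loop that scans the page block by block and hands back only goal-relevant text. Every name here (`visit`, `explore_page`, `keyword_relevance`, `FAKE_PAGES`) is hypothetical and is not NestBrowse's API; the keyword match merely stands in for whatever model judgment a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """Compact result handed back to the outer (interaction-control) loop."""
    url: str
    evidence: List[str]


def keyword_relevance(block: str, goal: str) -> bool:
    """Stand-in for a model judgment: true if any goal word occurs in the block."""
    block_words = set(block.lower().split())
    return any(word in block_words for word in goal.lower().split())


def explore_page(url: str, page_text: str, goal: str,
                 is_relevant: Callable[[str, str], bool]) -> Observation:
    """Inner loop: walk the page one block (line) at a time, keep relevant text."""
    blocks = [line.strip() for line in page_text.splitlines() if line.strip()]
    evidence = [block for block in blocks if is_relevant(block, goal)]
    return Observation(url=url, evidence=evidence)


def visit(url: str, goal: str, fetch: Callable[[str], str]) -> Observation:
    """Outer-loop action: one tool call that hides the inner exploration."""
    return explore_page(url, fetch(url), goal, keyword_relevance)


# Hypothetical in-memory "web" so the sketch runs without a real browser.
FAKE_PAGES = {
    "https://example.com/a": (
        "Navigation home about contact login signup\n"
        "The launch date was moved to March after review\n"
        "Cookie banner accept decline preferences privacy policy\n"
    )
}

obs = visit("https://example.com/a", "launch date", FAKE_PAGES.__getitem__)
print(obs.evidence)
```

In this sketch the outer agent never sees the navigation or cookie-banner text; it receives only the evidence list, which is the kind of context reduction the nested structure is meant to provide.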