Nested Browser-Use Learning for Agentic Information Seeking - Zhuanzhi (专知) Papers

Information Retrieval · Interaction · Agents · Tools · API

December 29, 2025

Nested Browser-Use Learning for Agentic Information Seeking

Baixuan Li, Jialong Wu, Wenbiao Yin, Kuan Li, Zhongwang Zhang, Huifeng Yin, Zhengwei Tao, Liwen Zhang, Pengjun Xie, Jingren Zhou, Yong Jiang

Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching, limiting access to the richer information available through real browsing. While full browser interaction could unlock deeper capabilities, its fine-grained control and verbose page content returns introduce substantial complexity for ReAct-style function-calling agents. To bridge this gap, we propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure. This design simplifies agentic reasoning while enabling effective deep-web information acquisition. Empirical results on challenging deep IS benchmarks demonstrate that NestBrowse offers clear benefits in practice. Further in-depth analyses underscore its efficiency and flexibility.
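The abstract does not give implementation details, so the following is only a toy Python sketch of what "decoupling interaction control from page exploration through a nested structure" might look like. Everything in it is an assumption made for illustration: the `FAKE_WEB` dictionary, the `NestedBrowser` class, the `visit` and `click` action names, and the keyword filter in `explore_page`, which stands in for whatever relevance judgment a real agent would apply. It is not the NestBrowse implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical in-memory "web": URL -> (page text, {link label: target URL}).
FAKE_WEB = {
    "https://example.org/": (
        "Welcome to the archive. Navigation and cookie notices. " * 20
        + "See the reports section for annual figures.",
        {"reports": "https://example.org/reports"},
    ),
    "https://example.org/reports": (
        "Legal disclaimer text. " * 30
        + "The 2023 annual report lists 42 field stations.",
        {},
    ),
}


def explore_page(text: str, goal_terms: List[str]) -> str:
    """Inner loop: walk the page sentence by sentence, keep goal-relevant ones.

    Keyword matching is a placeholder (an assumption of this sketch) for a
    model-driven relevance check.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = [
        s for s in sentences
        if any(term.lower() in s.lower() for term in goal_terms)
    ]
    return ". ".join(kept) + "." if kept else "(nothing relevant found)"


@dataclass
class NestedBrowser:
    """Outer loop interface: a small set of actions with compact returns."""
    goal_terms: List[str]
    current_url: Optional[str] = None
    links: Dict[str, str] = field(default_factory=dict)

    def _load(self, url: str) -> dict:
        text, links = FAKE_WEB[url]
        self.current_url, self.links = url, links
        # The verbose page text never reaches the caller; only the digest does.
        return {
            "url": url,
            "digest": explore_page(text, self.goal_terms),
            "links": sorted(links),
        }

    def visit(self, url: str) -> dict:
        return self._load(url)

    def click(self, label: str) -> dict:
        return self._load(self.links[label])


browser = NestedBrowser(goal_terms=["annual", "stations"])
first = browser.visit("https://example.org/")
second = browser.click(first["links"][0])
print(second["digest"])  # prints "The 2023 annual report lists 42 field stations."
```

In this sketch the outer caller only ever sees a short digest and a list of link labels, while the full page text stays inside `_load`. That is one plausible reading of how a nested design could keep a ReAct-style context small.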

