Language agents based on large language models (LLMs) have demonstrated great promise in automating web-based tasks. Recent work has shown that incorporating advanced planning algorithms, e.g., tree search, is advantageous over reactive planning for web agents. However, unlike simulated sandbox environments, real-world environments such as the web are rife with irreversible actions. This undermines the feasibility of backtracking, a cornerstone of (tree) search. Overly relying on test-time search also hurts efficiency. We advocate model-based planning for web agents, which employs a world model to simulate and deliberate over the outcome of each candidate action before committing to one. We systematically explore this paradigm by (1) proposing a model-based planning framework, WebDreamer, which employs LLMs to serve as both world models and value functions; and (2) training specialized LLMs as world models with a scalable data synthesis pipeline. Empirical results demonstrate that WebDreamer achieves substantial performance improvements over reactive baselines. It is competitive with tree search in sandbox environments (VisualWebArena) while being 4-5 times more efficient, and it also works effectively on real-world websites (Online-Mind2Web and Mind2Web-Live). Furthermore, our trained world model, Dreamer-7B, performs comparably to GPT-4o, highlighting the potential of specialized world models for efficient and effective planning in complex web environments.
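The planning paradigm described above can be sketched as a simple simulate-score-commit loop. This is a minimal, hypothetical illustration, not the paper's implementation: `world_model` and `value_function` stand in for LLM calls, and the toy stand-ins below exist only to make the loop runnable.

```python
# Minimal sketch of model-based planning for a web agent: a world model
# predicts the outcome of each candidate action, a value function scores the
# predicted states, and the agent commits to the highest-scoring action.
# No real action is executed during deliberation, so irreversible actions
# are never triggered while the agent is "dreaming".
from typing import Callable, List, Tuple

def plan_one_step(
    state: str,
    candidate_actions: List[str],
    world_model: Callable[[str, str], str],   # (state, action) -> predicted next state
    value_function: Callable[[str], float],   # predicted state -> task-progress score
) -> Tuple[str, str, float]:
    """Simulate every candidate action in imagination, then commit to the best."""
    best = None
    for action in candidate_actions:
        predicted = world_model(state, action)  # imagined outcome only
        score = value_function(predicted)
        if best is None or score > best[2]:
            best = (action, predicted, score)
    return best

# Toy stand-ins for the LLM world model and value function (hypothetical).
def toy_world_model(state: str, action: str) -> str:
    return f"{state} -> after {action}"

def toy_value(predicted_state: str) -> float:
    return 1.0 if "checkout" in predicted_state else 0.1

action, predicted, score = plan_one_step(
    "cart page",
    ["click 'continue shopping'", "click 'checkout'"],
    toy_world_model,
    toy_value,
)
print(action)  # the action whose imagined outcome scores highest
```

In practice, both calls would be served by (possibly the same) LLM prompted to describe the next page state and to judge progress toward the user's goal; only the single committed action is ever executed on the live website.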