Autonomous web agents powered by large language models (LLMs) show strong potential for performing goal-oriented tasks such as information retrieval, report generation, and online transactions. These agents mark a key step toward practical embodied reasoning in open web environments. However, existing approaches remain limited in reasoning depth and efficiency: vanilla linear methods fail at multi-step reasoning and lack effective backtracking, while other search strategies are coarse-grained and computationally costly. We introduce Branch-and-Browse, a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. It (i) employs explicit subtask management with tree-structured exploration for controllable multi-branch reasoning, (ii) bootstraps exploration through efficient web state replay with background reasoning, and (iii) leverages a page action memory to share explored actions within and across sessions. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8% and reduces execution time by up to 40.4% relative to state-of-the-art methods. These results demonstrate that Branch-and-Browse is a reliable and efficient framework for LLM-based web agents.