Large Language Models (LLMs) have empowered autonomous agents to handle complex web navigation tasks. While recent studies integrate tree search to enhance long-horizon reasoning, applying these algorithms to web navigation faces two critical challenges: sparse valid paths, which lead to inefficient exploration, and noisy context, which dilutes accurate state perception. To address these challenges, we introduce Plan-MCTS, a framework that reformulates web navigation by shifting exploration to a semantic Plan Space. By decoupling strategic planning from execution grounding, it transforms the sparse action space into a Dense Plan Tree for efficient exploration and distills noisy contexts into an Abstracted Semantic History for precise state awareness. To ensure efficiency and robustness, Plan-MCTS incorporates a Dual-Gating Reward that strictly validates both physical executability and strategic alignment, and Structural Refinement for on-policy repair of failed subplans. Extensive experiments on WebArena demonstrate that Plan-MCTS achieves state-of-the-art performance, surpassing current approaches in both task effectiveness and search efficiency.
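To make the search components named in the abstract more concrete, the following is a minimal sketch of how a gated reward could plug into a standard UCT-style tree search over subplans. The abstract does not specify the reward formula, the selection rule, or any data structures, so everything here is an assumption for illustration: the `PlanNode` class, the `dual_gated_reward` function, the `min_alignment` threshold, and the exploration constant `c` are all hypothetical names and values, and the selection rule is ordinary UCT rather than a confirmed detail of Plan-MCTS.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """One node of a plan tree holding a natural-language subplan (hypothetical)."""
    subplan: str
    visits: int = 0
    total_reward: float = 0.0
    children: list["PlanNode"] = field(default_factory=list)


def dual_gated_reward(executable: bool, alignment: float,
                      min_alignment: float = 0.5) -> float:
    """Return a reward only if both gates pass (assumed form, not from the paper).

    Gate 1: the subplan was physically executable in the environment.
    Gate 2: its strategic alignment score reaches an assumed threshold.
    """
    if not executable:
        return 0.0
    if alignment < min_alignment:
        return 0.0
    return alignment


def uct_select(parent: PlanNode, c: float = 1.4) -> PlanNode:
    """Pick the child with the highest standard UCT score."""
    def score(child: PlanNode) -> float:
        if child.visits == 0:
            return math.inf  # always try unvisited subplans first
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(max(parent.visits, 1)) / child.visits)
        return exploit + explore
    return max(parent.children, key=score)


def backup(path: list[PlanNode], reward: float) -> None:
    """Propagate a reward along the path from the root to the evaluated node."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```

In this sketch, a subplan that cannot be executed or that scores below the alignment threshold contributes zero reward, so the search drifts away from it after backup; how Plan-MCTS actually combines the two gates may differ.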