Despite advances in multimodal large language models, autonomous web agents still struggle to execute long-horizon tasks reliably on complex, dynamic web interfaces. Existing agents often suffer from inaccurate element grounding, a lack of site-specific procedural knowledge, and unstable long-term task tracking and memory, particularly when operating over intricate Document Object Model (DOM) structures. To address these limitations, we introduce Avenir-Web, a web agent that combines three components: a Mixture of Grounding Experts for element grounding, Experience-Imitation Planning for incorporating procedural priors, and a task-tracking checklist paired with adaptive memory. Together, these components enable robust interaction across diverse user interface paradigms. We evaluate Avenir-Web in real-world deployment on Online-Mind2Web, a rigorous benchmark of live, user-centered web tasks. Avenir-Web significantly surpasses prior open-source agents and attains performance parity with top-tier proprietary models, establishing a new open-source state of the art for reliable web agents on live websites.