GUI agents that interact with graphical interfaces on behalf of users are a promising direction for practical AI assistants. However, training such agents is hindered by the scarcity of suitable environments. We present InfiniteWeb, a system that automatically generates functional web environments at scale for GUI agent training. While LLMs perform well at generating a single webpage, building a realistic, functional website with many interconnected pages remains challenging. We address these challenges through a unified specification, task-centric test-driven development, and the combination of a website seed with a reference design image to ensure diversity. Our system also generates verifiable task evaluators that enable dense reward signals for reinforcement learning. Experiments show that InfiniteWeb surpasses commercial coding agents at realistic website construction, and that GUI agents trained on our generated environments achieve significant performance improvements on OSWorld and Online-Mind2Web, demonstrating the effectiveness of the proposed system.