GUI agents, which interact with graphical interfaces on behalf of users, are a promising direction for practical AI assistants. However, training such agents is hindered by the scarcity of suitable environments. We present InfiniteWeb, a system that automatically generates functional web environments at scale for GUI agent training. While LLMs perform well at generating a single webpage, building a realistic, functional website with many interconnected pages remains challenging. We address these challenges through a unified specification, task-centric test-driven development, and the combination of website seeds with reference design images to ensure diversity. Our system also generates verifiable task evaluators that provide dense reward signals for reinforcement learning. Experiments show that InfiniteWeb surpasses commercial coding agents at realistic website construction, and GUI agents trained in our generated environments achieve significant performance improvements on OSWorld and Online-Mind2Web, demonstrating the effectiveness of the proposed system.
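The abstract does not specify how the task evaluators are structured or how their dense rewards are computed. As a minimal sketch, assuming each generated task comes with a list of verifiable checkpoints evaluated against the website's state, a dense reward can be taken as the fraction of checkpoints satisfied, while task success requires all of them. The `Checkpoint` class, the `dense_reward` and `task_success` functions, and the dict-based state layout are hypothetical illustrations, not InfiniteWeb's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical: the website's state exposed as a plain dict
# (current page, cart contents, form fields, ...). Not InfiniteWeb's real format.
State = Dict[str, Any]


@dataclass
class Checkpoint:
    """One verifiable sub-goal of a task (hypothetical structure)."""
    description: str
    check: Callable[[State], bool]


def dense_reward(checkpoints: List[Checkpoint], state: State) -> float:
    """Assumed reward scheme: fraction of checkpoints satisfied, in [0, 1]."""
    if not checkpoints:
        return 0.0
    passed = sum(1 for cp in checkpoints if cp.check(state))
    return passed / len(checkpoints)


def task_success(checkpoints: List[Checkpoint], state: State) -> bool:
    """Sparse success signal: every checkpoint must hold."""
    return all(cp.check(state) for cp in checkpoints)


# Illustrative task: "Add a medium blue T-shirt to the cart and go to checkout."
checkpoints = [
    Checkpoint("item in cart",
               lambda s: "tshirt-blue" in s.get("cart", {})),
    Checkpoint("size is M",
               lambda s: s.get("cart", {}).get("tshirt-blue", {}).get("size") == "M"),
    Checkpoint("on checkout page",
               lambda s: s.get("page") == "/checkout"),
]

# Agent added the right item but has not reached the checkout page yet.
state = {"page": "/cart", "cart": {"tshirt-blue": {"size": "M", "qty": 1}}}

if __name__ == "__main__":
    print(round(dense_reward(checkpoints, state), 3))  # 0.667
    print(task_success(checkpoints, state))            # False
```

The partial credit (two of three checkpoints) is what makes such a signal dense: an agent that is one step away from completion receives more reward than one that has done nothing, whereas a binary success check would score both as zero.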