AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines

Yifan Wu, Yiran Peng, Yiyu Chen, Jianhao Ruan, Zijie Zhuang, Cheng Yang, Jiayi Zhang, Man Chen, Yenchi Tseng, Zhaoyang Yu, Liang Chen, Yuyao Zhai, Bang Liu, Chenglin Wu, Yuyu Luo

The performance of autonomous Web GUI agents depends heavily on the quality and quantity of their training data. However, a fundamental bottleneck persists: collecting interaction trajectories from real-world websites is expensive and difficult to verify. Because the underlying state transitions are hidden, evaluating step-level correctness requires external verifiers that are inconsistent and costly. To address this, we propose AutoWebWorld, a novel framework that synthesizes controllable and verifiable web environments by modeling them as Finite State Machines (FSMs) and using coding agents to translate the FSMs into interactive websites. Unlike real websites, where state transitions are implicit, AutoWebWorld explicitly defines all states, actions, and transition rules. This enables programmatic verification: action correctness is checked against predefined rules, and task success is confirmed by reaching a goal state in the FSM graph. AutoWebWorld thus supports a fully automated search-and-verify pipeline, generating over 11,663 verified trajectories from 29 diverse web environments at only $0.04 per trajectory. Training on this synthetic data significantly boosts real-world performance: our 7B Web GUI agent outperforms all baselines within 15 steps on WebVoyager. Furthermore, we observe a clear scaling law: as the volume of synthetic data increases, performance on WebVoyager and Online-Mind2Web consistently improves.
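To make the search-and-verify idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy FSM whose states, actions, and goal set are invented for illustration; `TRANSITIONS`, `GOAL_STATES`, `verify_trajectory`, and `search_trajectories` are hypothetical names. Step-level correctness is checked by looking up each action in the transition table, and task success is checked by testing whether the final state is a goal state.

```python
from collections import deque

# Hypothetical toy FSM; these states and actions are illustrative only
# and do not come from the paper.
TRANSITIONS = {
    ("home", "click_search"): "search",
    ("search", "type_query"): "results",
    ("results", "click_item"): "detail",
    ("results", "click_back"): "search",
    ("detail", "click_add_to_cart"): "cart",
}
GOAL_STATES = {"cart"}


def verify_trajectory(start, actions):
    """Replay actions from start; return (all_steps_valid, reached_goal)."""
    state = start
    for action in actions:
        next_state = TRANSITIONS.get((state, action))
        if next_state is None:
            # The action is not defined for this state, so the step is invalid.
            return False, False
        state = next_state
    return True, state in GOAL_STATES


def search_trajectories(start, max_steps):
    """Breadth-first search for action sequences that end in a goal state."""
    found = []
    queue = deque([(start, [])])
    while queue:
        state, actions = queue.popleft()
        if state in GOAL_STATES:
            found.append(actions)
            continue
        if len(actions) == max_steps:
            # Step budget exhausted without reaching a goal state.
            continue
        for (src, action), dst in TRANSITIONS.items():
            if src == state:
                queue.append((dst, actions + [action]))
    return found


if __name__ == "__main__":
    for trajectory in search_trajectories("home", max_steps=6):
        print(trajectory, verify_trajectory("home", trajectory))
```

Because every transition is explicit, any trajectory the search returns can be replayed and checked without an external judge, which is the property the abstract attributes to FSM-defined environments.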
