Post-training GUI agents in interactive environments is critical for developing generalization and long-horizon planning capabilities. However, training on real-world applications is hindered by high latency, poor reproducibility, and unverifiable rewards that rely on noisy visual proxies. To address these limitations, we present GUI-GENESIS, the first framework to automatically synthesize efficient GUI training environments with verifiable rewards. GUI-GENESIS reconstructs real-world applications into lightweight web environments using multimodal code models and equips them with code-native rewards: executable assertions that provide deterministic reward signals and eliminate visual-estimation noise. Extensive experiments show that GUI-GENESIS reduces environment latency by 10× and cuts costs by over $28,000 per epoch compared to training on real applications. Notably, agents trained with GUI-GENESIS outperform the base model by 14.54% and even real-world RL baselines by 3.27% on held-out real-world tasks. Finally, we observe that models can synthesize environments they cannot yet solve, highlighting a pathway toward self-improving agents.
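The code-native rewards described above can be pictured as executable assertions over the synthesized environment's internal state, replacing visual success estimation with a deterministic check. The following minimal Python sketch illustrates the idea; the function name, state schema, and task are hypothetical, not the paper's actual implementation.

```python
# Hypothetical sketch of a code-native reward: the synthesized web
# environment exposes its state directly, and an executable assertion
# verifies task completion deterministically (no screenshot-based proxy).

def cart_checkout_reward(env_state: dict) -> float:
    """Return 1.0 iff the cart is non-empty and checkout was confirmed;
    0.0 otherwise. Deterministic: same state always yields same reward."""
    try:
        assert env_state["cart"]["items"], "cart must not be empty"
        assert env_state["order"]["status"] == "confirmed"
        assert env_state["order"]["total"] > 0
    except (AssertionError, KeyError):
        return 0.0
    return 1.0

# State at the end of a successful rollout (illustrative values)
success_state = {
    "cart": {"items": [{"sku": "A1", "qty": 1}]},
    "order": {"status": "confirmed", "total": 19.99},
}
# State where the agent never completed checkout
failure_state = {"cart": {"items": []}, "order": {"status": "pending", "total": 0}}

print(cart_checkout_reward(success_state))  # 1.0
print(cart_checkout_reward(failure_state))  # 0.0
```

Because the check runs against environment state rather than rendered pixels, the reward is binary and reproducible, which is what makes it usable as a verifiable RL signal.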