Training computer use agents requires full-featured OS sandboxes with GUI environments, which consume substantial hardware resources as the number of sandboxes scales. Stochastic errors arising from diverse software execution within these sandboxes further demand robust infrastructure design and reliable error recovery. We present OSGym, a scalable OS environment infrastructure for computer use agents, built around four key optimizations: (1) decentralized OS state management, which isolates failures to individual replicas and significantly improves overall system reliability; (2) hardware-aware OS replica orchestration, which addresses CPU-bound scaling bottlenecks and substantially reduces compute overhead; (3) KVM virtualization with copy-on-write disk management, which shares a common bootable disk across VM instances and provisions only instance-specific modifications, reducing physical disk consumption by 88% and speeding up disk provisioning by a factor of 37; and (4) a robust container pool with multi-layer fault recovery. Together, these optimizations yield strong scalability and resource efficiency: OSGym manages over a thousand OS replicas under constrained resources, generates 1420 multi-turn trajectories per minute in parallel, and reduces per-replica cost to 0.2-0.3 USD per day, a 90% reduction relative to a standard deployment. Our experiments validate OSGym across end-to-end pipelines for data collection and training of computer use agents. We believe OSGym establishes a new foundation for scalable, general-purpose computer use agent research.
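The disk and cost figures in the abstract follow from simple arithmetic, which the sketch below makes explicit. It is an illustrative model only, not OSGym's implementation: the function names (`plan_disk`, `implied_baseline_cost`) and all sizes in the usage example (replica count, base image size, per-instance delta) are hypothetical and are not taken from the paper. The model assumes that full-copy provisioning stores one base image per replica, while copy-on-write stores one shared base image plus a small per-replica delta.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskPlan:
    """Physical disk needed for a fleet of VM replicas, in GB."""
    full_copy_gb: float      # every replica holds its own copy of the base image
    copy_on_write_gb: float  # one shared base image plus per-replica deltas

    @property
    def reduction(self) -> float:
        """Fraction of disk saved by copy-on-write relative to full copies."""
        return 1.0 - self.copy_on_write_gb / self.full_copy_gb


def plan_disk(replicas: int, base_image_gb: float, delta_gb: float) -> DiskPlan:
    """Compare full-copy and copy-on-write disk usage (hypothetical helper).

    replicas:      number of VM instances
    base_image_gb: size of the shared bootable disk
    delta_gb:      assumed average size of instance-specific modifications
    """
    if replicas <= 0 or base_image_gb <= 0 or delta_gb < 0:
        raise ValueError("replicas and base_image_gb must be positive, delta_gb non-negative")
    full = replicas * base_image_gb
    cow = base_image_gb + replicas * delta_gb
    return DiskPlan(full_copy_gb=full, copy_on_write_gb=cow)


def implied_baseline_cost(cost_usd_per_day: float, reduction: float) -> float:
    """Baseline cost implied by a reduced cost and a fractional reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return cost_usd_per_day / (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration; not measurements from the paper.
    plan = plan_disk(replicas=1000, base_image_gb=24.0, delta_gb=3.0)
    print(f"full copies:   {plan.full_copy_gb:.0f} GB")
    print(f"copy-on-write: {plan.copy_on_write_gb:.0f} GB")
    print(f"reduction:     {plan.reduction:.1%}")

    # The abstract's 0.2-0.3 USD/day at a 90% reduction implies this baseline range.
    low = implied_baseline_cost(0.2, 0.90)
    high = implied_baseline_cost(0.3, 0.90)
    print(f"implied baseline: {low:.2f}-{high:.2f} USD per replica per day")
```

Under this model the saving approaches `1 - delta_gb / base_image_gb` as the replica count grows, so the achievable reduction is governed mainly by how much each instance diverges from the shared base image.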