Training computer use agents requires full-featured OS sandboxes with GUI environments, and these sandboxes consume substantial hardware resources as their number grows. Stochastic errors arising from diverse software execution within the sandboxes further demand robust infrastructure design and reliable error recovery. We present OSGym, a scalable OS environment infrastructure for computer use agents, built around four key optimization strategies: (1) decentralized OS state management, which isolates failures to individual replicas and significantly enhances overall system reliability; (2) hardware-aware OS replica orchestration, which addresses CPU-bound scaling bottlenecks and substantially reduces compute overhead; (3) KVM virtualization with copy-on-write disk management, which shares a common bootable disk across VM instances and provisions only instance-specific modifications, reducing physical disk consumption by 88% and speeding up disk provisioning by a factor of 37; and (4) a robust container pool with multi-layer fault recovery. Together, these optimizations yield strong scalability and resource efficiency: OSGym manages over a thousand OS replicas under constrained resources, supports parallel trajectory generation at a rate of 1420 multi-turn trajectories per minute, and reduces per-replica cost to 0.2-0.3 USD per day, a 90% reduction relative to standard deployment. Our experiments validate OSGym across end-to-end pipelines for data collection and training of computer use agents. We believe OSGym establishes a new foundation for scalable, general-purpose computer use agent research.
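The copy-on-write idea in strategy (3) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not OSGym's implementation: the abstract does not describe the disk format or tooling, and the names `BaseImage`, `CowOverlay`, and `private_blocks` are hypothetical. It shows why each replica needs to store only its own modifications while all replicas read unmodified blocks from one shared base.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BaseImage:
    """Read-only base disk shared by every replica (hypothetical model)."""
    blocks: tuple  # one bytes object per block


@dataclass
class CowOverlay:
    """Per-replica overlay holding only the blocks this replica has written."""
    base: BaseImage
    delta: dict = field(default_factory=dict)  # block index -> modified bytes

    def read(self, index: int) -> bytes:
        # Prefer the replica's private copy; otherwise fall back to the shared base.
        return self.delta.get(index, self.base.blocks[index])

    def write(self, index: int, data: bytes) -> None:
        # The base is never mutated; the write lands in this overlay only.
        if not 0 <= index < len(self.base.blocks):
            raise IndexError(f"block {index} is outside the base image")
        self.delta[index] = data

    def private_blocks(self) -> int:
        # Number of blocks this replica actually had to provision.
        return len(self.delta)


# Illustrative data: an 8-block base image shared by two replicas.
base = BaseImage(blocks=tuple(b"base-%d" % i for i in range(8)))
replica_a = CowOverlay(base)
replica_b = CowOverlay(base)

# Only replica A modifies block 3; replica B still sees the shared content.
replica_a.write(3, b"a-only")
```

In this toy model, creating a replica costs nothing beyond an empty dictionary, which mirrors the intuition behind fast provisioning; the 88% and 37x figures in the abstract are measurements of the real system and are not reproduced by this sketch.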