We introduce OSGym, a scalable distributed data engine for training agents across diverse computer use tasks. OSGym efficiently scales to more than a thousand operating system (OS) replicas within an academia-affordable budget, and these replicas serve as agent runtime environments. OSGym has three advantages: 1) Scalability: Although running OS replicas is resource-intensive, OSGym can run a thousand OS replicas in parallel while maintaining operational efficiency under limited resources. This parallelization enables large-scale data generation (1420 multi-turn trajectories per minute). 2) Generality and Customizability: OSGym supports a wide variety of tasks, as long as they run on an operating system, including functional tool use, browser interactions, software engineering, and office applications. It also allows easy and flexible customization of model training algorithms. 3) Economic Viability for Academic Use: OSGym costs only 0.2 to 0.3 USD per day per OS replica on easily accessible on-demand compute providers. Our experiments demonstrate the effectiveness of OSGym for implementing comprehensive pipelines for data collection, supervised fine-tuning, and reinforcement learning for computer use agents. We believe OSGym will advance scalability and universality in future agent research.
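As a rough, back-of-the-envelope illustration of the figures quoted in the abstract, the sketch below computes the daily cost of a fleet of replicas and the number of trajectories generated per day. It is not part of OSGym: the function names are hypothetical, the fleet size of exactly 1,000 replicas is an assumption (the abstract says "more than a thousand"), and the per-day trajectory count assumes the quoted rate of 1420 trajectories per minute is sustained for a full 24 hours, which the abstract does not claim.

```python
def daily_fleet_cost(replicas: int, usd_per_replica_day: float) -> float:
    """Total daily cost in USD for a fleet of OS replicas (hypothetical helper)."""
    return replicas * usd_per_replica_day


def trajectories_per_day(trajectories_per_minute: int) -> int:
    """Trajectories per day, ASSUMING the per-minute rate is sustained for 24 hours."""
    return trajectories_per_minute * 60 * 24


# Assumption: a fleet of exactly 1,000 replicas.
REPLICAS = 1000

# Per-replica daily cost range and throughput quoted in the abstract.
cost_low = daily_fleet_cost(REPLICAS, 0.2)
cost_high = daily_fleet_cost(REPLICAS, 0.3)
per_day = trajectories_per_day(1420)

print(f"Daily fleet cost: {cost_low:.0f} to {cost_high:.0f} USD")
print(f"Trajectories per day (if the rate is sustained): {per_day}")
```

Under these assumptions, a thousand replicas would cost roughly 200 to 300 USD per day. Actual costs depend on the compute provider and on the real fleet size.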