Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL training requires generating large numbers of sandboxed rollout trajectories, and existing infrastructures often couple rollout orchestration with the training loop, making systems hard to migrate and maintain. Under the rollout-as-a-service philosophy, we present ProRL Agent, a scalable infrastructure that serves the full agentic rollout lifecycle through an API service. ProRL Agent also provides standardized and extensible sandbox environments that support diverse agentic tasks in rootless HPC settings. We validate ProRL Agent through RL training on software engineering, math, STEM, and coding tasks. ProRL Agent is open-sourced and integrated as part of NVIDIA NeMo Gym.