We present AgentJet, a distributed swarm training framework for reinforcement learning (RL) of large language model (LLM) agents. Unlike centralized frameworks that tightly couple agent rollouts with model optimization, AgentJet adopts a decoupled multi-node architecture in which swarm server nodes host trainable models and run optimization on GPU clusters, whereas swarm client nodes execute arbitrary agents on arbitrary devices. This design provides capabilities that are difficult to support in centralized frameworks: (1) heterogeneous multi-model reinforcement learning, which enables training heterogeneous multi-agent teams that use multiple LLMs as their brains; (2) multi-task cocktail training with isolated agent runtimes; (3) fault-tolerant execution that prevents external environment failures from interrupting the training process; and (4) live code iteration, which allows agents to be edited during training by replacing swarm client nodes. To support efficient RL in multi-model, multi-turn, and multi-agent settings, AgentJet introduces a context tracking module with timeline merging, which consolidates redundant context and achieves a 1.5x to 10x training speedup. Finally, AgentJet introduces an automated research system that takes a research topic as input and autonomously conducts long-horizon, multi-day RL studies on large-scale clusters. By leveraging the swarm architecture, this system reproduces key exploratory workflows of RL researchers without human intervention during execution.
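The abstract does not specify how timeline merging works, so the following is only a minimal sketch of one plausible reading: in a multi-turn rollout, each LLM call sees the full context so far, so earlier calls are token-level prefixes of later ones and can be folded into a single training sequence. The names `Timeline`, `is_prefix`, and `merge_timelines` are hypothetical and are not taken from AgentJet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Timeline:
    """One LLM call: the full token sequence plus a per-token loss mask."""
    tokens: List[int]      # context tokens followed by the generated tokens
    loss_mask: List[int]   # 1 where the trainable model generated the token


def is_prefix(short: List[int], long: List[int]) -> bool:
    """Return True if `short` is a token-level prefix of `long`."""
    return len(short) <= len(long) and long[:len(short)] == short


def merge_timelines(timelines: List[Timeline]) -> List[Timeline]:
    """Fold every timeline that is a prefix of a longer one into that one.

    Assumption (not from the paper): merging keeps the longest sequence
    and takes the element-wise OR of the loss masks over the shared prefix.
    """
    merged: List[Timeline] = []
    # Visit longest timelines first so shorter ones can be absorbed.
    for tl in sorted(timelines, key=lambda t: len(t.tokens), reverse=True):
        for kept in merged:
            if is_prefix(tl.tokens, kept.tokens):
                for i, m in enumerate(tl.loss_mask):
                    kept.loss_mask[i] |= m
                break
        else:
            # No longer timeline contains this one; keep a private copy.
            merged.append(Timeline(list(tl.tokens), list(tl.loss_mask)))
    return merged


def total_tokens(timelines: List[Timeline]) -> int:
    """Number of tokens a trainer would have to process."""
    return sum(len(t.tokens) for t in timelines)
```

In this sketch, two turns of the same agent collapse into one sequence whose mask covers both generated spans, while a timeline from a different agent stays separate. The token count the trainer processes drops accordingly, which is the kind of redundancy the abstract says timeline merging removes; the actual speedup reported in the abstract depends on details not described here.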