We introduce OpenTinker, an infrastructure for reinforcement learning (RL) of large language model (LLM) agents built around a separation of concerns across algorithm design, execution, and agent-environment interaction. Rather than relying on monolithic, end-to-end RL pipelines, OpenTinker decomposes agentic learning systems into lightweight, composable components with clearly defined abstraction boundaries. Users specify agents, environments, and interaction protocols, while inference and training are delegated to a managed execution runtime. OpenTinker introduces a centralized scheduler that manages LoRA-based and full-parameter RL, supervised fine-tuning, and inference workloads over shared resources. We further discuss design principles for extending OpenTinker to multi-agent training. Finally, we present a set of RL use cases that demonstrate the framework's effectiveness in practical agentic learning scenarios.
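To make the separation of concerns concrete, the following is a minimal sketch of what the user-specified side of such a system could look like: the user defines an agent, an environment, and an interaction protocol, while inference and training would be delegated to a managed runtime. All class and function names here (EchoEnv, StubPolicy, Agent, rollout) are hypothetical illustrations and are not the actual OpenTinker API.

```python
# Hypothetical sketch of the user-facing components described in the abstract.
# A real setup would hand the policy and rollouts to a managed execution
# runtime; here a stub policy keeps the example self-contained and runnable.

from dataclasses import dataclass


@dataclass
class EchoEnv:
    """Toy environment that rewards the agent for echoing its prompt."""
    prompt: str = "ping"

    def reset(self) -> str:
        return self.prompt

    def step(self, action: str) -> tuple[str, float, bool]:
        reward = 1.0 if action.strip() == self.prompt else 0.0
        return "", reward, True  # (observation, reward, done)


class StubPolicy:
    """Stand-in for a policy handle that a managed runtime would serve."""

    def generate(self, observation: str) -> str:
        return observation  # a real runtime would run LLM inference here


class Agent:
    """User-defined agent: maps observations to actions via the policy."""

    def __init__(self, policy) -> None:
        self.policy = policy

    def act(self, observation: str) -> str:
        return self.policy.generate(observation)


def rollout(agent: Agent, env: EchoEnv) -> float:
    """User-defined interaction protocol: one episode of the agent-env loop."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(agent.act(obs))
        total += reward
    return total


if __name__ == "__main__":
    print(rollout(Agent(StubPolicy()), EchoEnv()))  # prints 1.0
```

Under this reading, the framework's abstraction boundary sits at the policy handle and the rollout loop: everything above it (agent logic, environment dynamics, episode structure) is user code, and everything below it (inference serving, gradient updates, resource scheduling) belongs to the managed runtime.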