We present Ring-1T, the first open-source, state-of-the-art thinking model at the trillion-parameter scale. It comprises 1 trillion total parameters and activates approximately 50 billion per token. Training models at this scale introduces unprecedented challenges, including train-inference misalignment, inefficiencies in rollout processing, and bottlenecks in the RL system. To address these, we introduce three interconnected innovations: (1) IcePop stabilizes RL training via token-level discrepancy masking and clipping, resolving the instability caused by training-inference mismatches; (2) C3PO++ improves resource utilization for long rollouts under a token budget by dynamically partitioning them, yielding high time efficiency; and (3) ASystem, a high-performance RL framework designed to overcome the systemic bottlenecks that impede trillion-parameter model training. Ring-1T delivers breakthrough results across critical benchmarks: 93.4 on AIME-2025, 86.72 on HMMT-2025, 2088 on CodeForces, and 55.94 on ARC-AGI-v1. Notably, it attains a silver-medal-level result on IMO-2025, underscoring its exceptional reasoning capabilities. By releasing the complete 1T-parameter MoE model, we give the research community direct access to cutting-edge reasoning capabilities. This contribution marks a significant milestone in democratizing large-scale reasoning intelligence and establishes a new baseline for open-source model performance.
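To make the IcePop idea concrete, the following is a minimal illustrative sketch of token-level discrepancy masking and clipping, assuming per-token log-probabilities are available from both the training backend and the inference engine. The function name `icepop_mask`, the threshold `delta`, and the exact masking rule are hypothetical choices for illustration only, not the paper's implementation.

```python
def icepop_mask(train_logprobs, infer_logprobs, delta=0.5):
    """Sketch of token-level discrepancy masking and clipping.

    For each token, compute the log-prob discrepancy between the training
    backend and the inference engine. Tokens whose discrepancy exceeds
    `delta` are masked out of the RL loss (mask = 0.0); the rest keep a
    discrepancy value clipped into [-delta, delta].
    NOTE: an illustrative assumption, not Ring-1T's actual rule.
    """
    mask, clipped = [], []
    for t, i in zip(train_logprobs, infer_logprobs):
        d = t - i  # train-inference log-prob gap for this token
        mask.append(1.0 if abs(d) <= delta else 0.0)
        clipped.append(max(-delta, min(delta, d)))
    return mask, clipped
```

In this sketch, masking removes the most severely mismatched tokens from the gradient entirely, while clipping bounds the influence of the remaining discrepancies, which is one plausible way such a scheme could suppress training-inference instability.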