We introduce Kimi K2, a Mixture-of-Experts (MoE) large language model with 32 billion activated parameters and 1 trillion total parameters. We propose the MuonClip optimizer, which improves upon Muon with a novel QK-clip technique that addresses training instability while preserving Muon's high token efficiency. Using MuonClip, K2 was pre-trained on 15.5 trillion tokens with zero loss spikes. K2 then undergoes a multi-stage post-training process, highlighted by a large-scale agentic data synthesis pipeline and a joint reinforcement learning (RL) stage, in which the model improves its capabilities through interactions with real and synthetic environments. Kimi K2 achieves state-of-the-art performance among open-source non-thinking models, with particular strengths in agentic capabilities. Notably, K2 obtains 66.1 on Tau2-Bench, 76.5 on ACEBench (En), 65.8 on SWE-Bench Verified, and 47.3 on SWE-Bench Multilingual, surpassing most open- and closed-source baselines in non-thinking settings. It also exhibits strong capabilities in coding, mathematics, and reasoning tasks, scoring 53.7 on LiveCodeBench v6, 49.5 on AIME 2025, 75.1 on GPQA-Diamond, and 27.1 on OJBench, all without extended thinking. These results position Kimi K2 as one of the most capable open-source large language models to date, particularly in software engineering and agentic tasks. We release our base and post-trained model checkpoints to facilitate future research and applications of agentic intelligence.
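The abstract names QK-clip only as a stabilization technique for attention logits; the sketch below is an illustrative interpretation, not the paper's implementation. It assumes QK-clip works by capping the per-head maximum attention logit at a threshold `tau` (the threshold value and the even split of the rescaling factor between the query and key projections are our assumptions):

```python
import numpy as np

def qk_clip(W_q, W_k, X, tau=100.0):
    """Hedged sketch of a QK-clip-style rescaling step.

    After an optimizer update, check the maximum attention logit
    produced by the query/key projections on a batch of activations X.
    If it exceeds `tau`, rescale W_q and W_k so the maximum logit is
    pulled back to exactly `tau`. All names and the value of `tau`
    are illustrative assumptions, not the paper's specification.
    """
    d = W_q.shape[1]                       # per-head dimension
    Q = X @ W_q                            # (seq, d) query activations
    K = X @ W_k                            # (seq, d) key activations
    logits = (Q @ K.T) / np.sqrt(d)        # scaled attention logits
    s_max = np.abs(logits).max()
    if s_max > tau:
        gamma = tau / s_max
        # Split the correction evenly: scaling each projection by
        # sqrt(gamma) scales every logit (quadratic in the weights)
        # by gamma, so the new maximum logit equals tau.
        W_q = W_q * np.sqrt(gamma)
        W_k = W_k * np.sqrt(gamma)
    return W_q, W_k
```

Because the logit is bilinear in the two projections, scaling each by the square root of the correction factor caps the logit without changing the attention pattern's relative ordering.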