We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that the two modalities enhance each other, through a series of techniques including joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Building on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Extensive evaluations show that Kimi K2.5 achieves state-of-the-art results across domains including coding, vision, reasoning, and agentic tasks. Agent Swarm also reduces latency by up to $4.5\times$ over single-agent baselines. We release the post-trained Kimi K2.5 model checkpoint to facilitate future research and real-world applications of agentic intelligence.
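The decompose-then-execute-concurrently pattern behind Agent Swarm can be illustrated with a minimal sketch. This is not the paper's implementation: the `plan`, `solve`, and `aggregate` functions below are hypothetical stand-ins, assuming a planner that splits a task into sub-problems, worker agents that solve them in parallel, and an aggregator that merges results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of parallel sub-task orchestration; names and
# decomposition logic are illustrative, not from the Kimi K2.5 paper.

def plan(task: str) -> list[str]:
    # Stand-in planner: naively split the task into four sub-problems.
    return [f"{task}::subtask-{i}" for i in range(4)]

def solve(subtask: str) -> str:
    # Stand-in worker agent; a real system would invoke a model or tool here.
    return f"result({subtask})"

def aggregate(results: list[str]) -> str:
    # Merge the sub-problem results into a single answer.
    return "; ".join(results)

def swarm(task: str, max_workers: int = 4) -> str:
    subtasks = plan(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Sub-problems run concurrently, so wall-clock latency is bounded
        # by the slowest sub-task rather than the sum of all of them.
        results = list(pool.map(solve, subtasks))
    return aggregate(results)
```

Under this sketch, the latency reduction reported for Agent Swarm corresponds to the gap between sequential and concurrent execution of the sub-problems.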