We introduce LongCat-Flash-Thinking-2601, a 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on a wide range of agentic benchmarks, including agentic search, agentic tool use, and tool-integrated reasoning. Beyond benchmark performance, the model demonstrates strong generalization to complex tool interactions and robust behavior in noisy real-world environments. Its advanced capability stems from a unified training framework that combines domain-parallel expert training with subsequent fusion, together with an end-to-end co-design of data construction, environments, algorithms, and infrastructure spanning pre-training through post-training. In particular, the model's strong generalization in complex tool use is driven by our in-depth exploration of environment scaling and principled task construction. To optimize long-tailed, skewed generation and multi-turn agentic interactions, and to enable stable training across over 10,000 environments spanning more than 20 domains, we systematically extend our asynchronous reinforcement learning framework, DORA, for stable and efficient large-scale multi-environment training. Furthermore, recognizing that real-world tasks are inherently noisy, we conduct a systematic analysis and decomposition of real-world noise patterns, and design targeted training procedures that explicitly incorporate such imperfections into the training process, yielding improved robustness in real-world applications. To further enhance performance on complex reasoning tasks, we introduce a Heavy Thinking mode that enables effective test-time scaling by jointly expanding reasoning depth and width through intensive parallel thinking.
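To make the Heavy Thinking idea concrete, the sketch below illustrates one common way such depth-and-width test-time scaling can be realized: sample several independent reasoning paths in parallel (width), give each path an enlarged token budget (depth), and aggregate the final answers by majority vote. This is a minimal, hypothetical illustration, not the paper's implementation; `generate` stands in for any model sampling call, and the parameter names are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def heavy_thinking(generate, prompt: str, width: int = 8,
                   depth_tokens: int = 32768) -> str:
    """Illustrative test-time scaling: sample `width` reasoning paths in
    parallel, each with an enlarged `depth_tokens` budget, and return the
    majority-vote answer. `generate(prompt, seed, max_tokens)` is a
    hypothetical stand-in for an LLM sampling call that returns the
    final answer string for one reasoning path."""
    with ThreadPoolExecutor(max_workers=width) as pool:
        answers = list(pool.map(
            lambda seed: generate(prompt, seed, depth_tokens),
            range(width),
        ))
    # Aggregate over the parallel paths; ties break by first occurrence.
    return Counter(answers).most_common(1)[0][0]
```

In practice the aggregation step need not be a simple vote; reranking or verifier-based selection over the parallel paths fits the same skeleton.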