Mi:dm K 2.5 Pro

The evolving LLM landscape demands capabilities beyond simple text generation, prioritizing multi-step reasoning, long-context understanding, and agentic workflows. This shift challenges existing models in enterprise environments, especially in Korean-language and domain-specific scenarios where scaling alone is insufficient. We introduce Mi:dm K 2.5 Pro, a 32B-parameter flagship LLM designed to handle enterprise-grade complexity through reasoning-focused optimization. Our methodology builds a robust data foundation with a quality-centric curation pipeline that uses abstract syntax tree (AST) analysis for code, gap-filling synthesis for mathematics, and an LLM-based quality evaluator. Pre-training scales the model via layer-predictor-based Depth Upscaling (DuS) and a progressive training strategy that supports a 128K-token context window. Post-training introduces a specialized multi-stage pipeline, including Reasoning SFT, model merging, and asynchronous reinforcement learning (RL), to develop complex problem-solving skills. "Fusion Training" then rebalances these capabilities with conversational fluency, consistent response styling, and reliable tool use. Evaluations show that Mi:dm K 2.5 Pro achieves competitive performance against leading global and domestic (Korean) models. In addition, it sets state-of-the-art results on Korean-specific benchmarks, demonstrating deep linguistic and cultural understanding. Finally, Responsible AI evaluations validate the model's safety against attacks, supporting secure deployment with a balance between harmlessness and responsiveness.
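The abstract says the curation pipeline applies AST analysis to code but does not say which structural signals it uses. As an illustration only, the sketch below assumes Python source and two hypothetical criteria (the snippet must parse, and it must define at least one function); the names `ast_quality_signals` and `keep_sample` and the `min_functions` threshold are invented for this example and are not the paper's filter.

```python
import ast
from typing import Dict, Union

def ast_quality_signals(source: str) -> Dict[str, Union[bool, int]]:
    """Hypothetical heuristic: parse a Python snippet and extract simple
    structural signals that a curation filter might threshold on."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Unparseable code is a strong signal of a low-quality sample.
        return {"parses": False, "functions": 0, "classes": 0, "has_docstring": False}
    functions = [n for n in ast.walk(tree)
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    classes = [n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
    # ast.get_docstring returns None when a node has no docstring.
    has_docstring = any(ast.get_docstring(n) for n in functions + classes)
    return {
        "parses": True,
        "functions": len(functions),
        "classes": len(classes),
        "has_docstring": has_docstring,
    }

def keep_sample(source: str, min_functions: int = 1) -> bool:
    """Hypothetical keep/drop rule: the code must parse and define enough functions."""
    signals = ast_quality_signals(source)
    return bool(signals["parses"]) and int(signals["functions"]) >= min_functions

good = 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
bad = "def broken(:\n    pass\n"
print(keep_sample(good), keep_sample(bad))
```

A real pipeline would likely combine such structural signals with the LLM-based quality evaluator the abstract mentions rather than rely on hard-coded thresholds.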
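The abstract does not explain how its layer predictor chooses which layers to duplicate, so the sketch below shows only the plain fixed-overlap depth upscaling recipe popularized by SOLAR 10.7B, where a 32-layer base is copied, the last 8 layers are dropped from the first copy and the first 8 from the second, and the two are stacked into 48 layers. The function names and the use of `copy.deepcopy` on generic layer objects are assumptions for illustration; Mi:dm's layer-predictor-based variant presumably replaces the fixed `overlap` rule with a learned selection.

```python
import copy
from typing import List, Sequence, TypeVar

Layer = TypeVar("Layer")

def depth_upscale_indices(num_layers: int, overlap: int) -> List[int]:
    """Fixed-overlap depth upscaling (SOLAR-style, not the paper's predictor):
    return the base-layer index that initializes each layer of the deeper model."""
    if not 0 <= overlap < num_layers:
        raise ValueError("overlap must satisfy 0 <= overlap < num_layers")
    bottom = list(range(0, num_layers - overlap))  # first copy minus its last `overlap` layers
    top = list(range(overlap, num_layers))         # second copy minus its first `overlap` layers
    return bottom + top

def upscale(layers: Sequence[Layer], overlap: int) -> List[Layer]:
    """Build the upscaled stack from independent copies of the base layers."""
    return [copy.deepcopy(layers[i]) for i in depth_upscale_indices(len(layers), overlap)]

indices = depth_upscale_indices(32, 8)
print(len(indices))
```

The upscaled model has 2 × (num_layers − overlap) layers; for the SOLAR setting above that is 2 × (32 − 8) = 48, after which continued pre-training recovers quality.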
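Model merging appears in the post-training pipeline, but the abstract does not name the merging method. The sketch below assumes the simplest option, a normalized linear (weighted-average) merge of checkpoints with identical parameter names and shapes; checkpoints are modeled as hypothetical dictionaries of flat float lists rather than framework tensors, and the function name `linear_merge` is invented for this example.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]  # hypothetical: parameter name -> flattened values

def linear_merge(checkpoints: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Weighted average of checkpoints; coefficients are normalized to sum to 1."""
    if not checkpoints or len(checkpoints) != len(coeffs):
        raise ValueError("need one coefficient per checkpoint")
    total = sum(coeffs)
    if total <= 0:
        raise ValueError("coefficients must have a positive sum")
    norm = [c / total for c in coeffs]
    names = set(checkpoints[0])
    if any(set(ckpt) != names for ckpt in checkpoints):
        raise ValueError("all checkpoints must share the same parameter names")
    merged: Weights = {}
    for name in checkpoints[0]:
        tensors = [ckpt[name] for ckpt in checkpoints]
        size = len(tensors[0])
        if any(len(t) != size for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across checkpoints.
        merged[name] = [sum(w * t[i] for w, t in zip(norm, tensors)) for i in range(size)]
    return merged

a = {"w": [1.0, 2.0]}
b = {"w": [3.0, 4.0]}
print(linear_merge([a, b], [0.5, 0.5]))
```

Linear averaging is only a baseline; more selective schemes exist, and the paper may use one of them.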
