We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.