We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
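The sparse MoE design (196B total parameters, 11B active) means each token is routed to only a small subset of experts. The abstract does not give the routing details, but the standard mechanism can be sketched as a top-k softmax gate; the value of `k` and the normalization here are common MoE practice, not specifics of Step 3.5 Flash:

```python
import math

def topk_gate(logits, k=2):
    """Select the top-k experts for a token from router logits and
    softmax-normalize their weights. Hypothetical gate: k and the
    renormalization scheme are assumptions, not from the abstract.
    Only the selected experts run, so active parameters stay a small
    fraction of the total parameter count."""
    # Indices of the k largest logits (the experts this token activates).
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax over just the selected logits.
    mx = max(logits[i] for i in idx)
    exps = [math.exp(logits[i] - mx) for i in idx]
    z = sum(exps)
    return {i: e / z for i, e in zip(idx, exps)}

weights = topk_gate([1.0, 3.0, 2.0, 0.5], k=2)
```

Here `weights` maps the two chosen expert indices to mixing coefficients that sum to 1; the token's output is the weighted sum of those experts' outputs.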
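The interleaved 3:1 sliding-window/full attention can be pictured as a repeating layer schedule: three sliding-window layers for every full-attention layer, which keeps most layers' KV cache bounded while periodic full layers preserve global context. A minimal sketch, assuming the sliding-window layers precede each full layer within a period (the actual ordering is not stated in the abstract):

```python
def attention_schedule(num_layers, swa_per_full=3):
    """Assign each layer an attention type at a swa_per_full:1 ratio.
    Hypothetical ordering: every (swa_per_full + 1)-th layer uses full
    attention, the rest use sliding-window attention."""
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa"
            for i in range(num_layers)]

schedule = attention_schedule(8)
```

For an 8-layer toy stack this yields `swa, swa, swa, full` repeated twice; sliding-window layers attend only within a fixed local window, so their per-token cost and cache size are constant in sequence length, which is where the latency and cost savings in long multi-round interactions come from.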