Large-scale Mixture-of-Experts (MoE) large language models (LLMs) have recently become the frontier open-weight models, achieving capabilities comparable to those of proprietary models. However, their random expert selection mechanism introduces significant data movement overhead, which becomes the dominant bottleneck in multi-unit LLM serving systems. To understand the patterns underlying this data movement, we conduct comprehensive data-movement-centric profiling of four state-of-the-art large-scale MoE models released in 2025 (200B--1000B parameters), using over 24,000 requests spanning diverse workloads. We analyze the resulting traces systematically from both temporal and spatial perspectives and distill six key insights to guide the design of diverse serving systems. We verify these insights on both future wafer-scale GPU architectures and existing GPU systems. On wafer-scale GPUs, lightweight architectural modifications guided by our insights yield a 6.6$\times$ average speedup across the four 200B--1000B models. On existing GPU systems, our insights drive the design of a prefill-aware expert placement algorithm that achieves up to 1.25$\times$ speedup on MoE computation. Our work presents the first comprehensive data-centric analysis of large-scale MoE models, together with a concrete design study that applies the lessons learned. Our profiling traces are publicly available at \href{https://huggingface.co/datasets/core12345/MoE_expert_selection_trace}{\textcolor{blue}{https://huggingface.co/datasets/core12345/MoE\_expert\_selection\_trace}}.