Large-scale Mixture-of-Experts (MoE) Large Language Models (LLMs) have recently become the leading open-weight models, with capabilities approaching those of proprietary models. However, their dynamic, seemingly random expert selection introduces substantial data movement overhead, which becomes the dominant bottleneck in multi-unit LLM serving systems. To understand the patterns underlying this data movement, we conduct comprehensive data-movement-centric profiling of four state-of-the-art large-scale MoE models released in 2025 (200B--1000B parameters), using over 24,000 requests spanning diverse workloads. We analyze the resulting traces systematically from both temporal and spatial perspectives and distill six key insights to guide the design of diverse serving systems. We validate these insights on both future wafer-scale GPU architectures and existing GPU systems. On wafer-scale GPUs, lightweight architectural modifications guided by our insights yield a 6.6$\times$ average speedup across the four models. On existing GPU systems, our insights drive the design of a prefill-aware expert placement algorithm that achieves up to 1.25$\times$ speedup on MoE computation. Our work presents the first comprehensive data-centric analysis of large-scale MoE models, together with a concrete design study that applies the lessons learned. Our profiling traces are publicly available at \href{https://huggingface.co/datasets/core12345/MoE_expert_selection_trace}{\textcolor{blue}{https://huggingface.co/datasets/core12345/MoE\_expert\_selection\_trace}}.
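To make the link between expert selection and data movement concrete, here is a minimal sketch. It is not the authors' profiling tool or their prefill-aware placement algorithm. It assumes plain top-$k$ selection over per-token router logits and a fixed, hypothetical mapping of experts to devices. From per-token routing decisions of the kind recorded in an expert-selection trace, it computes two quantities: the load on each expert, and how many (token, expert) pairs must cross devices. The names `top_k_experts` and `dispatch_volume` are illustrative only.

```python
from collections import Counter


def top_k_experts(logits, k):
    """Return the indices of the k largest router logits (ties go to the lower index).

    Assumption: plain top-k selection on raw logits; real models may add
    softmax/normalization, bias terms, or group-limited routing.
    """
    return sorted(range(len(logits)), key=lambda e: (-logits[e], e))[:k]


def dispatch_volume(routing, placement, token_device):
    """Count per-expert load and cross-device transfers for one MoE layer.

    routing:      list of expert-index lists, one per token
    placement:    dict mapping expert index -> device id (hypothetical placement)
    token_device: list giving the device that holds each token's activation
    Returns (Counter of tokens per expert, number of remote token-expert pairs).
    """
    load = Counter()
    remote = 0
    for token, experts in enumerate(routing):
        for expert in experts:
            load[expert] += 1
            # A token whose chosen expert lives on another device must be moved.
            if placement[expert] != token_device[token]:
                remote += 1
    return load, remote


# Toy example: 4 tokens, 4 experts, top-2 routing, 2 devices.
router_logits = [
    [0.1, 0.9, 0.5, 0.2],
    [0.8, 0.7, 0.1, 0.0],
    [0.3, 0.2, 0.6, 0.4],
    [0.0, 0.5, 0.4, 0.9],
]
routing = [top_k_experts(row, k=2) for row in router_logits]
placement = {0: 0, 1: 0, 2: 1, 3: 1}  # experts 0,1 on device 0; 2,3 on device 1
token_device = [0, 0, 1, 1]
load, remote = dispatch_volume(routing, placement, token_device)
print(routing, dict(load), remote)
```

The toy inputs yield routing `[[1, 2], [0, 1], [2, 3], [3, 1]]`. Expert 1 receives three of the eight token-expert assignments, and two of those assignments require a cross-device transfer. Replaying many requests through a sketch like this, and varying `placement`, gives a simple way to reason about how expert-selection patterns translate into load imbalance and interconnect traffic.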