The rapid expansion of Transformer-based large language models has dramatically increased the need for high-performance GPUs. As a result, there is growing demand for fast, accurate, and widely generalizable GPU performance models to support next-generation hardware selection and system-level exploration. However, current data-driven methods are limited: they generalize poorly across hardware and inadequately model the complex, production-level kernels common in modern inference stacks. To address these issues, we present PipeWeave, a unified GPU modeling framework. PipeWeave first employs an analytical model to quantify a given kernel's demands on the GPU's heterogeneous instruction pipelines. These analytical features are then fed into a machine learning (ML) model that captures complex cross-pipeline interactions and resource dependencies, enabling high-fidelity performance prediction. Our evaluation across 11 GPU types from four generations of major architectures, on two widely used serving systems, demonstrates that PipeWeave delivers high fidelity and strong generalizability. It achieves an average error of only 6.1% at the kernel level and 8.5% for end-to-end inference, reducing the error of state-of-the-art methods by 6.7x and 4.4x, respectively. We also demonstrate PipeWeave's value "beyond simulation" by using its performance ceiling to diagnose implementation shortcomings and guide the optimization of a production fused MoE Triton kernel, achieving up to 1.7x speedup. Code is available at https://github.com/zksainx/pipeweave.