Accurately forecasting GPU workloads is essential for AI infrastructure, enabling efficient scheduling, resource allocation, and power management. Modern workloads are highly volatile, exhibit multiple periodicities, and are heterogeneous, which makes them challenging for traditional predictors. We propose PRISM, a primitive-based compositional forecasting framework that combines dictionary-driven temporal decomposition with adaptive spectral refinement. This dual representation extracts stable, interpretable workload signatures across diverse GPU jobs. Evaluated on large-scale production traces, PRISM achieves state-of-the-art results and significantly reduces burst-phase errors, providing a robust, architecture-aware foundation for dynamic resource management in GPU-powered AI platforms.