Advances in GPU compute throughput and memory capacity bring significant opportunities to a wide range of workloads. However, efficiently utilizing these resources remains challenging, particularly because diverse application characteristics can lead to imbalanced utilization. Multi-Instance GPU (MIG) is a promising approach to improving utilization: it partitions GPU compute and memory resources into fixed-size, isolated slices. Yet its effectiveness and limitations in supporting HPC workloads remain an open question. We present a comprehensive system-level characterization of different GPU sharing options using real-world scientific, AI, and data analytics applications, including NekRS, LAMMPS, Llama3, and Qiskit. Our analysis reveals that while GPU sharing via MIG can significantly reduce resource underutilization and enable system-level improvements in throughput and energy, interference still occurs through shared resources, for example via power throttling. Our performance-resource scaling results indicate that coarse-grained provisioning of tightly coupled compute and memory resources often does not match application needs. To address this mismatch, we propose a memory-offloading scheme that leverages the cache-coherent NVLink-C2C interconnect to bridge the gap between coarse-grained resource slices and finer-grained application demand, thereby reducing resource underutilization.