GPU hardware is vastly underutilized. Even resource-intensive AI applications have diverse resource profiles that often leave parts of GPUs idle. While colocating applications can improve utilization, current spatial sharing systems lack performance guarantees. Providing predictable performance guarantees requires a deep understanding of how applications contend for shared GPU resources such as block schedulers, compute units, L1/L2 caches, and memory bandwidth. We propose a methodology to profile resource interference of GPU kernels across these dimensions and discuss how to build GPU schedulers that provide strict performance guarantees while colocating applications to minimize cost.