GPU hardware is vastly underutilized. Even resource-intensive AI applications have diverse resource profiles that often leave parts of GPUs idle. While colocating applications can improve utilization, current spatial sharing systems lack performance guarantees. Providing predictable performance guarantees requires a deep understanding of how applications contend for shared GPU resources such as block schedulers, compute units, L1/L2 caches, and memory bandwidth. We propose a methodology to profile resource interference of GPU kernels across these dimensions and discuss how to build GPU schedulers that provide strict performance guarantees while colocating applications to minimize cost.