Existing GPU-sharing techniques, including spatial and temporal sharing, aim to improve utilization but struggle to ensure service-level objective (SLO) adherence and maximize efficiency at the same time, because closed-source GPUs lack fine-grained task scheduling. This paper presents Hummingbird, an SLO-oriented GPU scheduling system that overcomes these challenges by enabling microsecond-scale preemption on closed-source GPUs while effectively harvesting idle GPU time slices. Comprehensive evaluations across diverse GPU architectures show that Hummingbird improves the SLO attainment of high-priority tasks by 9.7x and 3.5x compared to state-of-the-art spatial-sharing and temporal-sharing approaches, respectively. When a high-priority task is collocated with low-priority tasks on Hummingbird, its SLO attainment drops by less than 1% compared to exclusive execution. Meanwhile, the throughput of the low-priority task exceeds that achieved under state-of-the-art temporal-sharing approaches by 2.4x. Hummingbird is thus highly effective at ensuring SLOs while enhancing GPU utilization.