Recent years have witnessed increasing interest in running machine learning inference on serverless computing platforms, owing to their auto-scaling and cost-effective properties. Existing serverless platforms, however, lack effective job scheduling methods for a scheduling space that GPU sharing, task batching, and inter-task relations dramatically expand. Prior solutions have dodged the issue by neglecting some important factors, leaving substantial performance potential untapped. This paper presents ESG, a new scheduling algorithm that addresses these difficulties directly. ESG treats sharable GPUs as a first-order factor in scheduling. It employs an optimality-guided adaptive method that combines A* search with a novel dual-blade pruning technique to dramatically prune the scheduling space without compromising schedule quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the scalability of the scheduler. The results show that, compared with prior work, ESG improves SLO hit rates by 61%-80% while saving 47%-187% in costs.
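The abstract names A* search with pruning but does not give ESG's cost model, heuristic, or dual-blade pruning rules. As a rough illustration only, the following sketch runs a textbook A* over a toy scheduling space: a chain of stages, each with a few hypothetical (batch size, GPU share) configurations carrying made-up latency and cost estimates. It minimizes total cost subject to an end-to-end latency SLO. The heuristic (sum of the cheapest remaining per-stage costs) never overestimates, so the first complete schedule popped is cost-optimal. The two prunes used here, an SLO-feasibility bound and a dominance check, are simple stand-ins and should not be read as ESG's dual-blade pruning. The names `Config`, `astar_schedule`, and `EXAMPLE_STAGES` are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Config:
    """A hypothetical execution option for one stage (illustrative numbers only)."""
    batch: int          # batch size
    gpu_share: float    # fraction of a GPU allocated to the stage
    latency: float      # estimated stage latency (ms)
    cost: float         # estimated cost (arbitrary units)


def astar_schedule(stages: List[List[Config]],
                   slo: float) -> Optional[Tuple[float, List[Config]]]:
    """Pick one Config per stage, minimizing total cost with total latency <= slo."""
    n = len(stages)
    # Suffix sums: cheapest cost and fastest latency achievable for stages i..n-1.
    min_cost = [0.0] * (n + 1)
    min_lat = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_cost[i] = min_cost[i + 1] + min(c.cost for c in stages[i])
        min_lat[i] = min_lat[i + 1] + min(c.latency for c in stages[i])

    # Heap entry: (f = g + h, g = cost so far, latency so far, stage, tie-breaker, plan).
    counter = 0
    heap = [(min_cost[0], 0.0, 0.0, 0, counter, [])]
    expanded = [[] for _ in range(n + 1)]  # (latency, cost) of expanded states per stage

    while heap:
        _, g, lat, i, _, plan = heapq.heappop(heap)
        if i == n:
            return g, plan  # admissible heuristic: first goal popped is optimal
        # Dominance prune: an already-expanded state here is no slower and no costlier.
        if any(l <= lat and c <= g for l, c in expanded[i]):
            continue
        expanded[i].append((lat, g))
        for cfg in stages[i]:
            new_lat = lat + cfg.latency
            # Feasibility prune: even the fastest remaining choices would miss the SLO.
            if new_lat + min_lat[i + 1] > slo:
                continue
            new_g = g + cfg.cost
            counter += 1
            heapq.heappush(heap, (new_g + min_cost[i + 1], new_g, new_lat,
                                  i + 1, counter, plan + [cfg]))
    return None  # no schedule meets the SLO


# Hypothetical two-stage pipeline with made-up latency/cost estimates.
EXAMPLE_STAGES = [
    [Config(1, 0.25, 40.0, 1.0), Config(4, 0.5, 25.0, 2.0), Config(8, 1.0, 15.0, 4.0)],
    [Config(1, 0.25, 60.0, 1.5), Config(4, 0.5, 35.0, 3.0), Config(8, 1.0, 20.0, 6.0)],
]

if __name__ == "__main__":
    best = astar_schedule(EXAMPLE_STAGES, slo=80.0)
    print(best[0])  # 4.0
```

With an 80 ms SLO the sketch picks the cheap batch-1 option for the first stage and the batch-4 option for the second; tightening the SLO forces costlier, faster configurations, and an SLO below the fastest possible end-to-end latency yields `None`.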