Existing work is only effective for a given number of GPUs and often neglects the complexity of manually determining the specific types and quantities of GPUs needed, which can be a significant burden for developers. To address this issue, we propose Frenzy, a memory-aware serverless computing method for heterogeneous GPU clusters. Frenzy allows users to submit models without worrying about the underlying hardware resources. First, Frenzy predicts the required number and type of GPUs by estimating the GPU memory usage of the LLM. Then, it employs a low-overhead, heterogeneity-aware scheduling method to optimize training efficiency. We validated Frenzy's performance by running multi-task LLM training tests on a heterogeneous GPU cluster with three different GPU types. The results show that Frenzy's memory-usage prediction accuracy exceeds 92\%, its scheduling overhead is reduced by a factor of 10, and it shortens the average job completion time by 12\% to 18\% compared with state-of-the-art methods.