Large language models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. These models are rapidly being adopted across domains for analytics on various modalities, often by finetuning pre-trained base models. Such models need multiple GPUs because of both their size and their computational load, which has driven the development of a bevy of "model parallelism" techniques and tools. Navigating such parallelism choices, however, is a new burden for end users of DL, such as data scientists and domain scientists, who may lack the necessary systems know-how. The need for model selection, which requires training many models because of hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens: resource apportioning and scheduling. In this work, we tackle these three burdens for DL users in a unified manner by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and SchedulE. We propose a new information system architecture to tackle the SPASE problem holistically, representing a key step toward enabling wider adoption of large DL models. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as a mixed-integer linear program (MILP). We find that direct use of an MILP solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques in a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39-49% lower model selection runtimes than typical current DL practice.
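To make the joint nature of the SPASE problem concrete, the following is a minimal sketch, not Saturn's implementation. It assumes a hypothetical table of profiled runtimes (`PROFILE`), invented parallelism labels and GPU counts, and a single node with a fixed number of GPUs. In place of the MILP formulation, it uses plain brute-force enumeration with a simple list scheduler, which is only feasible for toy inputs but shows how parallelism selection, GPU allocation, and scheduling interact.

```python
from itertools import permutations, product

# Hypothetical profiler output (all values invented for illustration):
# model name -> candidate (parallelism, gpus_needed, estimated_runtime_seconds)
PROFILE = {
    "model_a": [("ddp", 2, 100.0), ("fsdp", 4, 60.0)],
    "model_b": [("pipeline", 2, 80.0), ("fsdp", 4, 50.0)],
    "model_c": [("ddp", 1, 120.0), ("ddp", 2, 70.0)],
}


def makespan(plan, order, total_gpus):
    """List-schedule jobs in the given order and return the finish time.

    Each job takes the `gpus` earliest-free GPUs and starts once all of
    them are free, so no GPU ever runs two jobs at the same time.
    """
    free_at = [0.0] * total_gpus  # time at which each GPU becomes idle
    for name in order:
        _, gpus, runtime = plan[name]
        free_at.sort()
        start = free_at[gpus - 1]  # latest free time among the chosen GPUs
        for i in range(gpus):
            free_at[i] = start + runtime
    return max(free_at)


def solve_spase_brute_force(profile, total_gpus):
    """Try every (parallelism, GPU count) choice and every job order.

    This stands in for the MILP solver: it searches the same three
    decisions jointly, but by exhaustive enumeration.
    """
    names = list(profile)
    best = None
    for choice in product(*(profile[n] for n in names)):
        if any(gpus > total_gpus for _, gpus, _ in choice):
            continue  # this allocation cannot fit on the cluster
        plan = dict(zip(names, choice))
        for order in permutations(names):
            span = makespan(plan, order, total_gpus)
            if best is None or span < best[0]:
                best = (span, plan, order)
    return best


if __name__ == "__main__":
    span, plan, order = solve_spase_brute_force(PROFILE, total_gpus=4)
    print(f"makespan: {span:.1f}s, order: {order}")
    for name, (parallelism, gpus, runtime) in plan.items():
        print(f"  {name}: {parallelism} on {gpus} GPU(s), {runtime:.0f}s")
```

On this toy input, the best plan gives `model_a` all four GPUs and then runs the other two models side by side on two GPUs each. Choosing the fastest configuration for every model independently would instead force the two four-GPU jobs to run back to back, which is why the three decisions are treated as one problem.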