Large language models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. These models are rapidly being adopted across domains for analytics on various modalities, often by finetuning pre-trained base models. Such models need multiple GPUs due to both their size and computational load, driving the development of a bevy of "model parallelism" techniques and tools. Navigating such parallelism choices, however, is a new burden for end users of DL, such as data scientists and domain scientists, who may lack the necessary systems know-how. The need for model selection, which requires training many models for hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens: resource apportioning and scheduling. In this work, we tackle these three burdens for DL users in a unified manner by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and SchedulE. We propose a new information system architecture to tackle the SPASE problem holistically, representing a key step toward enabling wider adoption of large DL models. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as a mixed-integer linear program (MILP). We find that direct use of an MILP solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques in a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39-49% lower model selection runtimes than typical current DL practice.
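To make the joint SPASE decision concrete, the following is a minimal sketch on a toy instance. It is not Saturn's implementation and it does not use an MILP solver: it swaps in a brute-force enumeration over the same three decisions (which parallelism, how many GPUs, and in what order jobs are placed), which is only feasible for a handful of jobs and is not guaranteed to match an MILP optimum in general. The job names, the parallelism labels, the runtime numbers, and the helpers `simulate` and `solve` are all hypothetical and stand in for what an empirical profiler would report.

```python
from itertools import permutations, product

# Hypothetical profiler output, in seconds:
# job -> {(parallelism, n_gpus): estimated runtime}. All values are made up.
PROFILE = {
    "job_a": {("ddp", 1): 100, ("ddp", 2): 60, ("fsdp", 2): 70},
    "job_b": {("ddp", 1): 80, ("pipeline", 2): 50},
    "job_c": {("fsdp", 2): 90, ("fsdp", 4): 50},
}


def simulate(profile, order, choice, total_gpus):
    """Place jobs in `order`, each on the GPUs that become free earliest.

    Returns (makespan, plan), where plan maps each job to
    (parallelism, n_gpus, start, end).
    """
    free_at = [0.0] * total_gpus  # time at which each GPU becomes idle
    plan = {}
    for job in order:
        parallelism, n_gpus = choice[job]
        runtime = profile[job][(parallelism, n_gpus)]
        # Take the n_gpus GPUs with the earliest free times.
        gpus = sorted(range(total_gpus), key=lambda g: free_at[g])[:n_gpus]
        start = max(free_at[g] for g in gpus)  # wait for the slowest of them
        end = start + runtime
        for g in gpus:
            free_at[g] = end
        plan[job] = (parallelism, n_gpus, start, end)
    return max(free_at), plan


def solve(profile, total_gpus):
    """Enumerate every (config per job, job order) pair; keep the best makespan."""
    jobs = list(profile)
    # Select + Allocate: candidate (parallelism, n_gpus) pairs that fit the cluster.
    options = [
        [cfg for cfg in profile[job] if cfg[1] <= total_gpus] for job in jobs
    ]
    best = None
    for configs in product(*options):
        choice = dict(zip(jobs, configs))
        # Schedule: try every placement order for this set of configs.
        for order in permutations(jobs):
            makespan, plan = simulate(profile, order, choice, total_gpus)
            if best is None or makespan < best[0]:
                best = (makespan, choice, plan)
    if best is None:
        raise ValueError("some job has no configuration that fits the cluster")
    return best


best_makespan, best_choice, best_plan = solve(PROFILE, total_gpus=4)
print(best_makespan)
for job, (parallelism, n_gpus, start, end) in best_plan.items():
    print(job, parallelism, n_gpus, start, end)
```

On this toy instance the fastest per-job configurations are not the best joint choice: giving every job its quickest setting forces jobs to queue for GPUs, while slower single-GPU settings for two of the jobs let all three run side by side. That interaction between parallelism selection, allocation, and scheduling is what motivates treating them as one problem.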