Saturn: An Optimized Data System for Large Model Deep Learning Workloads

Large language models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. These models are rapidly being adopted across domains for analytics on various modalities, often by finetuning pre-trained base models. Such models need multiple GPUs due to both their size and computational load, driving the development of a bevy of "model parallelism" techniques and tools. Navigating such parallelism choices, however, is a new burden for end users of DL, such as data scientists and domain scientists, who may lack the necessary systems know-how. The need for model selection, which leads to many models to train due to hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens: resource apportioning and scheduling. In this work, we tackle these three burdens for DL users in a unified manner by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and SchedulE. We propose a new information system architecture to tackle the SPASE problem holistically, representing a key step toward enabling wider adoption of large DL models. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as a mixed-integer linear program (MILP). We find that direct use of an MILP solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques in a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39-49% lower model selection runtimes than typical current DL practice.
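
To make the joint nature of SPASE concrete, the following is a minimal sketch of a toy instance, not Saturn's implementation. It assumes a hypothetical table of profiled runtimes per (parallelism, GPU count) for each model, and it replaces the MILP formulation with brute-force enumeration plus greedy list scheduling, which is only feasible for a handful of models. All names (`RUNTIMES`, `TOTAL_GPUS`, `makespan`, `solve_spase`), the parallelism labels, and the runtime values are invented for illustration and are not results from the paper.

```python
from itertools import permutations, product

# Hypothetical profiler output, in hours: RUNTIMES[model][(parallelism, gpus)].
# These numbers are made up for illustration only.
RUNTIMES = {
    "model_a": {("fsdp", 2): 6.0, ("fsdp", 4): 3.5, ("pipeline", 2): 7.0},
    "model_b": {("fsdp", 2): 4.0, ("pipeline", 4): 2.5},
    "model_c": {("fsdp", 2): 5.0, ("fsdp", 4): 3.0},
}
TOTAL_GPUS = 4  # assumed single-node cluster size


def makespan(order, choice, total_gpus):
    """Greedy list scheduling: each job takes the GPUs that free up earliest."""
    free_at = [0.0] * total_gpus  # time at which each GPU becomes idle
    for model in order:
        (_, gpus), runtime = choice[model]
        free_at.sort()
        start = free_at[gpus - 1]  # wait until `gpus` devices are all idle
        for i in range(gpus):
            free_at[i] = start + runtime
    return max(free_at)


def solve_spase(runtimes, total_gpus):
    """Enumerate every (parallelism, allocation) choice and every job order."""
    models = list(runtimes)
    best = None
    for configs in product(*(runtimes[m].items() for m in models)):
        choice = dict(zip(models, configs))
        for order in permutations(models):
            span = makespan(order, choice, total_gpus)
            if best is None or span < best[0]:
                plan = {m: cfg for m, (cfg, _) in choice.items()}
                best = (span, order, plan)
    return best


best_span, best_order, best_plan = solve_spase(RUNTIMES, TOTAL_GPUS)
print(f"makespan: {best_span} h")
for name in best_order:
    parallelism, gpus = best_plan[name]
    print(f"  {name}: {parallelism} on {gpus} GPUs")
```

Even in this toy setting the three decisions interact: giving every model all four GPUs, or giving every model two, both yield a longer makespan than mixing allocations, which is why treating selection, allocation, and scheduling separately can leave runtime on the table. The search space here grows factorially with the number of models, so this sketch says nothing about how a solver-based formulation scales.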

