Saturn: An Optimized Data System for Large Model Deep Learning Workloads

Large language models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. These models are rapidly being adopted across domains for analytics on various modalities, often by finetuning pre-trained base models. Such models need multiple GPUs due to both their size and their computational load, driving the development of a bevy of "model parallelism" techniques and tools. Navigating these parallelism choices, however, is a new burden for end users of DL, such as data scientists and domain scientists, who may lack the necessary systems know-how. The need for model selection, which requires training many models for hyperparameter tuning or layer-wise finetuning, compounds the situation with two more burdens: resource apportioning and scheduling. In this work, we tackle these three burdens for DL users in a unified manner by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and SchedulE. We propose a new information system architecture to tackle the SPASE problem holistically, representing a key step toward enabling wider adoption of large DL models. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as a mixed-integer linear program (MILP). We find that directly using an MILP solver is significantly more effective than several baseline heuristics. We further reduce system runtime with an introspective scheduling approach. We implement all these techniques in a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39-49% lower model selection runtimes than typical current DL practice.
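To make the shape of the SPASE problem concrete, here is a minimal toy sketch, not Saturn's formulation. It assumes a profiler has already produced a runtime estimate for each (parallelism, GPU count) option of each job, and then searches jointly over the per-job choice and the job order. Exhaustive search plus a simple greedy gang-placement rule stands in for the MILP solver the abstract describes. The job names, the parallelism labels ("ddp", "fsdp", "pipeline"), the runtimes, and the functions `list_schedule` and `solve_spase` are all hypothetical, introduced only for illustration.

```python
from itertools import permutations, product

# Hypothetical profiler output (all names and numbers are illustrative assumptions):
# job -> {(parallelism, num_gpus): estimated runtime in hours}
PROFILES = {
    "job-a": {("ddp", 1): 8.0, ("ddp", 2): 4.5, ("fsdp", 4): 2.5},
    "job-b": {("ddp", 1): 8.0, ("ddp", 2): 4.5, ("fsdp", 4): 2.5},
    "job-c": {("fsdp", 2): 6.0, ("pipeline", 4): 3.5},
}


def list_schedule(order, choices, profiles, num_gpus):
    """Place jobs in `order`; each job gang-starts on the k GPUs that become idle earliest."""
    free_at = [0.0] * num_gpus  # time at which each GPU becomes idle
    plan = []
    for job in order:
        parallelism, k = choices[job]
        runtime = profiles[job][(parallelism, k)]
        gpus = sorted(range(num_gpus), key=lambda g: free_at[g])[:k]
        start = max(free_at[g] for g in gpus)  # wait until all k GPUs are free
        for g in gpus:
            free_at[g] = start + runtime
        plan.append((job, parallelism, sorted(gpus), start, start + runtime))
    return max(free_at), plan


def solve_spase(profiles, num_gpus):
    """Pick a (parallelism, GPU count) per job plus a job order, minimizing makespan.

    Exhaustive search over all configuration combinations and all job orders,
    using the greedy placement above; a toy stand-in for an MILP solver.
    """
    jobs = sorted(profiles)
    # Discard configurations that need more GPUs than the cluster has.
    options = [[c for c in profiles[j] if c[1] <= num_gpus] for j in jobs]
    best_makespan, best_plan = float("inf"), None
    for combo in product(*options):
        choices = dict(zip(jobs, combo))
        for order in permutations(jobs):
            makespan, plan = list_schedule(order, choices, profiles, num_gpus)
            if makespan < best_makespan:
                best_makespan, best_plan = makespan, plan
    return best_makespan, best_plan
```

With these made-up numbers, `solve_spase(PROFILES, num_gpus=4)` returns a makespan of 8.0 hours. The first plan it finds runs job-a and job-b on one GPU each alongside job-c on two GPUs. Other plans tie at 8.0, for example two-GPU data parallelism for job-a and job-b followed by four-GPU pipelining for job-c. The search space grows as (options per job)^jobs × jobs!, which is one reason a solver-based formulation is attractive at realistic scale.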

