On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Deep learning has made significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech. Large-scale models trained on vast amounts of data hold immense promise for practical applications, enhancing industrial productivity and facilitating social development. As demands on computational capacity grow, numerous studies have explored efficient training, yet a comprehensive summary of acceleration techniques for training deep learning models is still much needed. In this survey, we present a detailed review of training acceleration. We start from the fundamental update formulation and organize its basic components into five main perspectives: (1) data-centric, including dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational complexity of the data samples; (2) model-centric, including acceleration of basic modules, compression training, model initialization, and model-centric curriculum learning techniques, which accelerate training by reducing the computation on parameters; (3) optimization-centric, including the selection of the learning rate, the use of large batch sizes, the design of efficient objectives, and model averaging techniques, which focus on the training policy and on improving the generalization of large-scale models; (4) budgeted training, including distinctive acceleration methods for resource-constrained settings; and (5) system-centric, including efficient open-source distributed libraries and systems that provide adequate hardware support for implementing acceleration algorithms. With this taxonomy, our survey offers a comprehensive view of the general mechanisms within each component and their joint interaction.
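As a rough illustration of how the five perspectives relate to a single update rule, the sketch below assumes the "fundamental update formulation" is the usual mini-batch gradient step, w <- w - lr * (mean gradient over a batch). The abstract does not spell the formula out, so this reading, the toy linear model, and the names `sgd_train`, `lr`, `batch_size`, and `epochs` are all illustrative assumptions rather than the survey's own notation.

```python
import random


def sgd_train(data, lr=0.5, batch_size=4, epochs=500, seed=0):
    """Mini-batch SGD for y ~ w * x + b under a squared loss (toy example)."""
    rng = random.Random(seed)
    # Model-centric: the choice of model and its initialization live here.
    w, b = 0.0, 0.0
    # Budgeted training: the total number of epochs is the compute budget.
    for _ in range(epochs):
        # Data-centric: which samples are visited, and in what order.
        samples = list(data)
        rng.shuffle(samples)
        # Optimization-centric: the batch size controls the gradient estimate.
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            # Mean gradient of 0.5 * (w * x + b - y)^2 over the batch.
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            # Optimization-centric: the learning rate scales the update.
            w -= lr * grad_w
            b -= lr * grad_b
    # System-centric concerns (how this loop is executed on hardware or
    # distributed across devices) are outside the scope of this sketch.
    return w, b


# Noise-free samples from y = 2x + 1 on x in {0.0, 0.1, ..., 0.9}.
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(10)]
w, b = sgd_train(data)
print(w, b)
```

On this noise-free toy problem the fitted `w` and `b` should approach the generating values 2 and 1; the point is only to show where each perspective touches the training loop, not to reproduce any method from the survey.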
