The proliferation of large neural network architectures, particularly deep learning models, has made training increasingly resource-intensive. GPU memory constraints have become a notable bottleneck when training such sizable models. Existing strategies, including data parallelism, model parallelism, pipeline parallelism, and fully sharded data parallelism, offer only partial solutions. Model parallelism, in particular, distributes the model across multiple GPUs, yet the resulting data communication between partitions slows down training, and the substantial memory required to store auxiliary parameters on each GPU adds further overhead. Instead of training the model end to end, this study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train each segment independently. These labels, produced through a random process, reduce memory overhead and computational load, yielding a more efficient training process that minimizes inter-GPU communication while maintaining model accuracy. To validate the method, a 6-layer fully connected neural network is partitioned into two parts and its performance is assessed on the Extended MNIST (EMNIST) dataset. Experimental results indicate that the proposed approach achieves testing accuracies similar to conventional end-to-end training while significantly reducing memory and computational requirements. This work helps mitigate the resource-intensive nature of training large neural networks, paving the way for more efficient deep learning model development.
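The core idea, training each partition against randomly generated intermediate targets instead of backpropagating through the whole network, can be sketched as follows. This is a minimal NumPy illustration on toy data, not the paper's implementation: the per-class random target scheme, layer sizes, learning rates, and the single-linear-layer segments are all assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a classification dataset (names and sizes are illustrative).
n, d_in, d_mid, n_cls = 200, 20, 8, 2
X = rng.normal(size=(n, d_in))
y = (X[:, 0] > 0).astype(int)  # simple linearly separable labels

# Synthetic intermediate labels: one fixed random target vector per class
# (an assumed label-generation scheme; the paper's exact procedure may differ).
Z_targets = rng.normal(size=(n_cls, d_mid))
Z = Z_targets[y]  # per-sample intermediate target, shape (n, d_mid)

# --- Segment 1 (would live on GPU 0): trained locally to hit Z, no signal
# --- from segment 2 is ever communicated back.
W1 = rng.normal(scale=0.1, size=(d_in, d_mid))
for _ in range(200):
    H = X @ W1
    W1 -= 0.01 * X.T @ (H - Z) / n  # MSE gradient step toward Z

# --- Segment 2 (would live on GPU 1): trained on segment 1's frozen outputs
# --- with the true class labels and a softmax cross-entropy loss.
H = X @ W1
W2 = rng.normal(scale=0.1, size=(d_mid, n_cls))
for _ in range(200):
    logits = H @ W2
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    G = P.copy()
    G[np.arange(n), y] -= 1.0
    W2 -= 0.1 * H.T @ G / n  # cross-entropy gradient step

acc = float(((H @ W2).argmax(axis=1) == y).mean())
print(f"train accuracy: {acc:.2f}")
```

Because each segment optimizes a purely local objective, the two training loops never exchange gradients, which is the communication saving the abstract describes; only a single forward pass of segment 1's outputs is needed to train segment 2.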