The proliferation of large neural network architectures, particularly deep learning models, has made training increasingly resource-intensive. GPU memory constraints have become a notable bottleneck when training such sizable models. Existing strategies, including data parallelism, model parallelism, pipeline parallelism, and fully sharded data parallelism, offer only partial solutions. Model parallelism, in particular, distributes the model across multiple GPUs, yet the resulting data communication between partitions slows down training, and the substantial memory required to store auxiliary parameters on each GPU adds further overhead. Instead of training the model end to end, this study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train each segment independently. These labels, produced through a random process, reduce memory overhead and computational load, yielding a more efficient training process that minimizes inter-GPU communication while maintaining model accuracy. To validate the method, a 6-layer fully connected neural network is partitioned into two parts and its performance is assessed on the Extended MNIST (EMNIST) dataset. Experimental results indicate that the proposed approach achieves testing accuracies similar to conventional end-to-end training while significantly reducing memory and computational requirements. This work helps mitigate the resource-intensive nature of training large neural networks, paving the way for more efficient deep learning model development.
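The core idea, training each partition against randomly generated intermediate targets instead of backpropagating through the whole network, can be sketched as follows. This is a minimal NumPy illustration on toy data, not the paper's implementation: the per-class random target scheme, layer sizes, learning rates, and the single-linear-layer segments are all assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a classification dataset (names and sizes are illustrative).
n, d_in, d_mid, n_cls = 200, 20, 8, 2
X = rng.normal(size=(n, d_in))
y = (X[:, 0] > 0).astype(int)  # simple linearly separable labels

# Synthetic intermediate labels: one fixed random target vector per class
# (an assumed label-generation scheme; the paper's exact procedure may differ).
Z_targets = rng.normal(size=(n_cls, d_mid))
Z = Z_targets[y]  # per-sample intermediate target, shape (n, d_mid)

# --- Segment 1 (would live on GPU 0): trained locally to hit Z, no signal
# --- from segment 2 is ever communicated back.
W1 = rng.normal(scale=0.1, size=(d_in, d_mid))
for _ in range(200):
    H = X @ W1
    W1 -= 0.01 * X.T @ (H - Z) / n  # MSE gradient step toward Z

# --- Segment 2 (would live on GPU 1): trained on segment 1's frozen outputs
# --- with the true class labels and a softmax cross-entropy loss.
H = X @ W1
W2 = rng.normal(scale=0.1, size=(d_mid, n_cls))
for _ in range(200):
    logits = H @ W2
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    G = P.copy()
    G[np.arange(n), y] -= 1.0
    W2 -= 0.1 * H.T @ G / n  # cross-entropy gradient step

acc = float(((H @ W2).argmax(axis=1) == y).mean())
print(f"train accuracy: {acc:.2f}")
```

Because each segment optimizes a purely local objective, the two training loops never exchange gradients, which is the communication saving the abstract describes; only a single forward pass of segment 1's outputs is needed to train segment 2.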