The rise of Deep Neural Networks (DNNs) has led to an increase in model size and complexity, straining the memory capacity of GPUs. Sparsity in DNNs, characterized as either structural or ephemeral, has gained attention as a solution. This work focuses on ephemeral sparsity, aiming to reduce memory consumption during training. It emphasizes the role of activations, an often overlooked contributor to memory usage. We apply structured pruning in the Block Sparse Compressed Row (BSR) format, combined with a magnitude-based criterion, to prune activations efficiently. We furthermore introduce efficient block-sparse operators for GPUs and demonstrate their effectiveness, as well as the superior compression offered by block sparsity. We assess activation pruning by evaluating the training speed, accuracy, and memory usage of large-scale neural architectures, using ResMLP on image classification tasks as an example. We observe a memory reduction of up to 32% while maintaining accuracy. Ultimately, our approach aims to democratize large-scale model training, reduce GPU requirements, and address ecological concerns.
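To make the idea of magnitude-based block pruning into a BSR layout concrete, the following is a minimal CPU sketch in NumPy. It is not the authors' GPU implementation: the function names `prune_to_bsr` and `bsr_to_dense`, the `keep_ratio` parameter, and the use of the per-block L1 norm as the magnitude score are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def prune_to_bsr(x, block, keep_ratio):
    """Keep the highest-magnitude blocks of x; return them as BSR arrays.

    Hypothetical helper: the L1-norm score and keep_ratio are assumptions.
    """
    rows, cols = x.shape
    bh, bw = block
    assert rows % bh == 0 and cols % bw == 0, "shape must divide into blocks"
    nbr, nbc = rows // bh, cols // bw

    # View x as a grid of blocks with shape (block_row, block_col, bh, bw).
    blocks = x.reshape(nbr, bh, nbc, bw).transpose(0, 2, 1, 3)

    # Magnitude criterion (assumed): sum of absolute values per block.
    scores = np.abs(blocks).sum(axis=(2, 3))

    # Keep blocks whose score reaches the n_keep-th largest score.
    # Ties at the threshold may keep a few extra blocks.
    n_keep = max(1, int(round(keep_ratio * nbr * nbc)))
    threshold = np.sort(scores, axis=None)[-n_keep]
    mask = scores >= threshold

    # BSR arrays: dense kept blocks, block-column indices, block-row pointers.
    data = blocks[mask]                      # (num_kept_blocks, bh, bw)
    indices = np.nonzero(mask)[1]            # block column of each kept block
    indptr = np.concatenate(([0], np.cumsum(mask.sum(axis=1))))
    return data, indices, indptr


def bsr_to_dense(data, indices, indptr, shape):
    """Expand BSR arrays back to a dense matrix; pruned blocks become zero."""
    bh, bw = data.shape[1:]
    out = np.zeros(shape, dtype=data.dtype)
    for br in range(len(indptr) - 1):
        for k in range(indptr[br], indptr[br + 1]):
            bc = indices[k]
            out[br * bh:(br + 1) * bh, bc * bw:(bc + 1) * bw] = data[k]
    return out
```

In this layout, only the kept blocks are stored densely, with one column index per block rather than per element, which is where the index overhead drops relative to element-wise sparse formats. A training setup along these lines would presumably store the pruned activations from the forward pass in this form and reuse them in the backward pass, but that wiring is beyond this sketch.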