Over-parameterized deep neural networks (DNNs) achieve high prediction accuracy in many applications. Although effective, their large number of parameters hinders deployment on resource-limited devices and has an outsize environmental impact. Sparse training (using a fixed number of nonzero weights in each iteration) can significantly reduce training costs by shrinking the model size. However, existing sparse training methods mainly use either random or greedy drop-and-grow strategies, which leads to local minima and low accuracy. In this work, we treat dynamic sparse training as a sparse connectivity search problem and design an acquisition function that balances exploitation and exploration to escape from local optima and saddle points. We further provide theoretical guarantees for the proposed method and clarify its convergence property. Experimental results show that sparse models (up to 98\% sparsity) obtained by our method outperform state-of-the-art (SOTA) sparse training methods on a wide variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, and ResNet-50 / CIFAR-100, our method achieves even higher accuracy than the dense models. On ResNet-50 / ImageNet, the proposed method improves accuracy by up to 8.2\% over SOTA sparse training methods.
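To make the drop-and-grow idea concrete, the following is a minimal NumPy sketch of one mask update that keeps the number of nonzero weights fixed. It is an illustration under stated assumptions, not the method's actual implementation: the abstract does not give the form of the acquisition function, so the score used here (gradient magnitude as the exploitation term plus a count-based bonus as the exploration term) is an assumed stand-in, and the names `drop_and_grow`, `active_counts`, `drop_frac`, `c`, and `eps` are hypothetical.

```python
import numpy as np


def drop_and_grow(weights, mask, grads, active_counts, step,
                  drop_frac=0.3, c=1e-3, eps=1.0):
    """One drop-and-grow mask update with a fixed number of nonzero weights.

    ASSUMPTION: the acquisition score below is illustrative only. The
    abstract does not specify the actual acquisition function.
    """
    shape = weights.shape
    w = np.abs(weights.ravel())
    m = mask.ravel().astype(bool)
    g = np.abs(grads.ravel())
    n = active_counts.ravel()

    active = np.flatnonzero(m)
    inactive = np.flatnonzero(~m)
    # Number of connections to swap; cannot exceed the inactive pool.
    k = min(int(drop_frac * active.size), inactive.size)

    new_mask = m.copy()
    if k > 0:
        # Drop: remove the k active connections with the smallest magnitude.
        drop = active[np.argsort(w[active])[:k]]

        # Grow: score the currently inactive connections.
        exploit = g[inactive]                                 # large gradient
        explore = c * np.log(step + 1.0) / (n[inactive] + eps)  # rarely active
        score = exploit + explore
        grow = inactive[np.argsort(-score)[:k]]

        new_mask[drop] = False
        new_mask[grow] = True

    # Track how many updates each connection has been active for.
    new_counts = n + new_mask
    return new_mask.reshape(shape), new_counts.reshape(shape)
```

Because the grow candidates are taken from the connections that were inactive before the drop, the dropped and grown sets are disjoint and the sparsity level is unchanged. Setting `c = 0` reduces the sketch to a purely greedy gradient-based growth rule, which is the kind of strategy the abstract argues can get stuck in local minima.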