We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror descent that exploits the sparsity these iterations naturally induce by alternating between phases of static and dynamic sparsity pattern updates. The key idea is to combine sparsity-inducing Bregman iterations with adaptive freezing of the network structure, enabling efficient exploration of the sparse parameter space while maintaining sparsity. We provide convergence guarantees by embedding our method in a multilevel optimization framework. Furthermore, we empirically show that our algorithm produces highly sparse and accurate models on standard benchmarks. We also show that the theoretical FLOP count relative to SGD training is reduced from 38% for standard Bregman iterations to 6% for our method while maintaining test accuracy.
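As a rough illustration of the mechanism the abstract describes, here is a minimal NumPy sketch of one linearized Bregman / mirror descent step with an ℓ1-type shrinkage (soft-thresholding), together with a "frozen" step in which the sparsity pattern is held fixed. The function names (`soft_threshold`, `linbreg_step`, `frozen_step`) and the choice of regularizer are our assumptions for exposition, not details taken from the paper.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal map of lam * ||.||_1: entries with |v_i| <= lam become exactly zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def linbreg_step(v, grad, lr, lam):
    """One dynamic linearized Bregman / mirror descent step (a sketch).

    The subgradient variable v accumulates gradient information; the weights
    theta are recovered through the sparsity-inducing shrinkage, so the
    support (sparsity pattern) may grow or shrink between steps.
    """
    v = v - lr * grad
    theta = soft_threshold(v, lam)
    return v, theta

def frozen_step(v, grad, lr, lam, mask):
    """One static step under a frozen support `mask`: only currently active
    coordinates receive gradient updates, so the sparsity pattern cannot change."""
    v = v - lr * grad * mask
    theta = soft_threshold(v, lam) * mask
    return v, theta
```

Alternating between `linbreg_step` (pattern may change) and `frozen_step` (pattern fixed) mirrors the static/dynamic phases described above; during frozen phases, gradients for inactive weights need not be computed, which is where the FLOP savings would come from.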