Many existing neural network pruning approaches rely either on retraining or on inducing a strong bias to converge to a sparse solution during training. A third paradigm, 'compression-aware' training, aims to obtain state-of-the-art dense models that are robust to a wide range of compression ratios using a single dense training run, while also avoiding retraining. We propose a framework centered on a versatile family of norm constraints and the Stochastic Frank-Wolfe (SFW) algorithm; together they encourage convergence to well-performing solutions while inducing robustness to convolutional filter pruning and low-rank matrix decomposition. Our method outperforms existing compression-aware approaches and, in the case of low-rank matrix decomposition, requires significantly fewer computational resources than approaches based on nuclear-norm regularization. Our findings indicate that dynamically adjusting the learning rate of SFW, as suggested by Pokutta et al. (2020), is crucial for the convergence and robustness of SFW-trained models, and we establish a theoretical foundation for that practice.
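To make the role of a norm constraint in a Frank-Wolfe update concrete, the following is a minimal sketch and not the method proposed here. It assumes a plain L2-ball constraint as a stand-in for the family of norm constraints (which the abstract does not specify), a toy quadratic objective, and the classical step size 2/(t+2) rather than the dynamic learning-rate adjustment referred to above. The names `lmo_l2_ball`, `sfw_step`, and `run_sfw` are hypothetical.

```python
import numpy as np


def lmo_l2_ball(grad, radius):
    """Linear minimization oracle: argmin over ||v||_2 <= radius of <grad, v>."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        # Every feasible point is a minimizer; return the origin.
        return np.zeros_like(grad)
    return -radius * grad / norm


def sfw_step(x, stochastic_grad, radius, lr):
    """One Frank-Wolfe update: a convex combination of x and the LMO output.

    For lr in [0, 1] and feasible x the result stays inside the ball,
    so no projection is needed.
    """
    v = lmo_l2_ball(stochastic_grad, radius)
    return (1.0 - lr) * x + lr * v


def run_sfw(b, radius=1.0, steps=500, noise=0.0, seed=0):
    """Minimize 0.5 * ||x - b||^2 over the L2 ball (toy objective, an assumption).

    Gaussian noise added to the gradient mimics a stochastic gradient estimate.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros_like(b)
    for t in range(steps):
        grad = (x - b) + noise * rng.standard_normal(b.shape)
        lr = 2.0 / (t + 2.0)  # classical Frank-Wolfe step size, assumed here
        x = sfw_step(x, grad, radius, lr)
    return x


if __name__ == "__main__":
    b = np.array([3.0, 4.0])
    print(run_sfw(b, radius=1.0, noise=0.0))
    print(np.linalg.norm(run_sfw(b, radius=1.0, noise=0.5)))
```

Because each iterate is a convex combination of feasible points, the constraint holds throughout training without a projection step; which constraint set is chosen determines the structure of the LMO outputs and hence of the iterates.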