With the growth of model and data sizes, a broad effort has been made to design pruning techniques that reduce the resource demand of deep learning pipelines while retaining model performance. To reduce both inference and training costs, a prominent line of work uses low-rank matrix factorizations to represent the network weights. Although low-rank methods can retain accuracy, we observe that they tend to compromise model robustness against adversarial perturbations. By modeling robustness in terms of the condition number of the neural network, we argue that this loss of robustness is due to the exploding singular values of the low-rank weight matrices. Thus, we introduce a robust low-rank training algorithm that maintains the network's weights on the low-rank matrix manifold while simultaneously enforcing approximate orthonormality constraints. The resulting model reduces both training and inference costs while ensuring well-conditioning and thus better adversarial robustness, without compromising model accuracy. This is supported by extensive numerical evidence and by our main approximation theorem, which shows that the computed robust low-rank network well-approximates the ideal full model, provided a highly performing low-rank sub-network exists.
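To make the link between singular values and conditioning concrete, the following NumPy sketch builds a rank-r factorization W = U S Vᵀ with orthonormal U and V and then clamps the singular values of the small core S into the band [1 - tau, 1 + tau]. This is only an illustration under stated assumptions, not the training algorithm of the paper: the helper names, the parameter tau, and the clamping step are hypothetical choices made here to show that keeping the singular values close to one bounds the condition number by (1 + tau) / (1 - tau).

```python
import numpy as np


def low_rank_factors(m, n, r, rng):
    """Random rank-r factorization W = U @ S @ V.T with orthonormal U and V."""
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))  # m x r, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))  # n x r, orthonormal columns
    S = rng.standard_normal((r, r))                   # small, unconstrained core
    return U, S, V


def condition_number(S):
    """Ratio of the largest to the smallest singular value of S."""
    sigma = np.linalg.svd(S, compute_uv=False)
    return sigma[0] / sigma[-1]


def clamp_singular_values(S, tau):
    """Hypothetical constraint step: force all singular values into [1 - tau, 1 + tau]."""
    P, sigma, Qt = np.linalg.svd(S)
    sigma = np.clip(sigma, 1.0 - tau, 1.0 + tau)
    return P @ np.diag(sigma) @ Qt


rng = np.random.default_rng(0)
rank, tau = 8, 0.1  # illustrative values, not taken from the paper
U, S, V = low_rank_factors(64, 32, rank, rng)

kappa_before = condition_number(S)
S_robust = clamp_singular_values(S, tau)
kappa_after = condition_number(S_robust)

# Because U and V have orthonormal columns, the nonzero singular values of
# W_robust are exactly those of S_robust, so the bound carries over to the layer.
W_robust = U @ S_robust @ V.T

print(f"condition number before clamping: {kappa_before:.2f}")
print(f"condition number after clamping:  {kappa_after:.2f}")
```

Only the r x r core is decomposed, so the cost of the constraint step depends on the rank rather than on the full layer dimensions, which is what makes this kind of control cheap in a low-rank parameterization.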