Algorithmic stability is a classical framework for analyzing the generalization error of learning algorithms. It predicts that an algorithm generalizes well if it is insensitive to small perturbations of the training set, such as the removal or replacement of a single training point. While stability has been established for numerous well-known algorithms, the framework has had limited success in the analysis of deep neural networks. In this paper we study the algorithmic stability of homogeneous deep ReLU networks that achieve zero training error with parameters of smallest $L_2$ norm, i.e., minimum-norm interpolation, a phenomenon observed in overparameterized models trained by gradient-based algorithms. We investigate sufficient conditions under which such networks are stable and find that 1) such networks are stable when they contain a (possibly small) stable sub-network followed by a layer with a low-rank weight matrix, and 2) such networks are not guaranteed to be stable, even when they contain a stable sub-network, if the following layer is not low-rank. The low-rank assumption is motivated by recent empirical and theoretical results showing that training deep neural networks is biased towards low-rank weight matrices, both under minimum-norm interpolation and under weight-decay regularization.
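For concreteness, a minimal formal sketch of the two central notions (the notation $A$, $S$, $\ell$, $f_\theta$, $n$ is illustrative, and the precise definitions used in the paper may differ): an algorithm $A$ is $\beta$-uniformly stable if replacing any single training example changes its loss on any test point by at most $\beta$, and minimum-norm interpolation selects, among all parameter vectors that fit the training data exactly, one of smallest $L_2$ norm:
\[
\sup_{S \simeq S',\, z} \big| \ell(A(S), z) - \ell(A(S'), z) \big| \le \beta,
\qquad
\theta^\star \in \arg\min_{\theta} \|\theta\|_2 \quad \text{s.t.} \quad f_\theta(x_i) = y_i \;\; \forall i \in [n],
\]
where $S \simeq S'$ denotes training sets differing in a single example and $f_\theta$ is the network with parameters $\theta$.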