Overparameterized models trained with (stochastic) gradient descent are ubiquitous in modern machine learning. These large models achieve unprecedented performance on test data, but their theoretical understanding is still limited. In this paper, we take a step towards filling this gap by adopting an optimization perspective. More precisely, we study the implicit regularization properties of the gradient flow "algorithm" for estimating the parameters of a deep diagonal neural network. Our main contribution is to show that this gradient flow induces a mirror flow dynamic on the model, meaning that it is biased towards a specific solution of the problem, which depends on the initialization of the network. Along the way, we prove several properties of the trajectory.
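To make the initialization-dependent bias concrete, the following is a minimal numerical sketch, not the paper's exact setting: it discretizes gradient flow with small-step gradient descent on a depth-2 diagonal parameterization beta = u * v for an underdetermined least-squares problem, and compares the solutions reached from two initialization scales. All names and constants (alpha, step size, number of steps) are illustrative assumptions rather than quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 40                                   # fewer equations than unknowns
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = [2.0, -1.5, 1.0]                # sparse ground truth
y = X @ beta_star

def train_diagonal(alpha, steps=200_000, lr=1e-3):
    """Small-step gradient descent on L(u, v) = 0.5 * ||X (u*v) - y||^2,
    used here as a discretization of the gradient flow."""
    u = alpha * np.ones(d)
    v = alpha * np.ones(d)
    for _ in range(steps):
        r = X @ (u * v) - y                     # residual
        g = X.T @ r                             # gradient w.r.t. beta = u * v
        u, v = u - lr * g * v, v - lr * g * u   # chain rule through the product
    return u * v

for alpha in (0.01, 1.0):
    beta = train_diagonal(alpha)
    print(f"alpha = {alpha:5.2f} | train error = {np.linalg.norm(X @ beta - y):.2e} "
          f"| l1 norm = {np.abs(beta).sum():.2f} | l2 norm = {np.linalg.norm(beta):.2f}")
```

Both runs fit the data, yet the recovered interpolants differ with the initialization scale; in related analyses of diagonal networks, small initialization is known to favor sparser (l1-like) solutions, which is the kind of initialization-dependent implicit bias the mirror-flow viewpoint formalizes.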