The phenomenon of implicit regularization has attracted interest in recent years as a fundamental aspect of the remarkable generalizing ability of neural networks. In a nutshell, it entails that gradient descent dynamics in many neural networks, even without any explicit regularizer in the loss function, converges to the solution of a regularized learning problem. However, known results attempting to theoretically explain this phenomenon focus overwhelmingly on the setting of linear neural networks, and the simplicity of the linear structure is particularly crucial to existing arguments. In this paper, we explore this problem in the context of more realistic neural networks with a general class of non-linear activation functions, and rigorously demonstrate the implicit regularization phenomenon for such networks in the setting of matrix sensing problems, together with rate guarantees that ensure exponentially fast convergence of gradient descent. In this vein, we contribute a network architecture called Spectral Neural Networks (abbrv. SNN) that is particularly suitable for matrix learning problems. Conceptually, this entails coordinatizing the space of matrices by their singular values and singular vectors, as opposed to by their entries, a potentially fruitful perspective for matrix learning. We demonstrate that the SNN architecture is inherently much more amenable to theoretical analysis than vanilla neural networks, and confirm its effectiveness in the context of matrix sensing via both mathematical guarantees and empirical investigations. We believe that the SNN architecture has the potential to be of wide applicability in a broad class of matrix learning scenarios.