This paper presents a comprehensive study of the convergence rates of the stochastic gradient descent (SGD) algorithm applied to overparameterized two-layer neural networks. Our approach combines the Neural Tangent Kernel (NTK) approximation with a convergence analysis in the Reproducing Kernel Hilbert Space (RKHS) generated by the NTK, aiming to provide a deeper understanding of the convergence behavior of SGD in overparameterized two-layer neural networks. This framework lets us examine the interplay between kernel methods and the optimization process, shedding light on the optimization dynamics and convergence properties of neural networks. In this study, we establish sharp convergence rates for the last iterate of SGD in overparameterized two-layer neural networks. Additionally, we significantly relax the requirement on the number of neurons, reducing it from exponential to polynomial dependence on the sample size or the number of iterations. This improvement allows greater flexibility in the design and scaling of neural networks and deepens the theoretical understanding of neural network models trained with SGD.
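For concreteness, the following is a minimal sketch of the setting described above; the specific scaling, activation, and notation are illustrative assumptions rather than the paper's exact formulation. A two-layer network of width $m$ with inner weights $W = (w_1, \dots, w_m)$ and fixed outer weights $a_r$ can be written as
\[
f(x; W) \;=\; \frac{1}{\sqrt{m}} \sum_{r=1}^{m} a_r \, \sigma\!\left(w_r^{\top} x\right),
\]
and SGD updates the weights at step $t$ using a fresh sample $(x_t, y_t)$ and step size $\eta_t$:
\[
W^{t+1} \;=\; W^{t} - \eta_t \, \nabla_{W} \, \ell\!\left(f(x_t; W^{t}),\, y_t\right).
\]
In the overparameterized (large-$m$) regime, the training dynamics are well approximated by those of kernel regression with the NTK
\[
K(x, x') \;=\; \mathbb{E}_{w \sim \mathcal{N}(0, I_d)}\!\left[\sigma'\!\left(w^{\top} x\right)\sigma'\!\left(w^{\top} x'\right)\, x^{\top} x'\right],
\]
which is the bridge that allows convergence rates established in the RKHS induced by $K$ to be transferred to the last iterate of SGD on the neural network.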