We study the convergence of stochastic gradient descent (SGD) for non-convex objective functions. We establish local convergence with positive probability under the local \L{}ojasiewicz condition introduced by Chatterjee \cite{chatterjee2022convergence} together with an additional local structural assumption on the loss landscape. A key component of our proof is showing that the entire SGD trajectory stays inside the local region with positive probability. We also provide examples of finite-width neural networks for which our assumptions hold.