In this note, we give a first-of-its-kind proof that SGD converges to the global minima of an appropriately regularized logistic empirical risk of depth-$2$ nets, for arbitrary data and for any number of gates with adequately smooth and bounded activations such as sigmoid and tanh. We also prove an exponentially fast convergence rate for continuous-time SGD, which additionally applies to smooth unbounded activations such as SoftPlus. Our key idea is to show that there exist Frobenius-norm-regularized logistic loss functions on constant-sized neural nets that are "Villani functions", which lets us build on recent progress in analyzing SGD on such objectives.
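For orientation, one way to write down an objective of the kind the abstract describes is sketched below. The notation is assumed here for illustration and is not taken from the text: $n$ training samples $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, a depth-$2$ net with $p$ gates, first-layer weight matrix $W \in \mathbb{R}^{p \times d}$, outer weights $a \in \mathbb{R}^p$ (treated as fixed in this sketch), activation $\sigma$ applied entrywise, and regularization strength $\lambda > 0$.

$$
f(x; a, W) = a^\top \sigma(W x), \qquad
\tilde{L}(W) = \frac{1}{n} \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i f(x_i; a, W)}\right) + \frac{\lambda}{2} \lVert W \rVert_F^2 .
$$

Under these assumptions, the first term is the logistic empirical risk and the second is the Frobenius norm regularizer; the claim concerns choices of $\lambda$ for which such a $\tilde{L}$ is a Villani function.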