In this note, we give a first-of-its-kind proof that SGD converges to the global minima of an appropriately regularized logistic empirical risk for depth-$2$ nets, for arbitrary data and with any number of gates, provided the activations are adequately smooth and bounded, such as sigmoid and tanh. We also prove an exponentially fast convergence rate for continuous-time SGD, which applies as well to smooth unbounded activations such as SoftPlus. Our key idea is to show that there exist Frobenius-norm-regularized logistic loss functions on constant-sized neural nets that are "Villani functions", which lets us build on recent progress in analyzing SGD on such objectives.
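To make the objective concrete, here is a minimal Python sketch (standard library only) of a Frobenius-norm-regularized logistic empirical risk for a depth-$2$ net, together with plain SGD on it. It assumes a parameterization that the abstract does not spell out: $f(x; W) = \sum_j a_j \tanh(\langle w_j, x \rangle)$ with the outer weights $a$ held fixed and only the inner weight matrix $W$ trained, labels $y \in \{-1, +1\}$, and a regularizer $\frac{\lambda}{2}\|W\|_F^2$. The names `forward`, `regularized_risk`, `sample_grad` and `sgd`, and all hyperparameters, are hypothetical. This illustrates the kind of objective described, not the note's exact setup or its convergence analysis.

```python
import math
import random

# ASSUMPTION (hypothetical model, not necessarily the note's exact setup):
# f(x; W) = sum_j a[j] * tanh(<W[j], x>), outer weights a fixed, W trained.


def forward(W, a, x):
    """Output of the depth-2 net on a single input x."""
    return sum(a_j * math.tanh(sum(w * xi for w, xi in zip(w_j, x)))
               for w_j, a_j in zip(W, a))


def logistic(z):
    """Numerically stable log(1 + exp(-z))."""
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def inv_one_plus_exp(z):
    """1 / (1 + exp(z)), computed without overflow."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def regularized_risk(W, a, data, lam):
    """Mean logistic loss over data plus (lam / 2) * ||W||_F^2."""
    risk = sum(logistic(y * forward(W, a, x)) for x, y in data) / len(data)
    frob_sq = sum(w * w for row in W for w in row)
    return risk + 0.5 * lam * frob_sq


def sample_grad(W, a, x, y, lam):
    """Gradient in W of logistic(y * f(x; W)) + (lam / 2) * ||W||_F^2."""
    z = y * forward(W, a, x)
    coeff = -inv_one_plus_exp(z) * y  # derivative of the loss w.r.t. f
    grad = []
    for w_j, a_j in zip(W, a):
        u = sum(w * xi for w, xi in zip(w_j, x))
        dact = 1.0 - math.tanh(u) ** 2  # tanh'(u)
        grad.append([coeff * a_j * dact * xi + lam * w
                     for w, xi in zip(w_j, x)])
    return grad


def sgd(W, a, data, lam, lr, steps, seed=0):
    """Plain SGD: each step uses one uniformly sampled data point."""
    rng = random.Random(seed)
    W = [row[:] for row in W]  # do not mutate the caller's weights
    for _ in range(steps):
        x, y = rng.choice(data)
        g = sample_grad(W, a, x, y, lam)
        W = [[w - lr * gw for w, gw in zip(row, grow)]
             for row, grow in zip(W, g)]
    return W


if __name__ == "__main__":
    rng = random.Random(0)
    data = []
    for _ in range(20):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        data.append((x, 1 if x[0] > 0 else -1))
    a = [1.0, 1.0, 1.0]           # fixed outer weights (assumption)
    W0 = [[0.0, 0.0] for _ in a]  # 3 gates x 2 inputs
    W = sgd(W0, a, data, lam=0.01, lr=0.1, steps=1000)
    print(regularized_risk(W0, a, data, 0.01),
          regularized_risk(W, a, data, 0.01))
```

Each SGD step uses the gradient of one uniformly sampled loss term plus the regularizer's gradient $\lambda W$, so its expectation equals the gradient of the full regularized risk. Swapping `math.tanh` for a sigmoid or SoftPlus would also require changing the derivative used in `sample_grad`.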