In this note, we consider appropriately regularized $\ell_2$-empirical risk of depth-$2$ nets with any number of gates and show bounds on how the empirical loss evolves along the SGD iterates on it, for arbitrary data, provided the activation is adequately smooth and bounded, such as sigmoid or tanh. This in turn leads to a proof of global convergence of SGD for a special class of initializations. We also prove an exponentially fast convergence rate for continuous-time SGD, which also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius-norm regularized loss functions on constant-sized neural nets which are "Villani functions", and thereby to build on recent progress in analyzing SGD on such objectives. Most critically, the amount of regularization required for our analysis is independent of the size of the net.
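As an illustration of the setting (the notation here is ours and not fixed by the abstract), a Frobenius-norm regularized $\ell_2$-empirical risk for a depth-$2$ net with $p$ gates, outer weights $\mathbf{a} \in \mathbb{R}^p$, inner weight matrix $\mathbf{W}$ with rows $\mathbf{w}_j$, and a smooth bounded activation $\sigma$ (e.g., sigmoid or tanh) can be written as
\[
\widetilde{L}(\mathbf{W}, \mathbf{a}) \;=\; \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{p} a_j\,\sigma\!\bigl(\langle \mathbf{w}_j, \mathbf{x}_i\rangle\bigr)\Bigr)^{2} \;+\; \frac{\lambda}{2}\Bigl(\lVert \mathbf{W}\rVert_F^{2} + \lVert \mathbf{a}\rVert_2^{2}\Bigr),
\]
with SGD run on $\widetilde{L}$; the claim highlighted above is that the regularization strength $\lambda$ needed for the analysis does not grow with the number of gates $p$.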