This study examines the sample complexity required for two-layer neural networks to learn a single-index target function under stochastic gradient descent (SGD), focusing on the challenging regime in which many flat directions are present at initialization. It is well established that $n=O(d\log{d})$ samples are typically needed in this scenario. We go further and provide precise results on the prefactors in high-dimensional settings and for varying widths. Notably, our findings suggest that, within this problem class, overparameterization can only improve convergence by a constant factor. These insights rest on reducing the SGD dynamics to a lower-dimensional stochastic process, in which escaping mediocrity amounts to computing an exit time. We nevertheless show that a deterministic approximation of this process adequately captures the escape time, implying that stochasticity may play only a minor role in this scenario.