Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it remains an open question whether gradient descent on neural networks, without unnatural modifications, can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Unlike prior work, our analysis does not require unnatural modifications of the optimization algorithm. We prove that with sample size $n = O(d^{3.1})$, where $d$ is the input dimension, the network converges in polynomially many iterations to a non-trivial error that is not achievable by kernel methods using $n \ll d^4$ samples, thereby demonstrating a clear separation between unmodified gradient descent and the neural tangent kernel (NTK).