We investigate the generalization and optimization of $k$-homogeneous shallow neural-network classifiers in the interpolating regime. We focus on the setting in which the model can perfectly classify the input data with a positive margin $\gamma$. When using gradient descent with logistic-loss minimization, we show that the training loss converges to zero at a rate of $\tilde{O}(1/(\gamma^{2/k} T))$ given a polylogarithmic number of neurons. This suggests that gradient descent can find a perfect classifier for $n$ input data within $\tilde{\Omega}(n)$ iterations. Additionally, through a stability analysis we show that with $m=\Omega(\log^{4/k}(n))$ neurons and $T=\Omega(n)$ iterations, the test loss is bounded by $\tilde{O}(1/(\gamma^{2/k} n))$. This contrasts with existing stability results, which require polynomial width and yield suboptimal generalization rates. Central to our analysis is a new self-bounded weak convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. As a result, despite the objective's non-convexity, we obtain convergence and generalization-gap bounds similar to those in the convex setting of linear logistic regression.
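To make the setting concrete, the following is a minimal sketch of gradient descent on the logistic loss for a shallow network with a $k$-homogeneous activation. Everything specific in it is an assumption chosen for illustration rather than taken from the analysis: the activation $\max(z,0)^k$ with $k=2$, the fixed $\pm 1$ output weights scaled by $1/\sqrt{m}$, the width, the step size, and the synthetic separable data. It does not verify the stated rates; it only shows the kind of objective and update being analyzed.

```python
import math
import random

K = 2  # homogeneity degree; assumed for illustration (this sketch needs K >= 2)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def act(z):
    """Illustrative k-homogeneous activation: max(z, 0)^K."""
    return max(z, 0.0) ** K


def act_grad(z):
    """Derivative of act for K >= 2."""
    return K * max(z, 0.0) ** (K - 1)


def predict(W, a, x):
    """Shallow network f(W; x) = (1/sqrt(m)) * sum_j a_j * act(<w_j, x>)."""
    return sum(a_j * act(dot(w_j, x)) for w_j, a_j in zip(W, a)) / math.sqrt(len(W))


def logistic_loss(t):
    """Numerically stable log(1 + exp(-t))."""
    return math.log1p(math.exp(-t)) if t > 0 else -t + math.log1p(math.exp(t))


def sigmoid_neg(t):
    """Numerically stable 1 / (1 + exp(t))."""
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def empirical_loss(W, a, X, y):
    return sum(logistic_loss(y_i * predict(W, a, x_i)) for x_i, y_i in zip(X, y)) / len(X)


def train(X, y, W, a, lr=0.1, steps=200):
    """Full-batch gradient descent on the first-layer weights only."""
    n, m = len(X), len(W)
    history = [empirical_loss(W, a, X, y)]
    for _ in range(steps):
        grads = [[0.0] * len(w_j) for w_j in W]
        for x_i, y_i in zip(X, y):
            pre = [dot(w_j, x_i) for w_j in W]
            f = sum(a_j * act(z) for a_j, z in zip(a, pre)) / math.sqrt(m)
            # Derivative of the averaged logistic loss with respect to f.
            coeff = -y_i * sigmoid_neg(y_i * f) / n
            for j in range(m):
                s = coeff * a[j] * act_grad(pre[j]) / math.sqrt(m)
                for d in range(len(x_i)):
                    grads[j][d] += s * x_i[d]
        W = [[w - lr * g for w, g in zip(w_j, g_j)] for w_j, g_j in zip(W, grads)]
        history.append(empirical_loss(W, a, X, y))
    return W, history


def make_data(n, rng):
    """Synthetic data separable by the first coordinate (hypothetical example)."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([float(label), rng.uniform(-0.5, 0.5)])
        y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_data(20, rng)
m = 16  # width; an arbitrary small value for the demo
W0 = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(m)]
a = [1.0 if j % 2 == 0 else -1.0 for j in range(m)]  # fixed output signs
W, history = train(X, y, W0, a)
print("initial loss:", history[0], "final loss:", history[-1])
```

Scaling all first-layer weights by a constant $c>0$ scales the output of `predict` by $c^k$, which is the homogeneity property referred to above. Plotting `history` against the iteration count gives a rough empirical view of how the training loss decays on this toy problem.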