In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of O(√(log n)), where n is the number of training samples. A simple application of this result yields a polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that local gradient methods converge to a point with low training loss with high probability. Our result is an exponential improvement over existing bounds and sheds new light on why local gradient methods work well.
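For concreteness, the following is a sketch, in assumed notation not taken from this abstract, of the kind of training problem and relative gap being described: a two-layer ReLU network with m hidden neurons trained on data X with targets y under weight decay of strength β, compared against the optimal value of a convex relaxation (denoted p*_cvx here purely for illustration):

\[
p^\star \;=\; \min_{W,\,v}\;
  \frac{1}{2}\Big\| \sum_{j=1}^{m} (X w_j)_+ \, v_j - y \Big\|_2^2
  \;+\; \frac{\beta}{2} \sum_{j=1}^{m} \big( \|w_j\|_2^2 + v_j^2 \big),
\qquad
p^\star_{\mathrm{cvx}} \;\le\; p^\star \;\le\; O\!\big(\sqrt{\log n}\big)\, p^\star_{\mathrm{cvx}},
\]

where (·)_+ denotes the ReLU applied entrywise and the right-hand inequality is the O(√(log n)) relative-gap statement from the abstract, assumed to hold with high probability over the random training data.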