We investigate the complexity of training a two-layer ReLU neural network with weight decay regularization. Previous research has shown that the optimal solution of this problem can be found by solving a standard cone-constrained convex program. Using this convex formulation, we prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it. In particular, when $\epsilon\leq\sqrt{84/83}-1\approx 0.006$, we show that it is NP-hard to find an approximate global optimizer of the ReLU network objective with relative error $\epsilon$ with respect to the objective value. Moreover, we develop a randomized algorithm that mirrors the Goemans-Williamson rounding of semidefinite Max-Cut relaxations. To provide polynomial-time approximations, we classify training datasets into three categories: (i) for orthogonal separable datasets, an exact solution can be obtained in polynomial time; (ii) when samples of different classes are negatively correlated, we give a polynomial-time approximation with relative error $\sqrt{\pi/2}-1\approx 0.253$; (iii) for general datasets, the degree to which the problem can be approximated in polynomial time is governed by a geometric factor that controls the diameter of two zonotopes intrinsic to the dataset. To our knowledge, these results are the first polynomial-time approximation guarantees, and the first hardness-of-approximation results, for regularized ReLU networks.
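The abstract states that the proposed randomized algorithm mirrors Goemans-Williamson rounding but does not give its exact form. As a point of reference only, the sketch below shows the classic Goemans-Williamson random-hyperplane rounding step for Max-Cut, not the authors' ReLU-network algorithm. It assumes the semidefinite relaxation has already been solved and factored, so that the rows of `V` are unit vectors with $V V^\top$ equal to the SDP solution; the names `cut_value` and `hyperplane_rounding` and the `trials`/`seed` parameters are hypothetical.

```python
import numpy as np


def cut_value(W, s):
    """Total weight of edges crossing the cut given by labels s in {-1, +1}.

    W is a symmetric weight matrix, so each unordered pair {i, j} appears twice,
    and a crossing pair contributes (1 - s_i s_j) = 2 per entry; hence the 1/4.
    """
    return 0.25 * float(np.sum(W * (1 - np.outer(s, s))))


def hyperplane_rounding(V, W, trials=100, seed=0):
    """Classic Goemans-Williamson-style rounding (illustrative sketch).

    Assumes the rows of V are unit vectors from a factorization of the
    Max-Cut SDP solution. Each trial draws a random hyperplane through the
    origin and labels every vertex by the side its vector falls on; the best
    cut over all trials is returned.
    """
    rng = np.random.default_rng(seed)
    best_s, best_val = None, -np.inf
    for _ in range(trials):
        g = rng.standard_normal(V.shape[1])  # normal vector of a random hyperplane
        s = np.where(V @ g >= 0, 1, -1)      # side of the hyperplane for each vertex
        val = cut_value(W, s)
        if val > best_val:
            best_s, best_val = s, val
    return best_s, best_val
```

For example, on a single edge whose two SDP vectors are antipodal, every random hyperplane separates them, so the rounding recovers the full cut of weight 1. Repeating trials and keeping the best cut is a common practical choice; the classic guarantee concerns the expected cut of a single random hyperplane.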