We study the topology of the loss landscape of one-hidden-layer ReLU networks under overparameterization. On the theory side, we (i) prove that for convex $L$-Lipschitz losses with an $\ell_1$-regularized second layer, any two models at the same loss level can be connected by a continuous path along which the loss increases by at most an arbitrarily small $\varepsilon$, extending a known result for the quadratic loss; and (ii) derive an asymptotic upper bound on the energy gap $\varepsilon$ between local and global minima that vanishes as the width $m$ grows, implying that the landscape flattens and its sublevel sets become connected in the limit. Empirically, on a synthetic Moons dataset and on the Wisconsin Breast Cancer dataset, we measure pairwise energy gaps via Dynamic String Sampling (DSS) and find that wider networks exhibit smaller gaps; in particular, a permutation test on the maximum gap yields $p_{\mathrm{perm}} = 0$ (no permuted statistic reached the observed one), indicating a clear reduction in barrier height.
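The permutation test on the maximum gap can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `perm_test_max_gap`, the pooled-shuffle scheme, and the gap values below are all hypothetical. Under the null hypothesis that width has no effect, the group labels of the measured gaps are exchangeable, so we repeatedly shuffle the pooled gaps and recompute the statistic (difference of group maxima).

```python
import numpy as np

def perm_test_max_gap(gaps_narrow, gaps_wide, n_perm=10_000, seed=0):
    """One-sided permutation test for a reduction in the maximum energy gap.

    Statistic: max(narrow gaps) - max(wide gaps). Labels are shuffled
    under the null that network width does not affect the gaps.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([gaps_narrow, gaps_wide])
    n = len(gaps_narrow)
    observed = gaps_narrow.max() - gaps_wide.max()
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # in-place relabeling of the pooled gaps
        if pooled[:n].max() - pooled[n:].max() >= observed:
            count += 1
    # Fraction of permuted statistics at least as extreme as the observed one
    return observed, count / n_perm

# Hypothetical gap measurements (illustrative only, not the paper's data):
narrow = np.array([0.8, 1.1, 0.9, 1.3, 1.0])
wide = np.array([0.05, 0.02, 0.08, 0.03, 0.04])
obs, p = perm_test_max_gap(narrow, wide)
```

A reported $p_{\mathrm{perm}} = 0$ corresponds to `count == 0`: no random relabeling produced a max-gap difference as large as the observed one.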