In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized from zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks with a nonzero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal-transport theory, we show that, despite the non-convexity of the 2-layer network training, this problem admits an infinite-dimensional convex counterpart. We formulate the corresponding functional-optimization problem and investigate its main properties. In particular, we show that, as the scale of the initialization ranges between $0$ and $+\infty$, the associated path interpolates continuously between the so-called kernel and rich regimes. Numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly, even beyond these extreme points.
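To make the role of the initialization scale concrete, the following is a minimal sketch rather than the construction used in the paper. It assumes a finite-width stand-in for the infinite-width network, $f(x) = \frac{1}{m}\sum_{j=1}^{m} a_j \,\mathrm{ReLU}(w_j \cdot x)$, with both layers drawn from a standard Gaussian and multiplied by a common factor `scale`. The names `two_layer_relu` and `init_weights`, the Gaussian initialization, and the $1/m$ normalization are all illustrative assumptions.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def two_layer_relu(X, W, a):
    """Average over m neurons of a_j * relu(w_j . x).

    Finite-width stand-in (an assumption) for an infinite-width network.
    X: (n, d) inputs, W: (m, d) hidden weights, a: (m,) output weights.
    """
    return relu(X @ W.T) @ a / W.shape[0]


def init_weights(m, d, scale, seed=0):
    """Gaussian weights multiplied by a common scale (hypothetical choice)."""
    rng = np.random.default_rng(seed)
    W = scale * rng.standard_normal((m, d))
    a = scale * rng.standard_normal(m)
    return W, a


# Same seed, two different scales: the weights differ only by the factor 2.
X = np.random.default_rng(1).standard_normal((5, 3))
W1, a1 = init_weights(m=64, d=3, scale=1.0)
W2, a2 = init_weights(m=64, d=3, scale=2.0)

out_unit = two_layer_relu(X, W1, a1)
out_scaled = two_layer_relu(X, W2, a2)

# ReLU is positively homogeneous, so scaling both layers by c > 0
# multiplies the network output by c**2.
print(np.allclose(out_scaled, 4.0 * out_unit))
```

Under these assumptions, multiplying both layers by a factor $c > 0$ multiplies the initial output by $c^2$, so the scale of the initialization directly controls how large the initial predictor is. This is a plausible reason for treating the scale as a natural path parameter, but whether the paper uses this exact parameterization is not stated in the abstract.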