In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized at zero. In this paper, we study a modification of the regularization path for infinite-width two-layer ReLU neural networks with a non-zero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal transport theory, we show that, despite the non-convexity of two-layer network training, this problem admits an infinite-dimensional convex counterpart. We formulate the corresponding functional optimization problem and investigate its main properties. In particular, we show that as the scale of the initialization ranges between $0$ and $+\infty$, the associated path interpolates continuously between the so-called kernel and rich regimes. Numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly even beyond these extreme points.