We prove that, for the fundamental regression task of learning a single neuron, training a one-hidden-layer ReLU network of any width by gradient flow from a small initialisation converges to zero loss and is implicitly biased to minimise the rank of the network parameters. By assuming that the training points are correlated with the teacher neuron, we complement previous work that considered orthogonal datasets. Our results are based on a detailed non-asymptotic analysis of the dynamics of each hidden neuron throughout training. We also show and characterise a surprising distinction in this setting between interpolator networks of minimal rank and those of minimal Euclidean norm. Finally, we perform a range of numerical experiments that corroborate our theoretical findings.
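The training setting described above can be illustrated with a small simulation. What follows is a minimal sketch, not the authors' experimental code, and it rests on several assumptions. It uses a square loss averaged over the training points. It approximates gradient flow by plain gradient descent with a small step size. It also uses a hypothetical two-dimensional teacher neuron v = (1, 0) and three training points that all have positive inner product with v. The helper names (`train_student`, `loss`, `singular_values_2col`) and every numeric setting (width, initialisation scale, step size, number of steps, seed) are illustrative choices. After training, the loss should be close to zero and the hidden weight matrix should be close to rank one. Closeness to rank one is measured here by the ratio of the matrix's two singular values.

```python
import math
import random


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def loss(W, a, xs, ys):
    """Square loss L = 1/(2n) * sum_i (f(x_i) - y_i)^2 for f(x) = sum_j a_j * relu(w_j . x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        f = sum(aj * relu(sum(wk * xk for wk, xk in zip(w, x))) for aj, w in zip(a, W))
        total += (f - y) ** 2
    return total / (2 * len(xs))


def train_student(xs, ys, width=50, init_scale=1e-4, lr=0.1, steps=2000, seed=0):
    """Plain gradient descent with a small step, used here as a stand-in for gradient flow."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    # Small initialisation: every parameter is drawn from N(0, init_scale^2).
    W = [[rng.gauss(0.0, init_scale) for _ in range(d)] for _ in range(width)]
    a = [rng.gauss(0.0, init_scale) for _ in range(width)]
    for _ in range(steps):
        # pre[i][j] = w_j . x_i and res[i] = y_i - f(x_i), both from the current parameters.
        pre = [[sum(wk * xk for wk, xk in zip(w, x)) for w in W] for x in xs]
        res = [y - sum(aj * relu(p) for aj, p in zip(a, row)) for y, row in zip(ys, pre)]
        for j in range(width):
            # Negative gradients of L with respect to a_j and w_j (ReLU derivative taken as 1[z > 0]).
            ga = sum(r * relu(row[j]) for r, row in zip(res, pre)) / n
            gw = [
                a[j] / n * sum(r * x[k] for r, x, row in zip(res, xs, pre) if row[j] > 0.0)
                for k in range(d)
            ]
            a[j] += lr * ga
            W[j] = [wk + lr * g for wk, g in zip(W[j], gw)]
    return W, a


def singular_values_2col(W):
    """Singular values of an m x 2 matrix from the eigenvalues of its 2 x 2 Gram matrix W^T W."""
    g11 = sum(w[0] * w[0] for w in W)
    g22 = sum(w[1] * w[1] for w in W)
    g12 = sum(w[0] * w[1] for w in W)
    half_trace = (g11 + g22) / 2
    disc = math.sqrt(max(half_trace ** 2 - (g11 * g22 - g12 ** 2), 0.0))
    return math.sqrt(half_trace + disc), math.sqrt(max(half_trace - disc, 0.0))


teacher = (1.0, 0.0)                          # hypothetical teacher neuron v
xs = [(1.0, 0.5), (1.0, -0.5), (0.8, 0.2)]    # every x_i has <v, x_i> > 0
ys = [relu(teacher[0] * x[0] + teacher[1] * x[1]) for x in xs]

W, a = train_student(xs, ys)
s1, s2 = singular_values_2col(W)
print(f"final loss: {loss(W, a, xs, ys):.2e}, sigma_2 / sigma_1: {s2 / s1:.2e}")
```

The Gram matrix W^T W is 2 x 2 in this toy setting, so its eigenvalues give the singular values in closed form. For input dimension larger than two, a general SVD routine would be needed instead.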