We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting. We show that when the labels are determined by the sign of a target network with $r$ neurons, then with high probability over the initialization of the network and the sampling of the dataset, GF converges in direction (suitably defined) to a network that achieves perfect training accuracy and has at most $\mathcal{O}(r)$ linear regions, which implies a generalization bound. Unlike many other results in the literature, our result holds, under an additional assumption on the data distribution, even for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
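To make "linear regions" concrete, here is a minimal sketch, assuming the common parameterization $f(x)=\sum_{j=1}^{m} v_j \max(0, w_j x + b_j)$ with no output bias. The class `ReLUNet1D`, the helper `sign_labels`, and the example weights are hypothetical illustrations, not code from the paper. Each neuron with $w_j \neq 0$ can bend $f$ only at $x=-b_j/w_j$, where the slope jumps by $v_j|w_j|$. A width-$m$ network therefore has at most $m+1$ linear regions, and coinciding or cancelling kinks reduce that count. A bound of $\mathcal{O}(r)$ regions thus says that the trained network's effective kinks occupy only $\mathcal{O}(r)$ locations, however wide it is.

```python
from dataclasses import dataclass


@dataclass
class ReLUNet1D:
    """Hypothetical one-hidden-layer univariate ReLU net:
    f(x) = sum_j v[j] * max(0, w[j] * x + b[j])  (no output bias)."""
    w: list[float]
    b: list[float]
    v: list[float]

    def __call__(self, x: float) -> float:
        return sum(vj * max(0.0, wj * x + bj)
                   for wj, bj, vj in zip(self.w, self.b, self.v))

    def linear_regions(self, tol: float = 1e-12) -> int:
        """Count the maximal intervals of the real line on which f is affine."""
        slope_jumps: dict[float, float] = {}
        for wj, bj, vj in zip(self.w, self.b, self.v):
            if abs(wj) <= tol:
                continue  # w_j = 0: the neuron is constant in x, so no kink
            x0 = -bj / wj  # the neuron switches on/off at this point
            # Right slope minus left slope of v_j * relu(w_j x + b_j) is
            # v_j * |w_j|, whatever the sign of w_j. Neurons whose kinks
            # land on exactly the same float value are merged here.
            slope_jumps[x0] = slope_jumps.get(x0, 0.0) + vj * abs(wj)
        # Only locations with a nonzero total slope change are real kinks.
        kinks = [x0 for x0, jump in slope_jumps.items() if abs(jump) > tol]
        return len(kinks) + 1


def sign_labels(target: ReLUNet1D, xs: list[float]) -> list[int]:
    """Binary labels y = sign(target(x)); mapping 0 to +1 is an assumption."""
    return [1 if target(x) >= 0 else -1 for x in xs]


# A target network with r = 2 neurons: kinks at x = 0 and x = 1.
target = ReLUNet1D(w=[1.0, -1.0], b=[0.0, 1.0], v=[1.0, -1.0])

# A wider (m = 5) network: three neurons share the kink at x = 0,
# and one neuron has w = 0, so it only adds a constant.
wide = ReLUNet1D(w=[1.0, 1.0, 2.0, -1.0, 0.0],
                 b=[0.0, 0.0, 0.0, 1.0, 3.0],
                 v=[1.0, -1.0, 0.5, 1.0, 2.0])

print(target.linear_regions(), wide.linear_regions())  # → 3 3
print(sign_labels(target, [-2.0, 0.75, 3.0]))          # → [-1, 1, 1]
```

Here the width-5 network has the same number of linear regions as the 2-neuron target, because its five neurons contribute only two distinct nonzero kinks. Counting regions through slope jumps, rather than by sampling $f$ on a grid, keeps the count exact up to the tolerance `tol`.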