Neural networks with a large number of parameters often do not overfit, owing to an implicit regularization that favors \lq good\rq{} networks. Other related and puzzling phenomena include the properties of flat minima, saddle-to-saddle dynamics, and neuron alignment. To investigate these phenomena, we study the local geometry of deep ReLU neural networks. We show that, for a fixed architecture, as the weights vary, the image of a sample $X$ forms a set whose local dimension varies; the parameter space is partitioned into regions on which this local dimension remains constant. The local dimension is invariant under the natural symmetries of ReLU networks (i.e., positive rescalings and neuron permutations). We then establish that the network's geometry induces a regularization, with the local dimension serving as a key measure of regularity. Moreover, we relate the local dimension to a new notion of flatness of minima and to saddle-to-saddle dynamics. For shallow networks, we also show that the local dimension is connected to the number of linear regions perceived by $X$, offering insight into the effects of regularization; this is further supported by experiments and linked to neuron alignment. Our analysis offers, for the first time, a simple and unified geometric explanation, valid in all learning contexts, for these phenomena, which are usually studied in isolation. Finally, we explore the practical computation of the local dimension and present experiments on the MNIST dataset that highlight geometry-induced regularization in this setting.
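To make the notion of local dimension concrete, the following is a minimal sketch, not necessarily the paper's exact procedure: it assumes that the local dimension of the image of a sample $X$ at weights $w_0$ can be estimated as the numerical rank of the Jacobian of the parameterization $w \mapsto f_w(X)$ at $w_0$. The small PyTorch network, the sample size, and the helper name \texttt{image\_of\_X} below are hypothetical choices made only for illustration.
\begin{verbatim}
import torch

# Sketch (assumed, not the paper's stated method): estimate the local dimension
# of the image of a sample X as the numerical rank of the Jacobian of the
# parameterization w -> f_w(X) at the current weights.
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(2, 8), torch.nn.ReLU(),
    torch.nn.Linear(8, 8), torch.nn.ReLU(),
    torch.nn.Linear(8, 1),
)
X = torch.randn(20, 2)          # hypothetical sample of 20 points in R^2
params = tuple(net.parameters())

def image_of_X(*ps):
    # Forward pass rewritten in terms of the parameter tensors passed in,
    # so that autograd differentiates f_w(X) with respect to w.
    out, it = X, iter(ps)
    for layer in net:
        if isinstance(layer, torch.nn.Linear):
            W, b = next(it), next(it)
            out = out @ W.t() + b
        else:
            out = torch.relu(out)
    return out.reshape(-1)      # f_w(X) flattened into a single vector

# Jacobian of f_w(X) w.r.t. each parameter tensor, stacked into one matrix.
jacs = torch.autograd.functional.jacobian(image_of_X, params)
J = torch.cat([j.reshape(j.shape[0], -1) for j in jacs], dim=1)
local_dim = torch.linalg.matrix_rank(J)   # numerical rank ~ local dimension
print("estimated local dimension:", local_dim.item())
\end{verbatim}
Under this reading, the rank is bounded by the number of output values evaluated on $X$, and it changes as the weights cross the regions of parameter space mentioned above, which is what the rank-based estimate is meant to illustrate.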