Studying the interplay between the geometry of the loss landscape and the optimization trajectories of simple neural networks is a fundamental step for understanding their behavior in more complex settings. This paper reveals the presence of a topological obstruction in the loss landscape of shallow ReLU neural networks trained using gradient flow. We discuss how the homogeneous nature of the ReLU activation function constrains the training trajectories to lie on a product of quadric hypersurfaces whose shape depends on the particular initialization of the network's parameters. When the neural network's output is a single scalar, we prove that these quadrics can have multiple connected components, limiting the set of parameters reachable during training. We analytically compute the number of these components and discuss the possibility of mapping one onto another through neuron rescaling and permutation. In this simple setting, we find that the non-connectedness results in a topological obstruction, which, depending on the initialization, can make the global optimum unreachable. We validate this result with numerical experiments.
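As a minimal sketch of the quadric constraint mentioned above, assume the standard shallow scalar-output model $f(x) = \sum_i a_i\,\mathrm{ReLU}(w_i^\top x)$ trained by gradient flow on a loss $L$ (this notation is illustrative and not fixed by the abstract). Since $\mathrm{ReLU}$ is positively $1$-homogeneous, the map $(w_i, a_i) \mapsto (\lambda w_i, a_i/\lambda)$ leaves $f$ unchanged for $\lambda > 0$, which gives $w_i^\top \nabla_{w_i} L = a_i\, \partial_{a_i} L$ and hence, along the flow $\dot{w}_i = -\nabla_{w_i} L$, $\dot{a}_i = -\partial_{a_i} L$,
\[
  \frac{d}{dt}\Bigl(\lVert w_i(t)\rVert^2 - a_i(t)^2\Bigr)
  = -2\, w_i^\top \nabla_{w_i} L + 2\, a_i\, \partial_{a_i} L = 0 .
\]
Each neuron's trajectory therefore stays on the quadric $\{(w_i, a_i) : \lVert w_i\rVert^2 - a_i^2 = c_i\}$, with $c_i$ fixed by the initialization; when $c_i < 0$ this set splits into the two components $a_i \geq \sqrt{-c_i}$ and $a_i \leq -\sqrt{-c_i}$, which is the kind of non-connectedness underlying the obstruction discussed in the paper.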