Overparameterization is central to the success of deep learning, yet the mechanisms by which it improves optimization remain incompletely understood. We analyze weight-space symmetries in neural networks and show that overparameterization introduces additional symmetries that benefit optimization in two distinct ways. First, we prove that these symmetries act as a form of diagonal preconditioning on the Hessian, so that each equivalence class of functionally identical solutions contains better-conditioned minima. Second, we show that overparameterization increases the probability mass of global minima near typical initializations, making these favorable solutions easier to reach. Teacher-student network experiments validate our theoretical predictions: as width increases, the Hessian trace decreases, condition numbers improve, and convergence accelerates. Our analysis provides a unified framework for understanding overparameterization and width growth as geometric transformations of the loss landscape.
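As a toy illustration of how functionally identical solutions can differ in curvature, consider a minimal sketch that is not the construction analyzed in this work. It assumes a hypothetical two-parameter model f(x) = a·b·x fit to the target f(x) = x with loss L(a, b) = 0.5·(a·b − 1)², so that the rescaling (a, b) → (c·a, b/c) leaves the function unchanged. The function names below are illustrative only.

```python
def hessian_at_minimum(a, b):
    """Hessian of L(a, b) = 0.5 * (a*b - 1)**2 (toy loss, an assumption).

    Second derivatives: d2L/da2 = b^2, d2L/db2 = a^2,
    d2L/da db = 2*a*b - 1. Intended for points with a*b == 1.
    """
    off_diagonal = 2.0 * a * b - 1.0
    return [[b * b, off_diagonal],
            [off_diagonal, a * a]]


def trace(matrix):
    """Sum of the diagonal entries of a square nested-list matrix."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def orbit_traces(scales):
    """Hessian trace at (a, b) = (c, 1/c) for each scale c.

    Every such point is a global minimum representing the same
    function x -> x, so the points form one equivalence class.
    """
    return {c: trace(hessian_at_minimum(c, 1.0 / c)) for c in scales}


traces = orbit_traces([0.25, 0.5, 1.0, 2.0, 4.0])
best_scale = min(traces, key=traces.get)

for c, value in traces.items():
    print(f"c = {c:<5} Hessian trace = {value:.4f}")
print("flattest representative at c =", best_scale)
```

In this toy setting the trace equals c² + 1/c², which is smallest at the balanced point c = 1 and grows as the two factors become unbalanced. It only shows that curvature can vary within an equivalence class; it says nothing about how width changes the picture.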