We introduce a novel approach for analyzing the training dynamics of ReLU networks by examining the characteristic activation boundaries of individual ReLU neurons. This analysis reveals a critical instability in common neural network parameterizations and normalizations during stochastic optimization, which impedes fast convergence and hurts generalization performance. To address this, we propose Geometric Parameterization (GmP), a neural network parameterization technique that effectively separates the radial and angular components of weights in the hyperspherical coordinate system. We show theoretically that GmP resolves this instability, and we report empirical results on various models and benchmarks verifying GmP's advantages in optimization stability, convergence speed, and generalization performance.
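The radial/angular separation underlying GmP can be illustrated with standard hyperspherical coordinates: a weight vector w in R^n is represented by its norm r together with n-1 angles. The sketch below is a minimal NumPy illustration of this coordinate change, not the paper's actual GmP implementation; the function names are our own.

```python
import numpy as np

def to_hyperspherical(w):
    """Map a Cartesian weight vector w in R^n to hyperspherical
    coordinates (r, theta_1..theta_{n-1}), separating the radial
    component r = ||w|| from the angular components."""
    r = np.linalg.norm(w)
    n = len(w)
    thetas = np.empty(n - 1)
    for i in range(n - 1):
        # angle between w[i] and the norm of the remaining tail (w_i..w_{n-1})
        tail = np.linalg.norm(w[i:])
        thetas[i] = 0.0 if tail == 0 else np.arccos(np.clip(w[i] / tail, -1.0, 1.0))
    # the last angle spans [0, 2*pi): fix its sign from the final coordinate
    if w[-1] < 0:
        thetas[-1] = 2 * np.pi - thetas[-1]
    return r, thetas

def to_cartesian(r, thetas):
    """Inverse map: rebuild the Cartesian vector w from (r, thetas)."""
    n = len(thetas) + 1
    w = np.empty(n)
    sin_prod = 1.0
    for i, th in enumerate(thetas):
        w[i] = r * sin_prod * np.cos(th)
        sin_prod *= np.sin(th)
    w[-1] = r * sin_prod
    return w

# Round trip: the radial part r and angular parts thetas jointly recover w.
w = np.array([0.3, -1.2, 0.7, 2.0])
r, thetas = to_hyperspherical(w)
assert np.allclose(to_cartesian(r, thetas), w)
```

Under this parameterization, scaling the weight vector changes only r while leaving all angles fixed, which is the decoupling of magnitude and direction that the abstract refers to.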