While normalization techniques are widely used in deep learning, their theoretical understanding remains relatively limited. In this work, we establish the benefits of (generalized) weight normalization (WN) applied to the overparameterized matrix sensing problem. We prove that WN combined with Riemannian optimization achieves linear convergence, yielding an exponential speedup over standard methods that do not use WN. Our analysis further demonstrates that both the iteration complexity and the sample complexity improve polynomially as the level of overparameterization increases. To the best of our knowledge, this work provides the first characterization of how WN leverages overparameterization for faster convergence in matrix sensing.