While normalization techniques are widely used in deep learning, their theoretical understanding remains relatively limited. In this work, we establish the benefits of (generalized) weight normalization (WN) applied to the overparameterized matrix sensing problem. We prove that WN with Riemannian optimization achieves linear convergence, yielding an exponential speedup over standard methods that do not use WN. Our analysis further demonstrates that both iteration and sample complexity improve polynomially as the level of overparameterization increases. To the best of our knowledge, this work provides the first characterization of how WN leverages overparameterization for faster convergence in matrix sensing.
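To make the setting concrete, the following is a minimal sketch of a weight-normalized parameterization for symmetric matrix sensing, trained with a Riemannian step on the direction variable. It is an illustration under stated assumptions, not the algorithm analyzed in this work. The parameterization M = U S Uᵀ (orthonormal "direction" U, symmetric "magnitude" S), the QR retraction, the symmetric Gaussian sensing matrices, the step size, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def sym(M):
    """Symmetric part of a square matrix."""
    return 0.5 * (M + M.T)

def make_problem(n=8, r=2, m=200, seed=0):
    """Synthetic sensing problem (assumed setup): y_i = <A_i, M_star>, rank(M_star) = r."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, r))
    M_star = B @ B.T                              # PSD ground truth of rank r
    A = rng.standard_normal((m, n, n))
    A = 0.5 * (A + A.transpose(0, 2, 1))          # symmetric sensing matrices
    y = np.einsum("mij,ij->m", A, M_star)
    return A, y, M_star

def loss_and_grads(U, S, A, y):
    """Least-squares loss and Euclidean gradients for M = U S U^T (S symmetric)."""
    M = U @ S @ U.T
    resid = np.einsum("mij,ij->m", A, M) - y
    loss = 0.5 * np.mean(resid ** 2)
    G = np.einsum("m,mij->ij", resid, A) / len(y)  # gradient w.r.t. M, symmetric
    grad_U = 2.0 * G @ U @ S
    grad_S = U.T @ G @ U
    return loss, grad_U, grad_S

def tangent_project(U, Z):
    """Project Z onto the tangent space of the Stiefel manifold at U."""
    return Z - U @ sym(U.T @ Z)

def qr_retract(Y):
    """Map a full-column-rank matrix back to orthonormal columns via QR."""
    Q, R = np.linalg.qr(Y)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs                               # fix column signs for uniqueness

def run(n=8, r=2, k=4, m=200, steps=200, lr=0.05, seed=0):
    """Overparameterized run: k > r columns in U. Step size is an untuned guess."""
    A, y, _ = make_problem(n, r, m, seed)
    rng = np.random.default_rng(seed + 1)
    U = qr_retract(rng.standard_normal((n, k)))    # random orthonormal start
    S = 0.1 * np.eye(k)                            # small symmetric magnitude
    losses = []
    for _ in range(steps):
        loss, gU, gS = loss_and_grads(U, S, A, y)
        losses.append(loss)
        U = qr_retract(U - lr * tangent_project(U, gU))  # Riemannian step on U
        S = sym(S - lr * gS)                             # Euclidean step on S
    return U, S, losses
```

Calling `run()` returns the final factors and the loss history. Separating direction from magnitude mirrors classical weight normalization, where a weight vector is written as a scale times a unit vector; here the unit-norm constraint is replaced by orthonormal columns, which is one possible reading of "generalized" and may differ from the definition used in the paper.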