Gradient Descent (GD) has proven effective for solving a variety of matrix factorization problems. However, its optimization behavior under large initialization remains poorly understood. To address this gap, this paper presents a novel theoretical framework for analyzing the convergence trajectory of GD from a large initialization. The framework is grounded in signal-to-noise ratio concepts and inductive arguments. The results uncover an implicit incremental learning phenomenon in GD and offer a deeper understanding of its behavior in large initialization scenarios.
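To make the setting concrete, here is a minimal NumPy sketch of plain GD on a symmetric low-rank factorization problem. It assumes the objective f(X) = ¼‖XXᵀ − M‖_F², whose gradient is (XXᵀ − M)X, a synthetic rank-3 PSD target M, and a Gaussian initialization whose scale makes XXᵀ larger than M. The dimensions, step size, scale, and the helper names `make_target` and `gd_factorize` are hypothetical illustration choices, not the paper's setup. At each checkpoint it records u_kᵀ X_t X_tᵀ u_k for each eigenvector u_k of M, so one can inspect the order in which components get fitted.

```python
import numpy as np

def make_target(d, eigvals, seed=0):
    """Hypothetical helper: build M = U diag(eigvals) U^T with orthonormal U (d x r)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, len(eigvals))))
    return U @ np.diag(eigvals) @ U.T, U

def loss(X, M):
    """Assumed objective f(X) = 1/4 * ||X X^T - M||_F^2."""
    R = X @ X.T - M
    return 0.25 * np.sum(R * R)

def gd_factorize(M, U, rank, scale, eta, steps, log_every, seed=1):
    """Hypothetical helper: plain GD X <- X - eta * (X X^T - M) X from a random init."""
    rng = np.random.default_rng(seed)
    X = scale * rng.standard_normal((M.shape[0], rank))

    def snapshot(t, X):
        # u_k^T (X X^T) u_k for each column u_k of U: how much of eigen-direction k is fitted.
        fitted = np.einsum("ik,ij,jk->k", U, X @ X.T, U)
        return t, loss(X, M), fitted

    history = []
    for t in range(steps):
        if t % log_every == 0:
            history.append(snapshot(t, X))
        X = X - eta * (X @ X.T - M) @ X  # gradient step on f
    history.append(snapshot(steps, X))
    return X, history

# Illustrative run: d = 10, eigenvalues 3, 2, 1; entries of X_0 ~ N(0, 1) (assumed "large" scale).
M, U = make_target(d=10, eigvals=[3.0, 2.0, 1.0])
X, history = gd_factorize(M, U, rank=3, scale=1.0, eta=0.01, steps=5000, log_every=1000)
for t, f, fitted in history:
    print(t, f"{f:.2e}", np.round(fitted, 3))
```

Shrinking `log_every` gives a finer trace of how each `fitted` entry moves toward its eigenvalue; the exact-rank choice (`rank=3`) is only to keep the sketch simple, and over-parameterized widths can be tried by changing that argument.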