Gradient descent (GD) is widely believed to induce an implicit bias towards good generalization when training machine learning models. This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem, whose goal is to recover a low-rank ground-truth matrix from near-isotropic linear measurements. It is shown that GD with small initialization behaves similarly to the greedy low-rank learning heuristics (Li et al., 2020) and follows an incremental learning procedure (Gissin et al., 2019): GD sequentially learns solutions of increasing rank until it recovers the ground-truth matrix. Compared to existing works, which analyze only the first learning phase for rank-1 solutions, our result characterizes the whole learning process. Moreover, besides the over-parameterized regime on which many prior works focus, our analysis of the incremental learning procedure also applies to the under-parameterized regime. Finally, we conduct numerical experiments to confirm our theoretical findings.
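The incremental learning behavior described above can be illustrated with a small NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: a symmetric positive semidefinite rank-2 ground truth with eigenvalues 4 and 1, symmetrized Gaussian measurement matrices, the factorization $Z = UU^\top$ with squared loss, the problem sizes, the step size, and the eigenvalue threshold used to count "learned" directions.

```python
import numpy as np


def run_matrix_sensing_gd(d=10, m=1000, width=10, alpha=1e-3,
                          lr=0.02, steps=2000, seed=0):
    """GD on f(U) = 1/(4m) * sum_i (<A_i, U U^T> - y_i)^2 from a small init.

    All defaults are illustrative choices, not values from the paper.
    Returns the per-step count of large eigenvalues of U U^T and the
    per-step relative recovery error.
    """
    rng = np.random.default_rng(seed)

    # Assumed ground truth: rank-2 PSD matrix with eigenvalues 4 and 1.
    Q, _ = np.linalg.qr(rng.standard_normal((d, 2)))
    Z_star = Q @ np.diag([4.0, 1.0]) @ Q.T

    # Symmetrized Gaussian measurements, so that (1/m) * sum_i <A_i, M> A_i
    # is close to M for symmetric M (a near-isotropic measurement operator).
    G = rng.standard_normal((m, d, d))
    A = ((G + G.transpose(0, 2, 1)) / 2).reshape(m, d * d)
    y = A @ Z_star.ravel()

    # Small random initialization of scale alpha.
    U = alpha * rng.standard_normal((d, width))

    ranks, errors = [], []
    for _ in range(steps):
        Z = U @ U.T
        residual = A @ Z.ravel() - y
        # Gradient of f with respect to U (valid because each A_i is symmetric).
        grad = (A.T @ residual).reshape(d, d) @ U / m
        U = U - lr * grad

        Z = U @ U.T
        # Count eigenvalues above 0.5, half of the smallest true eigenvalue
        # (a hypothetical threshold chosen only for this sketch).
        ranks.append(int(np.sum(np.linalg.eigvalsh(Z) > 0.5)))
        errors.append(np.linalg.norm(Z - Z_star) / np.linalg.norm(Z_star))
    return np.array(ranks), np.array(errors)


if __name__ == "__main__":
    ranks, errors = run_matrix_sensing_gd()
    for r in np.unique(ranks):
        # First step at which the iterate has r large eigenvalues.
        print(f"rank {r} first reached at step {int(np.argmax(ranks == r))}")
    print(f"final relative error: {errors[-1]:.2e}")
```

Under these assumptions, the step at which each effective rank first appears gives a rough picture of the successive learning phases. Setting `width` below the true rank is one way to probe the under-parameterized regime with the same sketch, although the final error would then not be expected to vanish.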