Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion

We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $\rm{X}^\star \in \mathbb{R}^{d\times d}$ of rank $r$, parameterized as $\rm{U}\rm{U}^{\top}$, from only a subset of its entries. We show that vanilla gradient descent (GD) with small initialization provably converges to the ground truth $\rm{X}^\star$ without any explicit regularization. This result holds even in the over-parameterized setting, where the true rank $r$ is unknown and is conservatively over-estimated by a search rank $r'\gg r$. Existing results for this problem require explicit regularization, a sufficiently accurate initial point, or exact knowledge of the true rank $r$. In the over-parameterized regime where $r'\geq r$, we show that, with $\widetilde\Omega(dr^9)$ observations, GD with an initial point satisfying $\|\rm{U}_0\| \leq \epsilon$ converges near-linearly to an $\epsilon$-neighborhood of $\rm{X}^\star$. Consequently, smaller initial points yield increasingly accurate solutions. Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$; both are governed only by the true rank $r$. In the exactly-parameterized regime where $r'=r$, we further strengthen this result by proving that GD converges at a faster rate to an arbitrarily small accuracy $\epsilon>0$, provided the initial point satisfies $\|\rm{U}_0\| = O(1/d)$. Our proof hinges on a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, going beyond what was previously possible with the classical leave-one-out analysis.
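
To make the setting concrete, here is a minimal sketch of vanilla GD with small random initialization on the unregularized observed-entry loss, using plain Python lists so it needs no third-party packages. The loss normalization $f(\rm{U}) = \frac{1}{4p}\|P_\Omega(\rm{U}\rm{U}^\top - \rm{X}^\star)\|_F^2$, the Gaussian initialization, the step size, and the names `gd_matrix_completion` and `frob_error` are illustrative assumptions, not taken from the paper. It also assumes a symmetric observation mask, which makes the gradient equal to $\frac{1}{p}P_\Omega(\rm{U}\rm{U}^\top - \rm{X}^\star)\rm{U}$.

```python
import math
import random


def gd_matrix_completion(X_star, mask, r_search, eta=0.05, init_scale=1e-3,
                         iters=1500, seed=0):
    """Vanilla GD (no regularizer) on f(U) = 1/(4p) * ||P_Omega(U U^T - X_star)||_F^2.

    X_star   : d x d symmetric PSD target (list of lists); only observed entries are used.
    mask     : d x d symmetric 0/1 matrix marking the observed set Omega (assumed symmetric).
    r_search : search rank r' (may exceed the true rank r).
    """
    d = len(X_star)
    p = sum(map(sum, mask)) / (d * d)  # empirical sampling rate
    rng = random.Random(seed)
    # Small random initialization (assumption): entries ~ N(0, init_scale^2 / d),
    # so the spectral norm of U_0 is on the order of init_scale.
    U = [[rng.gauss(0.0, init_scale / math.sqrt(d)) for _ in range(r_search)]
         for _ in range(d)]
    for _ in range(iters):
        # Observed residual R = P_Omega(U U^T - X_star); symmetric because mask is.
        R = [[mask[i][j] * (sum(U[i][k] * U[j][k] for k in range(r_search)) - X_star[i][j])
              for j in range(d)] for i in range(d)]
        # Gradient of f is (1/p) * R U when R is symmetric.
        G = [[sum(R[i][l] * U[l][k] for l in range(d)) / p for k in range(r_search)]
             for i in range(d)]
        # Plain GD step: no regularization, no projection, no rank knowledge.
        U = [[U[i][k] - eta * G[i][k] for k in range(r_search)] for i in range(d)]
    return U


def frob_error(U, X_star):
    """Frobenius norm ||U U^T - X_star||_F over all entries, observed or not."""
    d, r = len(U), len(U[0])
    return math.sqrt(sum(
        (sum(U[i][k] * U[j][k] for k in range(r)) - X_star[i][j]) ** 2
        for i in range(d) for j in range(d)))


if __name__ == "__main__":
    d = 8
    u = [1.0 / math.sqrt(d)] * d  # unit vector, so ||X_star|| = 1
    X_star = [[u[i] * u[j] for j in range(d)] for i in range(d)]  # true rank r = 1
    full_mask = [[1] * d for _ in range(d)]
    # Over-parameterized search rank r' = 3 > r = 1.
    U = gd_matrix_completion(X_star, full_mask, r_search=3)
    print(frob_error(U, X_star))
```

In this toy run the signal component grows geometrically out of the small initialization while the spurious directions stay near the initialization scale, so the final error should be on the order of `init_scale**2`. To simulate partial observations, pass a symmetric 0/1 mask with some entries set to zero; note that a toy problem this small need not satisfy the $\widetilde\Omega(dr^9)$ sample requirement stated above.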
