We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method for the low-rank matrix sensing problem when the true rank is unknown and the matrix is possibly ill-conditioned. Using overparameterized factor representations, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization and proceeds by gradient descent with a specific form of damped preconditioning, which counteracts the poor curvature induced by overparameterization and ill-conditioning. At the cost of a light computational overhead from the preconditioner, $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$), even with overparameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$, which suffers from a polynomial dependence on the condition number. Our work provides evidence of the power of preconditioning in accelerating convergence without hurting generalization in overparameterized learning.
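
The abstract does not spell out the update rule, so the following is only a minimal NumPy sketch of what a damped, preconditioned step of this kind could look like. It assumes a symmetric positive semidefinite target $M = X X^\top$, symmetrized Gaussian sensing matrices $A_i$, the loss $f(X) = \frac{1}{4m}\sum_i (\langle A_i, X X^\top\rangle - y_i)^2$, and an update of the form $X \leftarrow X - \eta\, \nabla f(X)\,(X^\top X + \lambda I)^{-1}$. The function names, the problem sizes, and the values of `eta`, `lam`, and `alpha` are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def make_problem(n=10, r=2, m=2000, kappa=10.0, seed=0):
    """Hypothetical test instance: rank-r PSD matrix with condition number kappa."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis, n x r
    sigma = np.linspace(1.0, 1.0 / kappa, r)            # eigenvalues from 1 down to 1/kappa
    M = (U * sigma) @ U.T                               # ground-truth low-rank matrix
    G = rng.standard_normal((m, n, n))
    A = (G + G.transpose(0, 2, 1)) / 2                  # symmetrized Gaussian sensing matrices
    y = np.einsum("mij,ij->m", A, M)                    # noiseless measurements <A_i, M>
    return A, y, M

def scaled_gd_lambda(A, y, k, eta=0.1, lam=0.01, alpha=1e-3, iters=200, seed=1):
    """Sketch of damped preconditioned GD on an overparameterized n x k factor (assumed form)."""
    m, n, _ = A.shape
    rng = np.random.default_rng(seed)
    X = alpha * rng.standard_normal((n, k))             # small random initialization
    for _ in range(iters):
        resid = np.einsum("mij,ij->m", A, X @ X.T) - y  # <A_i, X X^T> - y_i
        grad = np.einsum("m,mij->ij", resid, A) @ X / m # gradient of f for symmetric A_i
        precond = X.T @ X + lam * np.eye(k)             # damped k x k preconditioner
        # Right-multiply grad by precond^{-1}; precond is symmetric, so solve on the transpose.
        X = X - eta * np.linalg.solve(precond, grad.T).T
    return X

if __name__ == "__main__":
    A, y, M = make_problem()
    X = scaled_gd_lambda(A, y, k=4)                     # k = 4 > true rank 2 (overparameterized)
    rel_err = np.linalg.norm(X @ X.T - M) / np.linalg.norm(M)
    print(f"relative error: {rel_err:.3e}")
```

The damping term `lam * np.eye(k)` keeps the preconditioner invertible when the overparameterized factor has near-zero directions, which is the situation created by a small initialization with `k` larger than the true rank. Setting `precond` to the identity recovers vanilla gradient descent, which gives a simple baseline for comparing behavior as `kappa` grows.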