We study matrix sensing, the problem of reconstructing a low-rank matrix from a few linear measurements. It can be formulated as an overparameterized regression problem that can be solved by factorized gradient descent started from a small random initialization. Linear neural networks, and in particular matrix sensing by factorized gradient descent, serve as prototypical models of non-convex problems in modern machine learning, in which complex phenomena can be disentangled and studied in detail. Much research has been devoted to special cases of asymmetric matrix sensing, such as asymmetric matrix factorization and symmetric positive semi-definite matrix sensing. Our key contribution is a continuous-time differential equation that we call the $\textit{perturbed gradient flow}$. We prove that the perturbed gradient flow converges quickly to the true target matrix whenever the perturbation is sufficiently small. The dynamics of gradient descent for matrix sensing can be reduced to this formulation, which yields a novel convergence proof for asymmetric matrix sensing with factorized gradient descent. Compared to analyzing the gradient descent iterates directly, the continuous formulation allows key quantities to be bounded by analyzing their time derivatives, which often simplifies the proofs. We believe this general proof technique may be useful in other settings as well.
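To make the algorithm concrete, here is a minimal NumPy sketch of plain (discrete) factorized gradient descent for asymmetric matrix sensing from a small random initialization. It rests on assumptions that do not come from the abstract: the measurement matrices A_i have i.i.d. standard Gaussian entries, the loss is f(U, V) = (1/2m) sum_i (<A_i, U V^T> - y_i)^2, and the target is a random rank-1 matrix with unit singular value. The function name `factorized_gd_matrix_sensing`, the factor width `k`, the initialization scale `alpha`, the step size and the iteration count are all hypothetical illustrative choices. The sketch does not implement the perturbed gradient flow or the paper's analysis; it only shows the kind of dynamics being analyzed.

```python
import numpy as np


def factorized_gd_matrix_sensing(n1=20, n2=15, rank=1, m=150, k=None,
                                 alpha=1e-6, step=0.1, iters=1500, seed=0):
    """Hypothetical demo: recover a rank-`rank` n1 x n2 matrix from m Gaussian
    linear measurements by gradient descent on the factorization X = U V^T.
    Returns the relative Frobenius error ||U V^T - X*|| / ||X*||."""
    rng = np.random.default_rng(seed)
    k = min(n1, n2) if k is None else k  # overparameterized factor width (k >= rank)

    # Ground truth X* = U* V*^T with orthonormal factors (all singular values equal 1).
    U_star, _ = np.linalg.qr(rng.standard_normal((n1, rank)))
    V_star, _ = np.linalg.qr(rng.standard_normal((n2, rank)))
    X_star = U_star @ V_star.T

    # Assumed measurement model: y_i = <A_i, X*>, each A_i stored as one flattened row.
    A = rng.standard_normal((m, n1 * n2))
    y = A @ X_star.ravel()

    # Small random initialization of both factors.
    U = alpha * rng.standard_normal((n1, k))
    V = alpha * rng.standard_normal((n2, k))

    for _ in range(iters):
        residual = A @ (U @ V.T).ravel() - y  # shape (m,)
        # Gradient of f with respect to the product X = U V^T, reshaped to n1 x n2.
        G = (A.T @ residual).reshape(n1, n2) / m
        # Chain rule: grad_U f = G V, grad_V f = G^T U; the tuple assignment
        # evaluates both right-hand sides with the old U and V.
        U, V = U - step * G @ V, V - step * G.T @ U

    return np.linalg.norm(U @ V.T - X_star) / np.linalg.norm(X_star)


print(factorized_gd_matrix_sensing())
```

With the defaults, the sketch uses m = 150 measurements for a 20 x 15 matrix, i.e. half as many measurements as entries, so the measurements alone do not determine X*; recovery relies on the small initialization keeping the iterates approximately low-rank. Calling it with `iters=0` returns a value close to 1, since U V^T is then essentially zero.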