We examine gradient descent in matrix factorization and show that under large step sizes the parameter space develops a fractal structure. We derive the exact critical step size for convergence in scalar-vector factorization and show that near criticality the selected minimizer depends sensitively on the initialization. Moreover, we show that adding regularization amplifies this sensitivity, generating a fractal boundary between initializations that converge and those that diverge. The analysis extends to general matrix factorization with orthogonal initialization. Our findings reveal that near-critical step sizes induce a chaotic regime of gradient descent where the long-term dynamics are unpredictable and there are no simple implicit biases, such as towards balancedness, minimum norm, or flatness.
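To make the sensitivity to initialization concrete, here is a minimal sketch that runs gradient descent on the simplest factorization loss, L(u, v) = 0.5 (uv - y)^2 with scalar u and v, and labels each run by its outcome. The scalar-scalar setting, the target y = 1, the function names `gd_outcome` and `scan`, and the tolerance and blow-up thresholds are all illustrative assumptions, not the setup or the critical step size derived in the paper.

```python
import math


def gd_outcome(u0, v0, eta, y=1.0, steps=2000, tol=1e-10, blowup=1e6):
    """Run gradient descent on L(u, v) = 0.5 * (u*v - y)**2.

    Returns (status, u, v) with status in {"converged", "diverged", "undecided"}.
    The thresholds tol and blowup are arbitrary choices for this sketch.
    """
    u, v = u0, v0
    for _ in range(steps):
        r = u * v - y  # residual; the gradient is (r * v, r * u)
        if not math.isfinite(r) or abs(u) + abs(v) > blowup:
            return "diverged", u, v
        if abs(r) < tol:
            return "converged", u, v
        # Simultaneous update of both factors with step size eta.
        u, v = u - eta * r * v, v - eta * r * u
    return "undecided", u, v


def scan(eta, lo=-3.0, hi=3.0, n=21):
    """Label a coarse n-by-n grid of initializations (u0, v0) for one step size.

    Each cell holds 'c', 'd', or 'u' (converged, diverged, undecided).
    """
    pts = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return [[gd_outcome(u0, v0, eta)[0][0] for u0 in pts] for v0 in pts]


if __name__ == "__main__":
    # Print one character map per step size; compare how the regions change.
    for eta in (0.1, 0.5, 1.0):
        print(f"eta = {eta}")
        for row in scan(eta):
            print("".join(row))
```

Refining the grid and the step size in `scan` is one way to explore how the boundary between converging and diverging initializations changes; a coarse grid like this one cannot by itself establish fractal structure.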