Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $\mathbf{M}$-dependent quantities: the rate through an effective condition number in the $\mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $\mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $\mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.
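The rate-floor picture on a quadratic can be illustrated with a minimal sketch. It is not the paper's experimental setup: the objective $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top \mathbf{A}\mathbf{x}$, the additive isotropic Gaussian gradient noise, the step size $1/L_{\mathbf{M}}$, and the helper names `effective_condition_number` and `preconditioned_sgd` are all assumptions introduced here for illustration. The sketch computes the condition number of $\mathbf{A}$ in the $\mathbf{M}$-metric and runs the update $\mathbf{x} \leftarrow \mathbf{x} - \eta\,\mathbf{M}^{-1}\mathbf{g}$ for two choices of $\mathbf{M}$.

```python
import numpy as np


def effective_condition_number(A, M):
    """Return (kappa_M, L_M) for the Hessian A measured in the M-metric.

    L^{-1} A L^{-T}, with M = L L^T, has the same eigenvalues as
    M^{-1/2} A M^{-1/2}, so no matrix square root is needed.
    """
    L = np.linalg.cholesky(M)            # lower-triangular factor, M = L L^T
    B = np.linalg.solve(L, A)            # L^{-1} A
    S = np.linalg.solve(L, B.T).T        # L^{-1} A L^{-T}
    eigs = np.linalg.eigvalsh(0.5 * (S + S.T))  # ascending eigenvalues
    return eigs[-1] / eigs[0], eigs[-1]


def preconditioned_sgd(A, M, x0, lr, sigma, steps, seed=0):
    """Run x <- x - lr * M^{-1} g with g = A x + sigma * N(0, I).

    Returns the loss 0.5 * x^T A x at the start and after every step.
    The additive isotropic noise model is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    losses = [0.5 * x @ A @ x]
    for _ in range(steps):
        g = A @ x + sigma * rng.standard_normal(x.shape)  # noisy gradient
        x = x - lr * np.linalg.solve(M, g)                # preconditioned step
        losses.append(0.5 * x @ A @ x)
    return np.array(losses)


# Ill-conditioned diagonal quadratic (hypothetical diagnostic problem).
A = np.diag([1.0, 100.0])
x0 = np.ones(2)

preconditioners = {
    "identity": np.eye(2),             # plain SGD
    "jacobi": np.diag(np.diag(A)),     # diagonal preconditioner
}

for name, M in preconditioners.items():
    kappa, L_M = effective_condition_number(A, M)
    losses = preconditioned_sgd(A, M, x0, lr=1.0 / L_M, sigma=0.1, steps=200)
    # The mean of the last 50 losses is a rough proxy for the noise floor.
    print(f"{name}: kappa_M = {kappa:.1f}, late-stage loss = {losses[-50:].mean():.3e}")
```

Because $\mathbf{A}$ is diagonal here, the diagonal preconditioner removes the anisotropy entirely, which makes the effect easy to see. On non-diagonal problems a diagonal $\mathbf{M}$ would generally only reduce the effective condition number, not eliminate it.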