Continuous-time approximation of Stochastic Gradient Descent (SGD) is a crucial tool for studying how it escapes from stationary points. However, existing stochastic differential equation (SDE) models fail to fully capture this escaping behavior, even for simple quadratic objectives. Building on a novel stochastic backward error analysis framework, we derive the Hessian-Aware Stochastic Modified Equation (HA-SME), an SDE that incorporates Hessian information of the objective function into both its drift and diffusion terms. Our analysis shows that HA-SME matches the best approximation error order among existing SDE models in the literature, while achieving a significantly reduced dependence on the smoothness parameter of the objective. Further, for quadratic objectives and under mild conditions, HA-SME is proven to be the first SDE model that exactly recovers the SGD dynamics in the distributional sense. Consequently, when the local landscape near a stationary point can be approximated by a quadratic, HA-SME is expected to accurately predict the local escaping behavior of SGD.
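To make the claim about quadratic objectives concrete, the sketch below compares the exact variance of SGD iterates with the variance predicted by the commonly used first-order SDE approximation, dX = -∇f(X) dt + sqrt(η) σ dW. It does not implement HA-SME. The one-dimensional objective f(x) = a x²/2, the additive Gaussian gradient noise with standard deviation `sigma`, and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def sgd_variance(a, sigma, eta, steps, var0=0.0):
    """Exact variance of x_k for SGD on the toy objective f(x) = a * x**2 / 2.

    Assumed update (illustrative noise model):
        x_{k+1} = (1 - eta * a) * x_k - eta * sigma * xi_k,  xi_k ~ N(0, 1) i.i.d.
    """
    var = var0
    for _ in range(steps):
        # The variance of a linear recursion with independent noise is itself a recursion.
        var = (1.0 - eta * a) ** 2 * var + (eta * sigma) ** 2
    return var


def first_order_sde_variance(a, sigma, eta, steps, var0=0.0):
    """Variance at time t = steps * eta of the Ornstein-Uhlenbeck process
        dX = -a X dt + sqrt(eta) * sigma dW   (requires a != 0),
    i.e. the first-order SDE approximation applied to the same toy problem.
    """
    t = steps * eta
    decay = math.exp(-2.0 * a * t)
    return var0 * decay + eta * sigma ** 2 / (2.0 * a) * (1.0 - decay)


if __name__ == "__main__":
    # Compare the two variances as the product eta * a grows.
    for eta in (0.01, 1.5, 2.5):
        steps = 200 if eta > 1.0 else 10_000
        v_sgd = sgd_variance(a=1.0, sigma=1.0, eta=eta, steps=steps)
        v_sde = first_order_sde_variance(a=1.0, sigma=1.0, eta=eta, steps=steps)
        print(f"eta={eta}: SGD variance={v_sgd:.6g}, SDE variance={v_sde:.6g}")
```

In this toy setting the two variances agree closely when η·a is small, but the SGD variance converges to η σ² / (a (2 - η a)) while the SDE predicts η σ² / (2a), and for η·a > 2 the SGD iterates diverge while the SDE stays bounded. This is one simple way to see the kind of mismatch on quadratics that the abstract refers to.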