In non-smooth stochastic optimization, we establish that stochastic subgradient descent (SGD) does not converge to the critical points recently termed active strict saddles by Davis and Drusvyatskiy. Such points lie on a manifold $M$ where the function $f$ has a direction of second-order negative curvature. Off this manifold, the norm of the Clarke subdifferential of $f$ is lower-bounded. We require two conditions on $f$. The first assumption is a Verdier stratification condition, which is a refinement of the popular Whitney stratification. It allows us to establish a reinforced version of the projection formula of Bolte \emph{et al.} for Whitney stratifiable functions, which is of independent interest. The second assumption, termed the angle condition, allows us to control the distance of the iterates to $M$. When $f$ is weakly convex, our assumptions are generic. Consequently, generically in the class of definable weakly convex functions, SGD converges to a local minimizer.
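As a purely illustrative sketch (not a construction taken from the paper), the behaviour described above can be observed on a toy function chosen here as an assumption: $f(x, y) = |x| + (y^2 - 1)^2/4$. This function is weakly convex, the manifold $M = \{x = 0\}$ carries the restriction $y \mapsto (y^2 - 1)^2/4$, which has negative curvature at $y = 0$, and off $M$ every Clarke subgradient has norm at least $1$. The origin therefore matches the description of an active strict saddle, while $(0, \pm 1)$ are local minimizers. The step-size schedule, noise model, and starting point below are hypothetical choices made only for this example.

```python
import random


def subgradient(x, y):
    """Return one Clarke subgradient of f(x, y) = |x| + (y**2 - 1)**2 / 4."""
    # sign(x); at x = 0 we pick 0, which lies in the subdifferential [-1, 1].
    gx = float((x > 0) - (x < 0))
    # Derivative of (y**2 - 1)**2 / 4 with respect to y.
    gy = y**3 - y
    return gx, gy


def run_sgd(seed=0, n_steps=20000, noise_std=0.1):
    """Run stochastic subgradient descent with additive Gaussian noise.

    The start (0.5, 0.0) is off the manifold M = {x = 0} but aligned with
    the saddle in the y-coordinate, so the noiseless method would be
    attracted to the saddle at the origin.
    """
    rng = random.Random(seed)
    x, y = 0.5, 0.0
    for n in range(n_steps):
        # Vanishing step size (assumed schedule for this illustration).
        step = 0.5 / (n + 10) ** 0.6
        gx, gy = subgradient(x, y)
        # Perturb each subgradient coordinate with zero-mean noise.
        x -= step * (gx + rng.gauss(0.0, noise_std))
        y -= step * (gy + rng.gauss(0.0, noise_std))
    return x, y


if __name__ == "__main__":
    x, y = run_sgd()
    print(f"final iterate: x = {x:.4f}, y = {y:.4f}")
```

In this toy setting the iterates are pulled quickly toward $M$ by the $|x|$ term, and the noise in the $y$-coordinate pushes them away from the saddle toward one of the minimizers $(0, \pm 1)$. This is only a numerical illustration of the phenomenon and does not check the Verdier or angle conditions.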