We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth, non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences yield $O(1/T)$ convergence guarantees for strongly convex optimization in terms of both the classic primal gap and a previously unanalyzed dual gap. Consequently, our theory equips these classic methods with simple, optimal stopping criteria and optimality certificates at no added computational cost. Our results apply to a wide range of stepsize selections and to non-Lipschitz, ill-conditioned problems where the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon that, to the best of our knowledge, no prior work addresses). Even in the presence of such undesirable behavior, our theory still guarantees, and quantitatively bounds, eventual convergence.
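To make the setting concrete, the following is a minimal sketch of the classic (primal) subgradient method on a toy strongly convex, nonsmooth, non-Lipschitz objective. Everything in it is an illustrative assumption rather than something taken from the paper: the objective $f(x) = \|x\|_1 + \tfrac{\mu}{2}\|x - c\|_2^2$, the stepsize $2/(\mu(t+2))$, the iterate weights $t+1$, and all function names. It does not implement the dual descriptions, the dual gap, or the stopping certificates described above; it only shows the kind of iteration those results concern.

```python
def sign(v):
    """Return -1, 0, or 1; 0 is a valid subgradient of |v| at v = 0."""
    return (v > 0) - (v < 0)


def objective(x, c, mu):
    """f(x) = ||x||_1 + (mu/2) * ||x - c||^2 (hypothetical test problem)."""
    l1 = sum(abs(xi) for xi in x)
    quad = 0.5 * mu * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return l1 + quad


def subgradient(x, c, mu):
    """One subgradient of f at x; its norm grows linearly in ||x||."""
    return [sign(xi) + mu * (xi - ci) for xi, ci in zip(x, c)]


def minimizer(c, mu):
    """Closed-form minimizer of f: soft-thresholding of c at level 1/mu."""
    return [sign(ci) * max(abs(ci) - 1.0 / mu, 0.0) for ci in c]


def subgradient_method(c, mu, x0, num_iters):
    """Run x_{t+1} = x_t - step_t * g_t and track a weighted average.

    Assumed (textbook-style) choices: step_t = 2 / (mu * (t + 2)) and
    weight t + 1 on iterate x_t. Returns (last iterate, weighted average).
    """
    x = list(x0)
    avg = [0.0] * len(x)
    weight_sum = 0.0
    for t in range(num_iters):
        # Fold x_t into the running weighted average.
        w = t + 1
        weight_sum += w
        avg = [a + (w / weight_sum) * (xi - a) for a, xi in zip(avg, x)]
        # Subgradient step with a decaying stepsize.
        g = subgradient(x, c, mu)
        step = 2.0 / (mu * (t + 2))
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, avg


# Toy instance (all values are arbitrary illustrative choices).
c = [3.0, -0.2, 1.5]
mu = 2.0
x_star = minimizer(c, mu)
x_last, x_avg = subgradient_method(c, mu, x0=[10.0, 10.0, 10.0], num_iters=2000)
gap = objective(x_avg, c, mu) - objective(x_star, c, mu)
print("weighted average:", x_avg)
print("primal gap:", gap)
```

Here the primal gap can be evaluated only because this toy problem has a closed-form minimizer. In general the optimal value is unknown, which is why a computable dual gap of the kind the abstract describes is useful as a stopping criterion.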