In Online Convex Optimization (OCO), when the stochastic gradient has finite variance, many algorithms provably work and guarantee sublinear regret. However, few results are known when the gradient estimate is heavy-tailed, i.e., when the stochastic gradient only admits a finite $\mathsf{p}$-th central moment for some $\mathsf{p}\in\left(1,2\right]$. Motivated by this gap, this work examines several classical OCO algorithms (e.g., Online Gradient Descent) in the more challenging heavy-tailed setting. Under the standard bounded-domain assumption, we establish new regret bounds for these classical methods without any algorithmic modification. Remarkably, these regret bounds are fully optimal in all parameters and can be achieved even without knowing $\mathsf{p}$, suggesting that OCO with heavy tails can be solved effectively without any extra operation (e.g., gradient clipping). Our new results have several applications. A particularly interesting one is the first provable and optimal convergence result for nonsmooth nonconvex optimization under heavy-tailed noise without gradient clipping. Furthermore, we explore broader settings (e.g., smooth OCO) and extend our ideas to optimistic algorithms to handle different cases simultaneously.
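To make the setting concrete, here is a minimal sketch, not the paper's algorithm, step size, or analysis: unmodified projected Online Gradient Descent on a bounded domain (a Euclidean ball of radius $R$), fed stochastic gradients corrupted by heavy-tailed noise. The noise is a randomly signed Pareto variable with tail index 1.5, so it has zero mean and a finite $\mathsf{p}$-th moment only for $\mathsf{p} < 1.5$ (infinite variance). The quadratic loss, the target point, the step size $\eta_t = R/\sqrt{t}$, and all function names are hypothetical choices for illustration; the step size does not use $\mathsf{p}$, and no clipping is applied.

```python
import math
import random


def project_to_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def heavy_tailed_noise(rng, alpha):
    """Randomly signed Pareto(alpha) sample.

    Zero mean for alpha > 1; finite p-th moment only for p < alpha,
    so alpha = 1.5 gives infinite variance.
    """
    return rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha)


def projected_ogd(target, radius=1.0, horizon=20000, alpha=1.5, noise=True, seed=0):
    """Projected OGD on the hypothetical loss f(x) = 0.5 * ||x - target||^2.

    Assumed setup: step size eta_t = radius / sqrt(t), no gradient clipping,
    and no knowledge of alpha. Returns (final iterate, average of the played
    iterates, largest norm of any played iterate).
    """
    rng = random.Random(seed)
    x = [0.0] * len(target)
    running_sum = [0.0] * len(target)
    max_norm = 0.0
    for t in range(1, horizon + 1):
        # Record the point x_t played in round t.
        running_sum = [s + v for s, v in zip(running_sum, x)]
        max_norm = max(max_norm, math.sqrt(sum(v * v for v in x)))
        # Stochastic gradient = true gradient (x - target) + optional heavy-tailed noise.
        grad = [xi - ti + (heavy_tailed_noise(rng, alpha) if noise else 0.0)
                for xi, ti in zip(x, target)]
        eta = radius / math.sqrt(t)
        # Plain OGD step followed by projection back onto the bounded domain.
        x = project_to_ball([xi - eta * gi for xi, gi in zip(x, grad)], radius)
    average = [s / horizon for s in running_sum]
    return x, average, max_norm


# Usage: a target inside the unit ball, heavy-tailed noise switched on.
target = [0.5, -0.3]
final, avg, max_norm = projected_ogd(target)
print(f"distance of average iterate to target: {math.dist(avg, target):.3f}")
```

Without noise (`noise=False`), the first step already lands on the target, because $\eta_1 = R = 1$ matches the curvature of this quadratic and the target lies inside the ball. With noise switched on, the projection keeps every iterate bounded, and the average iterate should typically land close to the target despite the infinite variance. This only illustrates the qualitative claim; it does not reproduce the paper's regret bounds.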