We consider stochastic optimization problems in which the heavy-tailed noise has a structured density. For such problems, we show that it is possible to obtain convergence rates faster than $\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$ when the stochastic gradients have finite moments of order $\alpha \in (1, 2]$. In particular, our analysis allows the noise norm to have an unbounded expectation. To achieve these results, we stabilize the stochastic gradients using smoothed medians of means. We prove that the resulting estimates have negligible bias and controllable variance. This allows us to carefully incorporate them into clipped-SGD and clipped-SSTM and to derive new high-probability complexity bounds in the considered setup.
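As a rough illustration of the pipeline described above, the following Python sketch combines a median-of-means gradient estimate with norm clipping inside a plain SGD loop. It is a sketch under stated assumptions, not the estimator or the algorithms analyzed here: we assume that "smoothing" can be imitated by adding small Gaussian noise of scale `theta` to each group mean before taking a coordinate-wise median, and the names `smoothed_median_of_means`, `clip`, `clipped_sgd`, and `noisy_quadratic_grad`, as well as the test objective $f(x) = \frac{1}{2}\|x\|^2$ with symmetric Cauchy noise, are hypothetical choices made for illustration only.

```python
import numpy as np


def smoothed_median_of_means(samples, n_groups, theta, rng):
    """Coordinate-wise median of (perturbed) group means.

    samples: array of shape (n, d) holding n stochastic gradients.
    theta:   scale of the Gaussian perturbation (an illustrative assumption).
    """
    n, d = samples.shape
    group_size = n // n_groups
    # Drop leftover samples so that all groups have the same size.
    used = samples[: group_size * n_groups]
    means = used.reshape(n_groups, group_size, d).mean(axis=1)
    # Assumed smoothing step: perturb each group mean before the median.
    means = means + theta * rng.standard_normal(means.shape)
    return np.median(means, axis=0)


def clip(vec, level):
    """Rescale vec so that its Euclidean norm does not exceed level."""
    norm = np.linalg.norm(vec)
    if norm <= level:
        return vec
    return vec * (level / norm)


def noisy_quadratic_grad(x, batch, rng):
    """Hypothetical oracle: gradient of 0.5 * ||x||^2 plus Cauchy noise.

    Cauchy noise is symmetric and its norm has no finite expectation.
    """
    return x + rng.standard_cauchy(size=(batch, x.shape[0]))


def clipped_sgd(grad_oracle, x0, step, clip_level, n_steps,
                batch, n_groups, theta, seed=0):
    """SGD in which every step uses a clipped median-of-means estimate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        grads = grad_oracle(x, batch, rng)
        g = smoothed_median_of_means(grads, n_groups, theta, rng)
        x = x - step * clip(g, clip_level)
    return x


if __name__ == "__main__":
    x_final = clipped_sgd(noisy_quadratic_grad, 5.0 * np.ones(5),
                          step=0.1, clip_level=5.0, n_steps=300,
                          batch=30, n_groups=15, theta=0.01)
    print("distance to the minimizer:", np.linalg.norm(x_final))
```

In this toy setting the plain sample mean of the batch is itself Cauchy-distributed, so averaging alone does not tame the noise, whereas the median of group means concentrates around the true gradient; clipping then bounds the length of every step.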