In recent years, the optimization and machine learning communities have shown growing interest in the high-probability convergence of stochastic optimization methods. One of the main reasons is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, state-of-the-art high-probability non-asymptotic convergence results are derived under strong assumptions, such as boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has a bounded central $\alpha$-th moment for $\alpha \in (1,2]$ in the following setups: (i) smooth non-convex / Polyak-Łojasiewicz / convex / strongly convex / quasi-strongly convex minimization problems, and (ii) Lipschitz / star-cocoercive and monotone / quasi-strongly monotone variational inequalities. These results justify the use of the considered methods for solving problems that do not fit the standard functional classes studied in stochastic optimization.
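For concreteness, the bounded central $\alpha$-th moment assumption on the gradient noise is commonly written in the following form. The notation is assumed here for illustration and is not fixed by the text above: $\nabla f_\xi(x)$ denotes an unbiased stochastic gradient of the objective $f$ at $x$, and $\sigma \ge 0$ is a noise level.

$$\mathbb{E}_\xi\left[\left\|\nabla f_\xi(x) - \nabla f(x)\right\|^{\alpha}\right] \le \sigma^{\alpha}, \qquad \alpha \in (1,2].$$

For $\alpha = 2$ this reduces to the usual bounded-variance assumption, whereas for $\alpha < 2$ the noise variance may be infinite. For variational inequalities, the analogous condition would replace the gradient with the operator and its stochastic estimate.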