In recent years, the optimization and machine learning communities have shown growing interest in the high-probability convergence of stochastic optimization methods. One of the main reasons is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, state-of-the-art high-probability non-asymptotic convergence results are derived under strong assumptions, such as boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has a bounded central $\alpha$-th moment for $\alpha \in (1,2]$ in the following setups: (i) smooth non-convex / Polyak-Łojasiewicz / convex / strongly convex / quasi-strongly convex minimization problems, and (ii) Lipschitz / star-cocoercive and monotone / quasi-strongly monotone variational inequalities. These results justify using the considered methods to solve problems that do not fit the standard function classes studied in stochastic optimization.
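For concreteness, the moment assumption for the minimization setup can be sketched as below. The notation is an assumption introduced here for illustration and does not appear in the text above: $\nabla f_{\xi}(x)$ denotes an unbiased stochastic gradient of the objective $f$ at the point $x$, and $\sigma \ge 0$ is the moment bound.

$$\mathbb{E}_{\xi}\left[\left\|\nabla f_{\xi}(x) - \nabla f(x)\right\|^{\alpha}\right] \le \sigma^{\alpha}, \qquad \alpha \in (1,2].$$

With $\alpha = 2$ this reduces to the usual bounded-variance assumption. For $\alpha < 2$ the condition is weaker and the noise variance may be infinite, which is the heavy-tailed regime.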