Building on SGD, prior work has proposed many algorithms that improve convergence speed and generalization in stochastic optimization, such as SGDm, AdaGrad, and Adam. However, analyzing their convergence under non-convex conditions is challenging. In this work, we propose a unified framework to address this issue. For any first-order method, we interpret the update direction $g_t$ as the sum of the stochastic subgradient $\nabla f_t(x_t)$ and an additional acceleration term $\frac{2|\langle v_t, \nabla f_t(x_t) \rangle|}{\|v_t\|_2^2} v_t$, so that convergence can be studied by analyzing the inner product $\langle v_t, \nabla f_t(x_t) \rangle$. Through this framework, we identify two plug-and-play acceleration methods: \textbf{Reject Accelerating} and \textbf{Random Vector Accelerating}. We theoretically demonstrate that both methods directly improve the convergence rate.
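To make the decomposition concrete, the following minimal sketch evaluates $g_t = \nabla f_t(x_t) + \frac{2|\langle v_t, \nabla f_t(x_t) \rangle|}{\|v_t\|_2^2} v_t$ for a fixed gradient and two choices of $v_t$. It is an illustration of the formula only, not the authors' implementation: the function names `acceleration_term` and `update_direction` are hypothetical, and returning a zero term when $v_t = 0$ is an assumption made here to avoid division by zero.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def acceleration_term(v, grad):
    """Return 2 * |<v, grad>| / ||v||_2^2 * v.

    Assumption (not from the source): a zero vector v yields a zero term.
    """
    sq_norm = dot(v, v)
    if sq_norm == 0.0:
        return [0.0] * len(v)
    scale = 2.0 * abs(dot(v, grad)) / sq_norm
    return [scale * x for x in v]


def update_direction(v, grad):
    """Return g = grad + acceleration_term(v, grad)."""
    acc = acceleration_term(v, grad)
    return [g + a for g, a in zip(grad, acc)]


grad = [1.0, 2.0]
v_aligned = [1.0, 0.0]   # <v, grad> > 0
v_opposed = [-1.0, 0.0]  # <v, grad> < 0

g_aligned = update_direction(v_aligned, grad)
g_opposed = update_direction(v_opposed, grad)

# Alignment of each update direction with the gradient.
print(dot(grad, grad), dot(g_aligned, grad), dot(g_opposed, grad))
```

Expanding the definition gives $\langle g_t, \nabla f_t(x_t) \rangle = \|\nabla f_t(x_t)\|_2^2 + \frac{2|\langle v_t, \nabla f_t(x_t) \rangle| \langle v_t, \nabla f_t(x_t) \rangle}{\|v_t\|_2^2}$, so in this sketch the sign of $\langle v_t, \nabla f_t(x_t) \rangle$ decides whether the extra term increases or decreases the alignment of $g_t$ with the gradient. The term is also unchanged when $v_t$ is rescaled by a positive constant.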