We consider the problem of unconstrained minimization of finite sums of functions. We propose a simple yet practical way to incorporate variance reduction techniques into SignSGD, guaranteeing convergence similar to that of full sign gradient descent. The core idea is first instantiated on the problem of minimizing sums of convex and Lipschitz functions and is then extended to the smooth case via variance reduction. Our analysis is elementary and much simpler than typical proofs for variance reduction methods. We show that for smooth functions our method achieves an $\mathcal{O}(1 / \sqrt{T})$ rate for the expected norm of the gradient and an $\mathcal{O}(1/T)$ rate in the case of smooth convex functions, recovering the convergence results of deterministic methods while preserving the computational advantages of SignSGD.
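To make the general idea concrete, the following is a minimal sketch of how a variance-reduced gradient estimate can be fed into a sign update. It is not the algorithm proposed in the paper: the abstract does not specify the estimator, so an SVRG-style control variate is assumed here purely for illustration. The function name `svrg_sign_descent`, its parameters, and the toy least-squares problem are all hypothetical.

```python
import numpy as np


def svrg_sign_descent(grad_i, x0, n, step, epochs, inner_steps, seed=0):
    """Sign descent driven by an SVRG-style gradient estimate (illustrative only).

    grad_i(x, i) must return the gradient of the i-th summand at x.
    All argument names are hypothetical, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        snapshot = x.copy()
        # Full gradient at the snapshot: one pass over all n summands.
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Control-variate estimate: unbiased for the full gradient at x,
            # with small variance while x stays close to the snapshot.
            g = grad_i(x, i) - grad_i(snapshot, i) + full_grad
            # Only the sign of each coordinate enters the update.
            x -= step * np.sign(g)
    return x


# Toy finite sum (assumed): f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2
data_rng = np.random.default_rng(1)
A = data_rng.normal(size=(20, 5))
b = A @ data_rng.normal(size=5)


def objective(x):
    return 0.5 * np.mean((A @ x - b) ** 2)


def grad_i(x, i):
    return A[i] * (A[i] @ x - b[i])


x_final = svrg_sign_descent(grad_i, np.zeros(5), n=20, step=0.005,
                            epochs=200, inner_steps=10)
print(objective(np.zeros(5)), objective(x_final))
```

With a constant step size, a sign update of this kind can only settle into a neighborhood of a minimizer whose size scales with the step, so a decreasing step schedule would be needed to approach the rates quoted above; the sketch keeps the step fixed for brevity.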