Recent work has shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary, since they require a monotone decrease of the (mini-)batch objective function. We explore nonmonotone line search methods to relax this condition and possibly accept larger step sizes. Despite the lack of a monotone decrease, we prove the same fast rates of convergence as in the monotone case. Our experiments show that nonmonotone methods improve the speed of convergence and the generalization properties of SGD/Adam even beyond the previous monotone line searches. We propose a POlyak NOnmonotone Stochastic (PoNoS) method, obtained by combining a nonmonotone line search with a Polyak initial step size. Furthermore, we develop a new resetting technique that reduces the number of backtracks to zero in the majority of iterations while still maintaining a large initial step size. In what is, to the best of our knowledge, the first runtime comparison, we show that the epoch-wise advantage of line-search-based methods is reflected in the overall computational time.
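To make the idea concrete, the following is a minimal sketch of a nonmonotone backtracking line search with a Polyak-type initial step size on an interpolating least-squares problem. It is not the PoNoS algorithm as specified in the paper: the abstract does not state the acceptance condition, so the sketch assumes an Armijo-type test against the maximum of the last few mini-batch losses (a max-type nonmonotone reference), assumes an optimal value `f_star = 0`, and omits the resetting technique. All names and constants (`nonmonotone_step`, `c`, `delta`, `eta_max`, `memory`) are hypothetical.

```python
import numpy as np


def batch_loss_and_grad(w, X, y):
    """Half mean squared error on one mini-batch, and its gradient."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)


def nonmonotone_step(w, X, y, history, c=0.5, delta=0.5,
                     f_star=0.0, eta_max=10.0, max_backtracks=50):
    """One SGD step with a nonmonotone Armijo backtracking line search.

    Assumed acceptance test (not taken from the paper):
        f(w - eta * g) <= max(recent losses, f(w)) - c * eta * ||g||^2
    """
    f, g = batch_loss_and_grad(w, X, y)
    g2 = float(g @ g)
    if g2 == 0.0:
        return w, 0.0, 0
    # Polyak-type initial step size, capped at eta_max (assumed form).
    eta = min((f - f_star) / (c * g2), eta_max)
    # Nonmonotone reference: never smaller than the current batch loss,
    # so any step the monotone test accepts is accepted here as well.
    ref = max(history + [f])
    backtracks = 0
    while backtracks < max_backtracks:
        f_new, _ = batch_loss_and_grad(w - eta * g, X, y)
        if f_new <= ref - c * eta * g2:
            break
        eta *= delta  # shrink the step and try again
        backtracks += 1
    return w - eta * g, eta, backtracks


def run(num_steps=200, batch_size=16, memory=5, seed=0):
    """Run the sketch on synthetic data where zero loss is attainable."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(128, 10))
    y = X @ rng.normal(size=10)  # noiseless targets: interpolation holds
    w = np.zeros(10)
    initial, _ = batch_loss_and_grad(w, X, y)
    history, total_backtracks = [], 0
    for _ in range(num_steps):
        idx = rng.choice(128, size=batch_size, replace=False)
        f, _ = batch_loss_and_grad(w, X[idx], y[idx])
        w, _, k = nonmonotone_step(w, X[idx], y[idx], history)
        history = (history + [f])[-memory:]  # keep the last few losses
        total_backtracks += k
    final, _ = batch_loss_and_grad(w, X, y)
    return initial, final, total_backtracks


if __name__ == "__main__":
    initial, final, total_backtracks = run()
    print(f"initial loss {initial:.4g}, final loss {final:.4g}, "
          f"backtracks {total_backtracks}")
```

Setting `memory=0` reduces the reference value to the current mini-batch loss, which recovers an ordinary monotone Armijo line search; comparing the backtrack counts of the two settings illustrates how the relaxed condition can accept larger steps.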