We introduce a novel, theoretically grounded dynamic learning-rate scheduling scheme that aims to simplify the manual and time-consuming tuning of schedules in practice. Our approach estimates the locally optimal step size, which guarantees maximal descent in the direction of the stochastic gradient at the current step. We first establish theoretical convergence bounds for our method in the setting of smooth non-convex stochastic optimization, matching state-of-the-art bounds while assuming knowledge of only the smoothness parameter. We then present a practical implementation of our algorithm and conduct systematic experiments across diverse datasets and optimization algorithms, comparing our scheme with existing state-of-the-art learning-rate schedulers. Our findings indicate that our method requires minimal tuning compared with existing approaches: it removes the need for auxiliary manual schedules and warm-up phases, and it achieves comparable performance with drastically reduced parameter tuning.
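To make the notion of a locally optimal step size concrete, the following is a minimal sketch based on the standard descent lemma for an L-smooth function, not on the paper's actual estimator. For an update x - eta * g, smoothness gives the upper bound f(x) - eta * <grad f(x), g> + (L / 2) * eta^2 * ||g||^2, which is minimized at eta = <grad f(x), g> / (L * ||g||^2). The function names, the toy quadratic objective, and the use of an exact gradient in place of a stochastic one are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def locally_optimal_step(grad_estimate, g, smoothness):
    """Step size minimizing the L-smooth upper bound along direction g.

    grad_estimate stands in for the true gradient, which is unknown in
    practice and would itself have to be estimated (hypothetical helper,
    not the paper's estimator).
    """
    g_sq = dot(g, g)
    if g_sq == 0.0:
        return 0.0  # no direction to move in
    # Clamp at zero so a negatively correlated direction yields no step.
    return max(dot(grad_estimate, g) / (smoothness * g_sq), 0.0)


# Toy objective (assumption): f(x) = 0.5 * sum(a_i * x_i^2), so L = max(a_i).
CURVATURES = [1.0, 4.0]
L_SMOOTH = max(CURVATURES)


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(CURVATURES, x))


def grad_f(x):
    return [a * xi for a, xi in zip(CURVATURES, x)]


def descend(x, steps):
    """Run gradient steps using the locally optimal step size."""
    for _ in range(steps):
        g = grad_f(x)  # noiseless stand-in for a stochastic gradient
        eta = locally_optimal_step(g, g, L_SMOOTH)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x
```

When the direction g equals the gradient itself, the formula reduces to the familiar 1 / L step; when g is noisy, the step shrinks in proportion to how poorly g aligns with the gradient, which is one way to see why no separate warm-up or decay schedule is imposed in this sketch.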