Dynamic Learning Rate Scheduling based on Loss Changes Leads to Faster Convergence

Despite significant advances in optimizers for training, most work still relies on standard learning rate schedulers such as cosine or exponential decay. In this paper, we study \emph{GreedyLR}, a novel scheduler that adaptively adjusts the learning rate during training based on changes in the current loss. To validate its effectiveness, we conduct both fine-tuning and pre-training experiments on several NLP, CV, and LLM tasks with models of up to $7B$ parameters. The results show that our approach outperforms several state-of-the-art schedulers in terms of accuracy, speed, and convergence. We also provide a theoretical analysis of the GreedyLR algorithm, including a proof of convergence and a derivation of the optimal scaling factor $F$ that maximizes the convergence rate, along with experiments showing that the algorithm is robust to realistic noisy loss landscapes. Our scheduler is easy to implement, computationally efficient, and could be considered a good default scheduler for training.
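To make the idea of loss-driven scheduling concrete, the following is a minimal sketch of a scheduler that rescales the learning rate by a factor $F$ depending on whether the loss went down or up. The abstract does not state the update rule, so everything here is an assumption for illustration: the class name `GreedyLRSketch`, the rule itself (divide by $F$ on improvement, multiply by $F$ on degradation, with $0 < F < 1$), and the `min_lr` and `max_lr` bounds. It should not be read as the authors' reference implementation.

```python
class GreedyLRSketch:
    """Illustrative loss-driven scheduler (hypothetical, not the paper's code)."""

    def __init__(self, lr, factor=0.5, min_lr=1e-6, max_lr=1.0):
        # Assumption: the scaling factor F lies strictly between 0 and 1.
        if not 0.0 < factor < 1.0:
            raise ValueError("factor must lie in (0, 1)")
        self.lr = lr
        self.factor = factor
        self.min_lr = min_lr  # assumed lower bound on the learning rate
        self.max_lr = max_lr  # assumed upper bound on the learning rate
        self.prev_loss = None

    def step(self, loss):
        """Update the learning rate from the latest loss and return it."""
        if self.prev_loss is not None:
            if loss < self.prev_loss:
                # Loss improved: grow the learning rate (assumed rule).
                self.lr = min(self.lr / self.factor, self.max_lr)
            elif loss > self.prev_loss:
                # Loss got worse: shrink the learning rate (assumed rule).
                self.lr = max(self.lr * self.factor, self.min_lr)
        self.prev_loss = loss
        return self.lr


# Usage: feed one loss value per step and read back the learning rate.
scheduler = GreedyLRSketch(lr=0.01, factor=0.5)
history = [scheduler.step(loss) for loss in [1.0, 0.8, 0.9, 0.9]]
print(history)  # [0.01, 0.02, 0.01, 0.01]
```

In this sketch the first call only records the loss, and an unchanged loss leaves the learning rate as it is. A practical variant would likely smooth the loss or wait several steps before reacting, since raw per-step losses are noisy.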
