Despite significant advances in optimizers for training, most research still relies on common learning-rate schedulers such as cosine or exponential decay. In this paper, we study \emph{GreedyLR}, a novel scheduler that adaptively adjusts the learning rate during training based on the current loss. To validate the proposed scheduler, we conduct both fine-tuning and pre-training experiments on several NLP, CV, and LLM tasks, with models of up to $7B$ parameters. The results show that our approach outperforms several state-of-the-art schedulers in terms of accuracy, speed, and convergence. We also provide a theoretical analysis of the GreedyLR algorithm, including a proof of convergence and a derivation of the optimal scaling factor $F$ that maximizes the convergence rate, along with experiments showing that the algorithm is robust to realistic noisy loss landscapes. Our scheduler is easy to implement and computationally efficient, and could be considered a good default scheduler for training.
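The abstract does not spell out the update rule, so the following is only a minimal sketch of what a loss-driven scheduler of this kind might look like, not the paper's reference implementation. It assumes a greedy rule in which the learning rate is divided by a factor when the loss improves and multiplied by it when the loss does not. The class name `GreedyLRSketch`, the parameters `factor`, `min_lr`, and `max_lr`, and the toy problem in `minimize_quadratic` are all hypothetical; `factor` stands in for the scaling factor $F$ under the assumption that $0 < F < 1$.

```python
class GreedyLRSketch:
    """Hypothetical loss-driven scheduler (an assumption, not the paper's code)."""

    def __init__(self, lr, factor=0.5, min_lr=1e-6, max_lr=1.0):
        if not 0.0 < factor < 1.0:
            raise ValueError("factor must lie strictly between 0 and 1")
        self.lr = lr
        self.factor = factor      # stands in for the scaling factor F (assumed < 1)
        self.min_lr = min_lr      # assumed lower bound on the learning rate
        self.max_lr = max_lr      # assumed upper bound on the learning rate
        self.prev_loss = None

    def step(self, loss):
        """Update the learning rate from the current loss and return it."""
        if self.prev_loss is not None:
            if loss < self.prev_loss:
                # Loss improved: grow the learning rate, capped at max_lr.
                self.lr = min(self.lr / self.factor, self.max_lr)
            else:
                # Loss did not improve: shrink the learning rate, floored at min_lr.
                self.lr = max(self.lr * self.factor, self.min_lr)
        self.prev_loss = loss
        return self.lr


def minimize_quadratic(steps=50):
    """Toy usage: gradient descent on f(w) = w**2 driven by the sketch above."""
    w = 5.0
    sched = GreedyLRSketch(lr=0.01, factor=0.5, max_lr=0.4)
    for _ in range(steps):
        loss = w * w
        lr = sched.step(loss)
        w -= lr * 2.0 * w         # gradient of w**2 is 2*w
    return w, sched.lr
```

In this sketch the bounds `min_lr` and `max_lr` keep the greedy growth from diverging on a well-behaved loss. A practical version would likely also smooth the loss or add a patience window to cope with noisy mini-batch losses; those choices are left out here because the abstract gives no details on them.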


