Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing schedules entirely, while exhibiting state-of-the-art performance compared to scheduled methods across a wide family of problems, ranging from convex optimization to large-scale deep learning. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. The method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open-source implementation of our method is available at https://github.com/facebookresearch/schedule_free.
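To make the unification of scheduling and iterate averaging concrete, the following is a minimal sketch of a Schedule-Free SGD-style update in Python. The three-sequence structure (a base SGD sequence z, a running average x, and a gradient-evaluation point y interpolated between them) follows the method as described by the authors; the function name, default values, and the toy test problem are illustrative assumptions on our part, and the repository linked above contains the authors' reference implementation.

```python
import numpy as np

def schedule_free_sgd(grad, x0, lr=0.5, beta=0.9, steps=1000):
    """Illustrative sketch of a Schedule-Free SGD update (not the reference code).

    grad  -- callable returning a (stochastic) gradient at a point
    x0    -- initial parameter vector
    beta  -- momentum-like interpolation parameter; no schedule and no
             stopping step T are required
    Returns x, the averaged sequence used at evaluation time.
    """
    z = np.array(x0, dtype=float)  # base SGD sequence
    x = z.copy()                   # running uniform average of the z iterates
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x  # point where the gradient is evaluated
        z = z - lr * grad(y)             # plain SGD step with a constant learning rate
        c = 1.0 / t                      # uniform averaging weight
        x = (1.0 - c) * x + c * z        # incremental average replaces the schedule
    return x

# Example: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w_star = schedule_free_sgd(lambda w: w, np.ones(5), lr=0.5, steps=200)
print(w_star)  # should be close to the zero vector
```

Note that the only step-size-like quantity above is a constant lr; the averaging weight c = 1/t plays the role that a decaying schedule would otherwise play, which is why no stopping step T appears anywhere in the loop.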