Existing learning rate schedules that do not require specifying the optimization stopping step T are greatly outperformed by schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems, ranging from convex problems to large-scale deep learning. Our Schedule-Free approach introduces no additional hyperparameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available at https://github.com/facebookresearch/schedule_free.
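The abstract does not state the update rule, so the following is only an illustrative sketch of how iterate averaging can stand in for a schedule. It assumes an SGD-style variant that keeps a base sequence `z`, a uniform running average `x`, and evaluates gradients at an interpolation `y` between the two. The function name `schedule_free_sgd`, the toy quadratic objective, and the default values are hypothetical and are not taken from the released implementation.

```python
def schedule_free_sgd(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Illustrative schedule-free SGD loop (assumed update rule, see text).

    grad:  function mapping a list of floats to its gradient (list of floats)
    x0:    starting point
    lr:    constant step size; no schedule and no stopping step T is needed
    beta:  interpolation (momentum-like) parameter
    """
    z = list(x0)  # base sequence, updated with plain gradient steps
    x = list(x0)  # running average of z; this is the point one would evaluate
    for t in range(1, steps + 1):
        # Gradients are taken at an interpolation between z and x.
        y = [(1.0 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
        g = grad(y)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
        # Weight 1/(t+1) keeps x equal to the uniform average of all z so far.
        c = 1.0 / (t + 1)
        x = [(1.0 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x


# Toy objective (assumption): f(w) = 0.5 * ||w - target||^2
TARGET = [1.0, -2.0]


def quadratic_grad(w):
    return [wi - ti for wi, ti in zip(w, TARGET)]


if __name__ == "__main__":
    print(schedule_free_sgd(quadratic_grad, [0.0, 0.0], lr=0.5, steps=1000))
```

Note that the loop uses a constant `lr` and can be stopped at any step: the averaged point `x` is always available, which is the property the abstract attributes to avoiding a dependence on T.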