Learning rate schedulers have shown great success in speeding up the convergence of learning algorithms in practice. However, their convergence to a minimum has not been proven theoretically. This difficulty arises mainly from the fact that, while traditional convergence analysis prescribes monotonically decreasing (or constant) learning rates, schedulers opt for rates that often increase and decrease over the training epochs. In this work, we aim to bridge this gap by proposing a probabilistic learning rate scheduler (PLRS) that does not conform to the monotonically decreasing condition, yet comes with provable convergence guarantees. In addition to detailed convergence proofs, we present experimental results showing that the proposed PLRS performs competitively with other state-of-the-art learning rate schedulers across a variety of datasets and architectures.
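For intuition, below is a minimal sketch of what a non-monotone probabilistic learning rate schedule could look like: each epoch's rate is sampled from a distribution whose mean decays, so the realized sequence rises and falls while its envelope shrinks. All parameters and the sampling rule here are illustrative assumptions, not the paper's actual PLRS construction.

```python
import numpy as np

class ProbabilisticLRScheduler:
    """Hypothetical sketch of a probabilistic LR schedule. Each epoch's rate
    is drawn at random, so the realized sequence can increase and decrease
    rather than decrease monotonically. Illustration only; this is not the
    paper's actual PLRS sampling rule."""

    def __init__(self, base_lr=0.1, decay=0.95, spread=0.5, seed=0):
        self.base_lr = base_lr  # assumed initial rate (hypothetical)
        self.decay = decay      # assumed geometric decay of the mean rate
        self.spread = spread    # assumed relative width of the sampling interval
        self.rng = np.random.default_rng(seed)
        self.epoch = 0

    def step(self):
        # The mean of the sampling distribution decays with the epoch count...
        mean = self.base_lr * self.decay ** self.epoch
        # ...but the sampled rate may exceed the previous epoch's rate,
        # violating monotonicity while still concentrating toward zero.
        lr = self.rng.uniform((1 - self.spread) * mean, (1 + self.spread) * mean)
        self.epoch += 1
        return lr

scheduler = ProbabilisticLRScheduler()
rates = [scheduler.step() for _ in range(5)]
print(rates)  # a non-monotone sequence whose envelope shrinks over epochs
```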