Inspired by recent research that recommends starting neural network training with large learning rates (LRs) to achieve the best generalization, we examine this hypothesis in detail. Our study identifies the initial LR ranges that give optimal results when training subsequently continues with a small LR or with weight averaging. We find that these ranges are in fact significantly narrower than generally assumed. We conduct our main experiments in a simplified setup that allows precise control of the learning rate hyperparameter, and we validate our key findings in a more practical setting.