Regularization is crucial in modern machine learning, and it can enter algorithmic design in various forms: the training set, the model family, the error function, regularization terms, and the optimization procedure. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a central role in neural network training. Indeed, many widely adopted training strategies essentially amount to specifying how the learning rate decays over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach that characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during model training, which improves test performance. We evaluate TempBalance on the CIFAR10, CIFAR100, SVHN, and TinyImageNet datasets using ResNets, VGGs, and WideResNets of various depths and widths. Our results show that TempBalance significantly outperforms ordinary SGD and carefully tuned spectral norm regularization. We also show that TempBalance outperforms a number of state-of-the-art optimizers and learning rate schedulers.
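To make the idea of a layer-wise, HT-SR-guided learning rate concrete, the following is a minimal sketch and not the method's actual implementation. It assumes that the per-layer metric is a Hill-type estimate of the power-law exponent of the eigenvalue tail of each weight matrix, and that learning rates are assigned by a linear map in which layers with a larger exponent (a less heavy-tailed spectrum) receive a larger rate. The function names `hill_alpha` and `layerwise_lrs`, the tail fraction `k_frac`, and the scaling bounds `s_min` and `s_max` are hypothetical choices made only for illustration.

```python
import numpy as np


def hill_alpha(weight, k_frac=0.5):
    """Hill-type estimate of the power-law exponent of the eigenvalue tail.

    Assumption (illustrative only): the eigenvalues are those of W^T W,
    and the largest k_frac fraction of them is treated as the tail.
    """
    w = np.asarray(weight, dtype=float)
    w = w.reshape(w.shape[0], -1)  # flatten conv kernels to a 2-D matrix
    eigs = np.sort(np.linalg.svd(w, compute_uv=False) ** 2)  # ascending
    n = len(eigs)
    if n < 2:
        raise ValueError("need at least two eigenvalues to estimate a tail")
    k = min(max(1, int(n * k_frac)), n - 1)  # number of tail eigenvalues
    tail, ref = eigs[n - k:], eigs[n - k - 1]
    denom = np.sum(np.log(tail / ref))
    if not np.isfinite(denom) or denom <= 0:
        raise ValueError("degenerate spectrum; cannot estimate exponent")
    return 1.0 + k / denom


def layerwise_lrs(weights, base_lr, s_min=0.5, s_max=1.5):
    """Map per-layer exponents linearly onto [s_min, s_max] * base_lr.

    Assumption (illustrative only): a larger exponent gets a larger rate.
    """
    alphas = np.array([hill_alpha(w) for w in weights])
    lo, hi = alphas.min(), alphas.max()
    if hi == lo:  # all layers look alike: fall back to the global rate
        return np.full(len(alphas), base_lr)
    scale = s_min + (s_max - s_min) * (alphas - lo) / (hi - lo)
    return base_lr * scale
```

In a training loop, rates computed this way could be written into per-layer optimizer parameter groups and recomputed periodically, on top of whatever global decay schedule is in use. How often to recompute and how wide to make the scaling range are tuning choices that this sketch does not settle.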