Most curriculum learning methods require sorting the data samples by difficulty, which is often cumbersome to perform. In this work, we propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC), which uses a different learning rate for each layer of a neural network to create a data-agnostic curriculum during the initial training epochs. More specifically, LeRaC assigns higher learning rates to neural layers closer to the input, gradually decreasing the learning rates as the layers are placed farther away from the input. The learning rates increase at various paces during the first training iterations, until they all reach the same value. From this point on, the neural model is trained as usual. This creates a model-level curriculum learning strategy that does not require sorting the examples by difficulty and is compatible with any neural network, yielding performance gains regardless of the architecture. We conduct comprehensive experiments on 12 data sets from the computer vision (CIFAR-10, CIFAR-100, Tiny ImageNet, ImageNet-200, Food-101, UTKFace, PASCAL VOC), language (BoolQ, QNLI, RTE) and audio (ESC-50, CREMA-D) domains, considering various convolutional (ResNet-18, Wide-ResNet-50, DenseNet-121, YOLOv5), recurrent (LSTM) and transformer (CvT, BERT, SepTr) architectures. We compare our approach with the conventional training regime, as well as with Curriculum by Smoothing (CBS), a state-of-the-art data-agnostic curriculum learning approach. Unlike CBS, our performance improvements over the standard training regime are consistent across all data sets and models. Furthermore, we significantly surpass CBS in terms of training time (LeRaC incurs no additional cost over the standard training regime). Our code is freely available at: https://github.com/CroitoruAlin/LeRaC.
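The layer-wise warm-up described above can be sketched as a small scheduling function. This is a minimal illustration, not the authors' exact implementation: the starting learning rates here follow a geometric interpolation between an assumed `base_lr` for the input-side layer and a lower `min_lr` for the output-side layer, and the ramp toward the common rate is taken as linear for simplicity (the exact per-layer schedule is defined in the paper and may differ).

```python
def lerac_lr(layer_idx, num_layers, step, warmup_steps,
             base_lr=0.1, min_lr=1e-4):
    """Illustrative per-layer learning rate for a LeRaC-style warm-up.

    Layers closer to the input (small layer_idx) start near base_lr;
    deeper layers start lower, down to min_lr for the last layer.
    All rates ramp up (linearly here, for simplicity) until they meet
    at base_lr by the end of the warm-up, after which every layer is
    trained with the same rate, i.e. the usual training regime.
    """
    # position of the layer between input (0.0) and output (1.0)
    frac = layer_idx / (num_layers - 1) if num_layers > 1 else 0.0
    # geometric interpolation: base_lr at the input side, min_lr at the output side
    start_lr = base_lr * (min_lr / base_lr) ** frac
    if step >= warmup_steps:
        return base_lr  # curriculum phase over: all layers share base_lr
    t = step / warmup_steps
    return start_lr + t * (base_lr - start_lr)
```

In a framework such as PyTorch, this would typically be realized by putting each layer's parameters in a separate optimizer parameter group and updating each group's `lr` from a function like the one above at every step of the warm-up phase.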