We propose QLAB, a learning rate adaptation scheme for descent optimizers. QLAB is derived by optimizing a quadratic approximation of the loss function, and it can be combined with any optimizer that provides a descent update direction. Computing an adaptive learning rate with QLAB requires only one extra evaluation of the loss function. We prove the convergence of descent optimizers equipped with QLAB. We demonstrate the effectiveness of QLAB on a range of optimization problems by combining it with stochastic gradient descent, stochastic gradient descent with momentum, and Adam. Performance is validated on multi-layer neural networks, CNN, VGG-Net, ResNet, and ShuffleNet on two datasets, MNIST and CIFAR10.
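To make the idea of a learning rate obtained from a quadratic approximation with one extra loss evaluation concrete, here is a minimal sketch. It is not the paper's exact update rule: it uses standard quadratic interpolation along the update direction, and the function name `quadratic_lr`, the trial rate `trial_lr`, and the fallback for non-positive curvature are all assumptions made for illustration.

```python
def quadratic_lr(loss_fn, params, loss0, grad, direction, trial_lr=0.1):
    """Estimate a step size by fitting a 1-D quadratic along `direction`.

    Illustrative sketch only (hypothetical helper, not the paper's formula).
    Model: L(params - a * direction) ~ loss0 - a * slope + 0.5 * a**2 * curvature
    `loss0` is the loss at `params`, assumed already known from the forward pass.
    """
    # Directional derivative g . d; positive when `direction` is a descent direction.
    slope = sum(g * d for g, d in zip(grad, direction))

    # The single extra loss evaluation, taken at a trial step.
    trial_params = [p - trial_lr * d for p, d in zip(params, direction)]
    loss1 = loss_fn(trial_params)

    # Solve the quadratic model for its curvature term.
    curvature = 2.0 * (loss1 - loss0 + trial_lr * slope) / trial_lr ** 2

    # Assumed fallback: keep the trial rate if the model has no minimizer.
    if slope <= 0.0 or curvature <= 0.0:
        return trial_lr
    return slope / curvature  # minimizer of the quadratic model


# Usage on a toy quadratic loss, with plain gradient descent as the optimizer.
def toy_loss(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2


params = [1.0, 1.0]
grad = [2.0 * params[0], 4.0 * params[1]]  # analytic gradient of toy_loss
lr = quadratic_lr(toy_loss, params, toy_loss(params), grad, direction=grad)
new_params = [p - lr * d for p, d in zip(params, grad)]
```

Because the toy loss is itself quadratic, the fitted model is exact here and the returned rate is the best step along the gradient. For a real network the quadratic is only a local approximation, and the gradient and both loss values would come from the same mini-batch.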