We characterize regions of a loss surface as corridors when the continuous curves of steepest descent (the solutions of the gradient flow) become straight lines. We show that corridors provide insight into gradient-based optimization, since they are exactly the regions where gradient descent and the gradient flow follow the same trajectory while the loss decreases linearly. As a result, inside corridors there are none of the implicit regularization effects or training instabilities that have been shown to arise from the drift between gradient descent and the gradient flow. Using the linear decrease of the loss on corridors, we devise a learning rate adaptation scheme for gradient descent, which we call Corridor Learning Rate (CLR). The CLR formulation coincides with a special case of the Polyak step-size, originally introduced in the context of convex optimization. The Polyak step-size has recently been shown to also have good convergence properties for neural networks; we further confirm this here with results on CIFAR-10 and ImageNet.
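To make the link between linear loss decrease and the step-size concrete, the following is a minimal sketch rather than the authors' implementation. It uses the standard Polyak step-size, h = (L(θ) − L*) / ‖∇L(θ)‖², and assumes the special case in question is L* = 0, which the abstract does not state explicitly. The toy loss L(θ) = ‖θ‖ is a hypothetical example chosen because its steepest-descent curves are straight lines along which the loss decreases at a constant rate. All function names are illustrative.

```python
import math


def toy_loss(theta):
    """Hypothetical toy loss L(theta) = ||theta||; its minimum value is 0."""
    return math.sqrt(sum(t * t for t in theta))


def toy_grad(theta):
    """Gradient of the toy loss, theta / ||theta|| (undefined at the origin)."""
    norm = toy_loss(theta)
    return [t / norm for t in theta]


def polyak_step_size(loss, grad, loss_min=0.0):
    """Standard Polyak step-size (loss - loss_min) / ||grad||^2.

    loss_min=0.0 is the special case assumed in this sketch.
    """
    sq_norm = sum(g * g for g in grad)
    if sq_norm == 0.0:
        return 0.0  # already at a stationary point, no step to take
    return (loss - loss_min) / sq_norm


def gd_step(theta, grad, h):
    """One plain gradient descent update with learning rate h."""
    return [t - h * g for t, g in zip(theta, grad)]


theta0 = [3.0, 4.0]
g0 = toy_grad(theta0)

# With a fixed small step, the loss drops by exactly h * ||grad||^2,
# i.e. the linear decrease that holds when descent curves are straight.
h_small = 0.5
sq_norm0 = sum(g * g for g in g0)
theta_small = gd_step(theta0, g0, h_small)
predicted = toy_loss(theta0) - h_small * sq_norm0

# With the Polyak step-size (loss_min = 0), the linear model of the loss
# is driven to zero in a single step.
h_polyak = polyak_step_size(toy_loss(theta0), g0)
theta_polyak = gd_step(theta0, g0, h_polyak)
```

On this toy loss the adaptive step lands on the minimizer in one update, because the linear extrapolation of the loss is exact everywhere along the descent line. On a neural network loss the extrapolation would only hold approximately, and only inside corridor-like regions.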