This paper proposes a novel approach to adaptive step sizes in stochastic gradient descent (SGD) that relies on two quantities we have identified as numerically traceable: the Lipschitz constant of the gradients and a notion of local variance in the search directions. The result is a nearly hyperparameter-free algorithm for stochastic optimization that has provable convergence properties on quadratic problems and exhibits truly problem-adaptive behavior on classical image classification tasks. The framework also allows a preconditioner to be included, which makes adaptive step sizes available for stochastic second-order optimization methods.
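To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes a secant estimate of the gradient Lipschitz constant, the mean squared deviation of per-sample gradients as the local variance, and a step size obtained by minimizing the standard smoothness bound on the expected decrease, eta = ||g||^2 / (L (||g||^2 + sigma^2 / b)). All function names, the Gaussian noise model, and the quadratic test problem are hypothetical choices made for illustration.

```python
import numpy as np


def local_lipschitz(grad_new, grad_old, x_new, x_old, eps=1e-12):
    """Secant estimate of the gradient Lipschitz constant between two points."""
    return np.linalg.norm(grad_new - grad_old) / (np.linalg.norm(x_new - x_old) + eps)


def direction_variance(sample_grads):
    """Mean squared deviation of per-sample gradients from their mean."""
    mean = sample_grads.mean(axis=0)
    return float(np.mean(np.sum((sample_grads - mean) ** 2, axis=1)))


def adaptive_step(lipschitz, mean_grad, variance, batch_size, eps=1e-12):
    """Minimizer of the smoothness bound on expected decrease (illustrative rule)."""
    signal = float(np.dot(mean_grad, mean_grad))
    return signal / (lipschitz * (signal + variance / batch_size) + eps)


def run_demo(steps=100, batch_size=32, noise=0.1, seed=0):
    """Noisy SGD on f(x) = 0.5 * x^T A x; returns (initial loss, final loss)."""
    rng = np.random.default_rng(seed)
    A = np.diag([1.0, 10.0])
    loss = lambda x: 0.5 * x @ A @ x

    def sample_grads(x):
        # Hypothetical noise model: exact gradient plus Gaussian noise per sample.
        return A @ x + noise * rng.standard_normal((batch_size, x.size))

    x_old = np.array([5.0, 5.0])
    initial = loss(x_old)
    g_old = sample_grads(x_old).mean(axis=0)
    x = x_old - 1e-3 * g_old  # small probe step to obtain a first secant pair
    lipschitz = 0.0
    for _ in range(steps):
        grads = sample_grads(x)
        g = grads.mean(axis=0)
        # Keep the largest estimate seen so far: conservative, since noise inflates it.
        lipschitz = max(lipschitz, local_lipschitz(g, g_old, x, x_old))
        eta = adaptive_step(lipschitz, g, direction_variance(grads), batch_size)
        x_old, g_old = x, g
        x = x - eta * g
    return initial, loss(x)
```

Calling `run_demo()` returns the loss before and after the run. Keeping the running maximum of the Lipschitz estimate is a deliberately cautious choice for this sketch: it trades adaptivity for stability, and a genuinely local estimate would need additional safeguards against noise.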