D-Adaptation is an approach to automatically setting the learning rate that asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step. Our approach is the first hyper-parameter-free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, in which it automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. An open-source implementation is available.
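To make the idea concrete, the following is a minimal sketch of how a learning-rate-free scheme of this kind can be realized on top of dual averaging. It is not the released implementation. The update rule, the function name `d_adapted_dual_averaging`, the default `d0=1e-6`, and the toy objective are all assumptions made for illustration. The sketch maintains a running estimate `d` of the distance from the starting point to a minimizer, uses it to scale each subgradient, and evaluates exactly one subgradient per step with no line search.

```python
import numpy as np


def d_adapted_dual_averaging(subgrad, x0, d0=1e-6, n_steps=200):
    """Illustrative sketch: dual averaging with an adaptive distance estimate d.

    subgrad: callable returning a subgradient of a convex Lipschitz f at x.
    x0:      starting point (1-D array).
    d0:      small initial guess for the distance ||x0 - x*|| (assumed value).
    Returns the d-weighted average iterate and the history of d.
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    s = np.zeros_like(x0)          # weighted sum of subgradients
    d = d0
    g0_norm = np.linalg.norm(subgrad(x0))
    if g0_norm == 0.0:             # x0 is already stationary
        return x0, [d0]
    gamma = 1.0 / g0_norm          # gamma_0
    sum_g2 = 0.0                   # running sum of ||g_i||^2
    sum_gamma_d2_g2 = 0.0          # running sum of gamma_i * d_i^2 * ||g_i||^2
    weighted_x = np.zeros_like(x0)
    weight = 0.0
    d_history = [d]

    for _ in range(n_steps):
        g = subgrad(x)             # the only oracle call in this step
        g2 = float(g @ g)
        weighted_x += d * x
        weight += d
        s = s + d * g
        sum_gamma_d2_g2 += gamma * d**2 * g2   # uses gamma_k, before the update
        sum_g2 += g2
        gamma = 1.0 / np.sqrt(sum_g2)          # AdaGrad-norm style step scale
        s_norm = np.linalg.norm(s)
        if s_norm > 0.0:
            # Candidate lower bound on the distance to a minimizer.
            d_hat = (gamma * s_norm**2 - sum_gamma_d2_g2) / (2.0 * s_norm)
            d = max(d, d_hat)      # the estimate never decreases
        x = x0 - gamma * s
        d_history.append(d)

    return weighted_x / weight, d_history


# Toy problem (hypothetical): f(x) = ||x - x_star||_2, convex and 1-Lipschitz.
x_star = np.array([3.0, -4.0])


def subgrad(x):
    r = x - x_star
    n = np.linalg.norm(r)
    return r / n if n > 0 else np.zeros_like(r)


x_hat, d_history = d_adapted_dual_averaging(subgrad, np.zeros(2))
```

In this sketch no learning rate is supplied by the user. The only input besides the starting point is `d0`, which is deliberately tiny and is grown automatically; for a convex objective the estimate is expected to stay below the true distance from `x0` to `x_star` (here 5).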