D-Adaptation is an approach to automatically setting the learning rate which asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step. Our approach is the first hyper-parameter free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. An open-source implementation is available.
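The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions: a dual-averaging loop in which the step scale `d` is a running lower bound on the unknown distance D = ||x0 - x*||, so no learning rate has to be supplied. The function name `dadapt_dual_averaging`, the choice G = ||g_0||, the default `d0`, and the toy L1 objective are all illustrative choices, not details taken from the text above.

```python
import math


def dadapt_dual_averaging(subgrad, x0, n_steps=2000, d0=1e-6):
    """Dual averaging with a growing estimate d of D = ||x0 - x*||.

    Illustrative sketch only: the update rules are assumptions, not
    a reference implementation.
    """
    dim = len(x0)
    x = list(x0)
    s = [0.0] * dim          # running sum of d_k * g_k
    d = d0                   # current distance estimate (never decreases)
    weighted_sq = 0.0        # sum of gamma_k * d_k^2 * ||g_k||^2
    grad_sq_total = 0.0      # G^2 + sum of ||g_k||^2
    gamma = 0.0
    x_sum = [0.0] * dim      # d-weighted sum of iterates
    d_sum = 0.0

    for k in range(n_steps):
        g = subgrad(x)       # exactly one subgradient evaluation per step
        g_sq = sum(v * v for v in g)
        if k == 0:
            if g_sq == 0.0:  # x0 is already a minimizer
                return list(x0), d
            grad_sq_total = g_sq                  # assumption: G = ||g_0||
            gamma = 1.0 / math.sqrt(grad_sq_total)

        # Accumulate the weighted average using the d in force at step k.
        x_sum = [a + d * xi for a, xi in zip(x_sum, x)]
        d_sum += d

        s = [si + d * gi for si, gi in zip(s, g)]
        weighted_sq += gamma * d * d * g_sq       # uses gamma_k, before update
        grad_sq_total += g_sq
        gamma = 1.0 / math.sqrt(grad_sq_total)    # AdaGrad-norm style scale

        s_norm = math.sqrt(sum(v * v for v in s))
        if s_norm > 0.0:
            d_hat = (gamma * s_norm ** 2 - weighted_sq) / (2.0 * s_norm)
            d = max(d, d_hat)                     # estimate only ever grows

        x = [x0i - gamma * si for x0i, si in zip(x0, s)]

    return [a / d_sum for a in x_sum], d


# Toy convex Lipschitz problem (hypothetical): f(x) = |x_0 - 3| + |x_1 + 2|.
TARGET = [3.0, -2.0]


def l1_distance(x):
    return sum(abs(a - b) for a, b in zip(x, TARGET))


def l1_subgrad(x):
    # Sign of each coordinate's residual; 0 at the kink is a valid subgradient.
    return [(a > b) - (a < b) for a, b in zip(x, TARGET)]
```

Calling `dadapt_dual_averaging(l1_subgrad, [0.0, 0.0])` returns a weighted average iterate and the final distance estimate. In this sketch the only inputs besides the problem are a deliberately tiny `d0` and the step count; the estimate `d` grows from `d0` on its own, and for a convex objective it should not exceed the true distance from `x0` to the minimizer.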