Adaptive methods are extremely popular in machine learning as they reduce the cost of learning-rate tuning. This paper introduces a novel optimization algorithm named KATE, a scale-invariant adaptation of the well-known AdaGrad algorithm. We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O\left(\frac{\log T}{\sqrt{T}}\right)$ for KATE, matching the best-known rates for AdaGrad and Adam. We also compare KATE to other state-of-the-art adaptive algorithms, Adam and AdaGrad, in numerical experiments on a range of problems, including complex machine learning tasks such as image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches or surpasses the performance of Adam in all considered scenarios.
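For context, the sketch below shows the standard diagonal AdaGrad update that KATE adapts; KATE's own scale-invariant update is defined in the paper body and is not reproduced here. The function name, hyperparameter values, and the toy quadratic objective are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal-AdaGrad step: per-coordinate learning rate lr / sqrt(accumulated g^2)."""
    accum += grad ** 2                        # accumulate squared gradients coordinate-wise
    w -= lr * grad / (np.sqrt(accum) + eps)   # effective step size shrinks as gradients accumulate
    return w, accum

# Illustrative run on a toy least-squares objective f(w) = 0.5 * ||A w - b||^2
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
w, accum = np.zeros(5), np.zeros(5)
for _ in range(500):
    grad = A.T @ (A @ w - b)
    w, accum = adagrad_step(w, grad, accum)
```

Note that diagonal AdaGrad's trajectory depends on the raw gradient scale: rescaling the features rescales the accumulated denominator and changes the iterates. The abstract's scale-invariance claim is that KATE's modification removes this dependence in the Generalized Linear Model setting.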