This paper introduces PROMISE ($\textbf{Pr}$econditioned Stochastic $\textbf{O}$ptimization $\textbf{M}$ethods by $\textbf{I}$ncorporating $\textbf{S}$calable Curvature $\textbf{E}$stimates), a suite of sketching-based preconditioned stochastic gradient algorithms for solving large-scale convex optimization problems arising in machine learning. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha, and each algorithm comes with a strong theoretical analysis and effective default hyperparameter values. In contrast, traditional stochastic gradient methods require careful hyperparameter tuning to succeed and degrade in the presence of ill-conditioning, a ubiquitous phenomenon in machine learning. Empirically, we demonstrate the advantage of the proposed algorithms: with default hyperparameter values, they outperform or match popular tuned stochastic gradient optimizers on a test bed of $51$ ridge and logistic regression problems assembled from benchmark machine learning repositories. On the theoretical side, this paper introduces the notion of quadratic regularity to establish linear convergence of all proposed methods, even when the preconditioner is updated infrequently. The rate of linear convergence is governed by the quadratic regularity ratio, which often yields a tighter bound on the convergence rate than the condition number, both in theory and in practice, and which explains the fast global linear convergence of the proposed methods.
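The abstract names two ingredients, a sketching-based preconditioner and a variance-reduced stochastic gradient method, without spelling out how they fit together. As a rough illustration only, the following is a minimal sketch of preconditioned SVRG on ridge regression, $F(w) = \frac{1}{2n}\|Aw - b\|^2 + \frac{\lambda}{2}\|w\|^2$. It assumes a randomized Nyström approximation of the data Hessian $A^\top A/n$ as the sketch, a preconditioner $P = \hat{H}_{\text{nys}} + \rho I$ with shift $\rho = \lambda$, a preconditioner computed once and never refreshed, the last inner iterate as the next snapshot, and a conservative step size $\eta = 1/(4L_P)$ taken from a per-sample smoothness bound in the $P$-norm. These choices, and the helper names `nystrom_preconditioner` and `preconditioned_svrg`, are hypothetical and are not the paper's algorithms, defaults, or hyperparameters.

```python
import numpy as np


def nystrom_preconditioner(A, rank, rho, rng):
    """Return a function applying P^{-1}, where P = U diag(lam_hat) U^T + rho * I
    and U diag(lam_hat) U^T is a rank-`rank` Nystrom approximation of A^T A / n.
    (Illustrative assumption, not the paper's preconditioner.)"""
    n, d = A.shape
    omega, _ = np.linalg.qr(rng.standard_normal((d, rank)))  # orthonormal test matrix
    Y = A.T @ (A @ omega) / n                                 # H_data @ omega, H_data = A^T A / n
    nu = np.sqrt(d) * np.finfo(float).eps * np.linalg.norm(Y, 2)  # tiny shift for stability
    Y_nu = Y + nu * omega
    M = omega.T @ Y_nu
    C = np.linalg.cholesky(0.5 * (M + M.T))                   # M = C C^T, C lower-triangular
    B = np.linalg.solve(C, Y_nu.T).T                           # B = Y_nu C^{-T}
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam_hat = np.maximum(s**2 - nu, 0.0)                       # undo the stability shift

    def apply_inverse(G):
        # P^{-1} G = U diag(1 / (lam_hat + rho)) U^T G + (G - U U^T G) / rho
        UtG = U.T @ G
        scale = 1.0 / (lam_hat + rho)
        if G.ndim == 2:
            scale = scale[:, None]
        return U @ (scale * UtG) + (G - U @ UtG) / rho

    return apply_inverse


def preconditioned_svrg(A, b, lam_reg, apply_pinv, eta, epochs, rng):
    """Preconditioned SVRG for F(w) = ||Aw - b||^2 / (2n) + lam_reg / 2 * ||w||^2."""
    n, d = A.shape
    w_snap = np.zeros(d)
    for _ in range(epochs):
        full_grad = A.T @ (A @ w_snap - b) / n + lam_reg * w_snap  # exact gradient at snapshot
        w = w_snap.copy()
        for i in rng.integers(n, size=n):                    # n inner steps per epoch
            diff = w - w_snap
            # grad f_i(w) - grad f_i(w_snap) + grad F(w_snap), with
            # f_i(w) = (a_i^T w - b_i)^2 / 2 + lam_reg / 2 * ||w||^2
            g = A[i] * (A[i] @ diff) + lam_reg * diff + full_grad
            w -= eta * apply_pinv(g)                          # preconditioned step
        w_snap = w                                            # last iterate becomes the snapshot
    return w_snap


rng = np.random.default_rng(0)
n, d = 1000, 50
A = rng.standard_normal((n, d)) * 0.8 ** np.arange(d)  # decaying column scales: ill-conditioned
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
lam_reg = 1e-3
rho = lam_reg                                           # assumed preconditioner shift

apply_pinv = nystrom_preconditioner(A, rank=20, rho=rho, rng=rng)

# Smoothness of each f_i in the P-norm is at most a_i^T P^{-1} a_i + lam_reg / rho.
L_P = np.max(np.sum(A.T * apply_pinv(A.T), axis=0)) + lam_reg / rho
eta = 1.0 / (4.0 * L_P)                                 # conservative step size (assumption)

w = preconditioned_svrg(A, b, lam_reg, apply_pinv, eta, epochs=30, rng=rng)
w_star = np.linalg.solve(A.T @ A / n + lam_reg * np.eye(d), A.T @ b / n)
rel_err = np.linalg.norm(w - w_star) / np.linalg.norm(w_star)
print(f"relative error after 30 epochs: {rel_err:.2e}")
```

Because $P \preceq A^\top A/n + \lambda I$ here, the preconditioned problem has strong convexity at least $1$ in the $P$-norm, so the step size can be read off from the data rather than tuned by hand. In a large-scale setting one would likely form the sketch from a subsampled Hessian and refresh it occasionally instead of computing it once from the full data.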