We study the problem of finding an $\epsilon$-first-order stationary point (FOSP) of a smooth function, given access only to gradient information. The best-known gradient query complexity for this task, assuming both the gradient and Hessian of the objective function are Lipschitz continuous, is ${O}(\epsilon^{-7/4})$. In this work, we propose a method with a gradient complexity of ${O}(d^{1/4}\epsilon^{-13/8})$, where $d$ is the problem dimension, yielding an improved complexity when $d = {O}(\epsilon^{-1/2})$. To achieve this result, we design an optimization algorithm that, at its core, involves solving two online learning problems. Specifically, we first reformulate the task of finding a stationary point of a nonconvex problem as minimizing the regret in an online convex optimization problem, where the loss at each round is determined by the gradient of the objective function. We then introduce a novel optimistic quasi-Newton method to solve this online learning problem, in which the Hessian approximation update is itself framed as an online learning problem in the space of matrices. Beyond improving the complexity bound for achieving an $\epsilon$-FOSP with a gradient oracle, our result provides the first guarantee suggesting that quasi-Newton methods can outperform gradient descent-type methods in nonconvex settings.
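To make the first reformulation concrete, the following is a minimal sketch of a standard online-to-nonconvex conversion consistent with the description above; the symbols $\Delta_t$, $w_t$, $g_t$, $u$, and the radius $D$ are illustrative notation and are not taken from the paper's formal development. Let the iterates be $x_t = x_{t-1} + \Delta_t$, where $\Delta_t$ is the action of an online learner constrained to a ball of radius $D$. By the mean value theorem, there exists a point $w_t$ on the segment between $x_{t-1}$ and $x_t$ with $f(x_t) - f(x_{t-1}) = \langle \nabla f(w_t), \Delta_t \rangle$. Writing $g_t = \nabla f(w_t)$ and summing over $t = 1, \dots, T$, for any comparator $u$ with $\|u\| \le D$,
\[
f(x_T) - f(x_0) \;=\; \sum_{t=1}^{T} \langle g_t, \Delta_t \rangle
\;=\; \underbrace{\sum_{t=1}^{T} \langle g_t, \Delta_t - u \rangle}_{\mathrm{Reg}_T(u)} \;+\; \Big\langle \sum_{t=1}^{T} g_t,\, u \Big\rangle .
\]
Choosing $u = -D \sum_{t=1}^{T} g_t \big/ \big\|\sum_{t=1}^{T} g_t\big\|$ and rearranging yields
\[
\Big\| \frac{1}{T} \sum_{t=1}^{T} g_t \Big\| \;\le\; \frac{f(x_0) - f(x_T) + \mathrm{Reg}_T(u)}{D\,T},
\]
so sublinear regret against the gradient-induced linear losses $\ell_t(\Delta) = \langle g_t, \Delta \rangle$ forces the averaged gradients to be small, and, up to standard arguments using the Lipschitz continuity of the gradient, this translates into an $\epsilon$-FOSP guarantee at a nearby iterate. Under such a reduction, the smaller regret attainable by an optimistic learner whose hint for the next gradient comes from a quasi-Newton model, with the Hessian approximation maintained by a second online learner over matrices, is what would drive the improved complexity.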