We develop optimization methods that offer new trade-offs between the number of gradient and Hessian computations needed to compute a critical point of a non-convex function. We provide a method that, for any twice-differentiable $f\colon \mathbb R^d \rightarrow \mathbb R$ with $L_2$-Lipschitz Hessian, an initial point with $\Delta$-bounded sub-optimality, and sufficiently small $\epsilon > 0$, outputs an $\epsilon$-critical point, i.e., a point $x$ such that $\|\nabla f(x)\| \leq \epsilon$, using $\tilde{O}(L_2^{1/4} n_H^{-1/2}\Delta\epsilon^{-9/4})$ queries to a gradient oracle and $n_H$ queries to a Hessian oracle, for any positive integer $n_H$. As a consequence, we obtain an improved gradient query complexity of $\tilde{O}(d^{1/3}L_2^{1/2}\Delta\epsilon^{-3/2})$ in the case of bounded dimension and of $\tilde{O}(L_2^{3/4}\Delta^{3/2}\epsilon^{-9/4})$ in the case where only a \emph{single} Hessian query is allowed. We obtain these results through a more general algorithm that can handle approximate Hessian computations and that recovers the state-of-the-art bound for computing an $\epsilon$-critical point with $O(L_1^{1/2}L_2^{1/4}\Delta\epsilon^{-7/4})$ gradient queries, provided that $f$ also has an $L_1$-Lipschitz gradient.