Optimal and parameter-free gradient minimization methods for convex and nonconvex optimization

We propose novel optimal and parameter-free algorithms for computing an approximate solution with small (projected) gradient norm. Specifically, for computing an approximate solution whose (projected) gradient norm does not exceed $\varepsilon$, we obtain the following results: a) for the convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L\|x_0 - x^*\|/\varepsilon}$, where $L$ is the Lipschitz constant of the gradient, $x_0$ is the initial point, and $x^*$ is any optimal solution; b) for the strongly convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L/\mu}\log(\|\nabla f(x_0)\|/\varepsilon)$, where $\mu$ is the strong convexity modulus; and c) for the nonconvex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{Ll}(f(x_0) - f(x^*))/\varepsilon^2$, where $l$ is the lower curvature constant. Our complexity results match the lower complexity bounds for the convex and strongly convex cases, and achieve the above best-known complexity bound for the nonconvex case for the first time in the literature. Moreover, for all three cases (convex, strongly convex, and nonconvex), we propose parameter-free algorithms that do not require any problem parameters as input. To the best of our knowledge, no such parameter-free methods existed before, especially for the strongly convex and nonconvex cases. Since most regularity conditions (e.g., strong convexity and lower curvature) are imposed globally, the corresponding problem parameters are notoriously difficult to estimate. However, gradient norm minimization gives us a convenient tool for monitoring the progress of algorithms, and thus the ability to estimate such parameters in situ.
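
The idea of using the gradient norm both as a stopping test and as a way to estimate problem parameters on the fly can be illustrated with a much simpler scheme than the ones proposed here. The sketch below is plain (unaccelerated) gradient descent with a backtracking estimate of $L$; it is not the method of this paper and does not attain the bounds above. The function name `adaptive_gradient_descent`, the initial guess `L0`, and the test quadratic are all assumptions made for illustration.

```python
import math


def grad_norm(g):
    """Euclidean norm of a gradient given as a list of floats."""
    return math.sqrt(sum(gi * gi for gi in g))


def adaptive_gradient_descent(f, grad, x0, eps, L0=1.0, max_iters=10_000):
    """Gradient descent that estimates L in situ and stops on ||grad f(x)|| <= eps.

    Illustrative only: a standard backtracking scheme, not the accelerated,
    optimal algorithms described in the abstract.
    Returns (x, L_est, number_of_iterations).
    """
    x = list(x0)
    L_est = L0  # hypothetical initial guess; no true problem parameter is supplied
    for k in range(max_iters):
        g = grad(x)
        gn = grad_norm(g)
        if gn <= eps:  # the gradient norm is a computable progress measure
            return x, L_est, k
        # Double L_est until the sufficient-decrease test
        #   f(x - g / L_est) <= f(x) - ||g||^2 / (2 * L_est)
        # holds; it is guaranteed to hold once L_est >= L.
        while True:
            x_new = [xi - gi / L_est for xi, gi in zip(x, g)]
            if f(x_new) <= f(x) - gn * gn / (2.0 * L_est):
                break
            L_est *= 2.0
        x = x_new
    return x, L_est, max_iters


# Assumed test problem: f(x) = 0.5 * (x1^2 + 10 * x2^2), so L = 10 and mu = 1.
def quad(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def quad_grad(x):
    return [x[0], 10.0 * x[1]]


x_out, L_out, iters = adaptive_gradient_descent(quad, quad_grad, [1.0, 1.0], eps=1e-6)
```

Because the estimate is only ever doubled, and doubling stops once it reaches the true constant, `L_out` stays below $2L$ whenever `L0` does not exceed $L$. Estimating $\mu$ or $l$ in a comparable way is harder, which is the gap the parameter-free methods in this work are meant to close.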
