We propose novel optimal and parameter-free algorithms for computing an approximate solution with small (projected) gradient norm. Specifically, for computing an approximate solution whose (projected) gradient norm does not exceed $\varepsilon$, we obtain the following results: a) for the convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L\|x_0 - x^*\|/\varepsilon}$, where $L$ is the Lipschitz constant of the gradient and $x^*$ is any optimal solution; b) for the strongly convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L/\mu}\log(\|\nabla f(x_0)\|/\varepsilon)$, where $\mu$ is the strong convexity modulus; and c) for the nonconvex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{Ll}(f(x_0) - f(x^*))/\varepsilon^2$, where $l$ is the lower curvature constant. Our complexity results match the lower complexity bounds of the convex and strongly convex cases, and achieve the above best-known complexity bound for the nonconvex case for the first time in the literature. Our results can also be extended to problems with constraints and composite objectives. Moreover, for all the convex, strongly convex, and nonconvex cases, we propose parameter-free algorithms that do not require the input of any problem parameters or the convexity status of the problem. To the best of our knowledge, no such parameter-free methods existed before, especially for the strongly convex and nonconvex cases. Since most regularity conditions (e.g., strong convexity and lower curvature) are imposed over a global scope, the corresponding problem parameters are notoriously difficult to estimate. However, gradient norm minimization equips us with a convenient tool to monitor the progress of algorithms and thus the ability to estimate such parameters in situ.
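To make the termination criterion concrete, the following is a minimal sketch of plain (unaccelerated) gradient descent that stops once the gradient norm falls below $\varepsilon$, the accuracy measure used throughout the abstract. The quadratic objective, the function name, and the step size $1/L$ are illustrative assumptions; the paper's optimal and parameter-free methods are substantially more involved.

```python
import numpy as np

def grad_descent_small_grad_norm(grad, L, x0, eps, max_iter=100_000):
    """Gradient descent with step size 1/L, terminating when
    ||grad f(x)|| <= eps.  A minimal illustration of the
    gradient-norm stopping criterion (not the paper's algorithm)."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:  # the accuracy measure from the abstract
            break
        x = x - g / L
    return x

# Illustrative example: f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b,
# and L can be taken as the largest eigenvalue of A.
A = np.array([[2.0, 0.0], [0.0, 10.0]])
b = np.array([1.0, 1.0])
L = 10.0
x = grad_descent_small_grad_norm(lambda x: A @ x - b, L, np.zeros(2), eps=1e-6)
print(np.linalg.norm(A @ x - b))  # small: at most 1e-6
```

Note that the loop only ever queries the gradient, which is what makes gradient norm a convenient quantity to monitor: unlike $f(x) - f(x^*)$ or $\|x_0 - x^*\|$, it is directly observable during the run, which is the property the abstract exploits for in-situ parameter estimation.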