In this paper, we revisit the bilevel optimization problem in which the upper-level objective function is generally nonconvex and the lower-level objective function is strongly convex. Although this type of problem has been studied extensively, it remains an open question how to achieve an $\mathcal{O}(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic bilevel optimization, that is, without computing any second-order derivatives. To fill this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named FdeHBO, which features a simple fully single-loop structure, a projection-aided finite-difference approximation of Hessian/Jacobian-vector products, and momentum-based updates. Theoretically, we show that FdeHBO requires $\mathcal{O}(\epsilon^{-1.5})$ iterations, each using $\mathcal{O}(1)$ samples and only first-order gradient information, to find an $\epsilon$-accurate stationary point. To the best of our knowledge, this is the first Hessian/Jacobian-free method with an $\mathcal{O}(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization.
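To illustrate what a finite-difference Hessian-vector approximation looks like in general, the following minimal sketch estimates $\nabla^2 g(y)\,v$ from two gradient evaluations via the central difference $\big(\nabla g(y+\delta v)-\nabla g(y-\delta v)\big)/(2\delta)$. It is a generic illustration under stated assumptions, not the FdeHBO update itself: the toy quadratic `grad_g`, the helper name `fd_hvp`, and the step size `delta` are all hypothetical, and the projection and momentum components mentioned above are omitted.

```python
def grad_g(y):
    """Gradient of a toy strongly convex quadratic (an assumption for this demo):
    g(y) = 0.5 * y^T A y - b^T y, so grad g(y) = A y - b."""
    A = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite
    b = [1.0, -1.0]
    return [sum(A[i][j] * y[j] for j in range(2)) - b[i] for i in range(2)]


def fd_hvp(grad, y, v, delta=1e-3):
    """Approximate (Hessian of g at y) times v using only two gradient calls."""
    y_plus = [yi + delta * vi for yi, vi in zip(y, v)]
    y_minus = [yi - delta * vi for yi, vi in zip(y, v)]
    g_plus = grad(y_plus)
    g_minus = grad(y_minus)
    # Central difference of the gradient along direction v.
    return [(gp - gm) / (2.0 * delta) for gp, gm in zip(g_plus, g_minus)]


# For a quadratic the Hessian is the constant matrix A, so the exact
# product with v = [1, 2] is A v = [5, 5].
hv = fd_hvp(grad_g, [0.5, -0.25], [1.0, 2.0])
```

For a quadratic the central difference is exact up to floating-point error; for general smooth functions the approximation error depends on `delta` and on the smoothness of the Hessian.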