Bilevel optimization has various applications, such as hyperparameter optimization and meta-learning. Designing theoretically efficient algorithms for bilevel optimization is more challenging than for standard optimization because the lower-level problem implicitly defines the feasible set through another optimization problem. One tractable case is when the lower-level problem is strongly convex. Recent works show that second-order methods can provably converge to an $\epsilon$-first-order stationary point of the problem at a rate of $\tilde{\mathcal{O}}(\epsilon^{-2})$, yet these algorithms require a Hessian-vector product oracle. Kwon et al. (2023) removed this requirement by proposing a first-order method that achieves the same goal at a slower rate of $\tilde{\mathcal{O}}(\epsilon^{-3})$. In this work, we provide an improved analysis demonstrating that the first-order method can also find an $\epsilon$-first-order stationary point within $\tilde{\mathcal{O}}(\epsilon^{-2})$ oracle complexity, which matches the upper bounds for second-order methods in its dependence on $\epsilon$. Our analysis further yields simple first-order algorithms that achieve similar near-optimal rates for finding second-order stationary points and for solving distributed bilevel problems.
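To make the contrast with Hessian-vector product oracles concrete, here is a minimal sketch. It assumes a hypothetical toy problem with a scalar upper-level variable $x$ and a strongly convex quadratic lower level. The sketch illustrates one penalty-based way to estimate the hypergradient of $\varphi(x) = f(x, y^*(x))$ from gradients alone. Let $y^*(x) = \arg\min_y g(x,y)$ and $y_\lambda(x) = \arg\min_y f(x,y) + \lambda g(x,y)$. The quantity $\nabla_x f(x,y_\lambda) + \lambda\big(\nabla_x g(x,y_\lambda) - \nabla_x g(x,y^*)\big)$ then approaches $\nabla\varphi(x)$ as the penalty $\lambda$ grows. The following are all illustrative choices, not the paper's algorithm or its parameter schedule: the toy problem, the constants `MU`, `B`, `C`, the helper names, the step sizes, and the values of $\lambda$.

```python
# Hypothetical toy bilevel problem (illustration only, not from the paper):
#   upper level: f(x, y) = 0.5*(y - B)**2 + 0.5*C*x**2
#   lower level: g(x, y) = 0.5*MU*y**2 - x*y   (MU-strongly convex in y)
# Hence y*(x) = x / MU and phi(x) = f(x, y*(x)).
MU, B, C = 2.0, 1.0, 0.5


def grad_f(x, y):
    """Partial gradients (df/dx, df/dy) of the upper-level objective."""
    return C * x, y - B


def grad_g(x, y):
    """Partial gradients (dg/dx, dg/dy) of the lower-level objective."""
    return -y, MU * y - x


def gd_on_y(grad_y, y0, step, iters):
    """Plain gradient descent on the scalar y using first-order information only."""
    y = y0
    for _ in range(iters):
        y -= step * grad_y(y)
    return y


def penalty_hypergrad(x, lam, iters=50):
    """First-order hypergradient estimate; no Hessian-vector products are used."""
    # y* = argmin_y g(x, y); step 1/MU matches the smoothness of g in y.
    y_star = gd_on_y(lambda y: grad_g(x, y)[1], 0.0, 1.0 / MU, iters)
    # y_lam = argmin_y f(x, y) + lam * g(x, y); smoothness is 1 + lam * MU.
    y_lam = gd_on_y(lambda y: grad_f(x, y)[1] + lam * grad_g(x, y)[1],
                    0.0, 1.0 / (1.0 + lam * MU), iters)
    fx, _ = grad_f(x, y_lam)
    gx_lam, _ = grad_g(x, y_lam)
    gx_star, _ = grad_g(x, y_star)
    return fx + lam * (gx_lam - gx_star)


def exact_hypergrad(x):
    """Closed-form derivative of phi(x) = 0.5*(x/MU - B)**2 + 0.5*C*x**2."""
    return (x / MU - B) / MU + C * x


def first_order_bilevel_gd(x0, lam, step=0.5, outer_iters=200):
    """Outer gradient descent on x driven by the penalty hypergradient estimate."""
    x = x0
    for _ in range(outer_iters):
        x -= step * penalty_hypergrad(x, lam)
    return x


if __name__ == "__main__":
    for lam in (10.0, 100.0, 1000.0):
        err = abs(penalty_hypergrad(3.0, lam) - exact_hypergrad(3.0))
        print(f"lambda={lam:7.1f}  |estimate - exact| = {err:.2e}")
    x_hat = first_order_bilevel_gd(3.0, lam=1000.0)
    print(f"x_hat = {x_hat:.4f}, exact hypergradient there = {exact_hypergrad(x_hat):.1e}")
```

In this toy problem the estimation error works out to exactly $|x - \mathtt{MU}\cdot\mathtt{B}| / (\mathtt{MU}^2(1 + \lambda\,\mathtt{MU}))$, so it shrinks roughly like $1/\lambda$. The outer loop then settles near the true stationary point $x = 2/3$. Each estimate costs only gradient evaluations of $f$ and $g$, which is the property that distinguishes first-order methods from those needing a Hessian-vector product oracle.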