On Momentum-Based Gradient Methods for Bilevel Optimization with Nonconvex Lower-Level

Bilevel optimization is a popular two-level hierarchical optimization framework that has been widely applied to machine learning tasks such as hyperparameter learning, meta learning and continual learning. Although many bilevel optimization methods have recently been developed, they are not well studied when the lower-level problem is nonconvex. To fill this gap, in this paper we study a class of nonconvex bilevel optimization problems in which both the upper-level and lower-level problems are nonconvex, and the lower-level problem satisfies the Polyak-{\L}ojasiewicz (PL) condition. For the deterministic setting, we propose an efficient momentum-based gradient bilevel method (MGBiO). For the stochastic setting, we propose a class of efficient momentum-based stochastic gradient bilevel methods (MSGBiO and VR-MSGBiO). Moreover, we provide a useful convergence analysis framework for our methods. Specifically, under some mild conditions, we prove that our MGBiO method has a sample (or gradient) complexity of $O(\epsilon^{-2})$ for finding an $\epsilon$-stationary solution of the deterministic bilevel problems (i.e., $\|\nabla F(x)\|\leq \epsilon$), which improves the existing best results by a factor of $O(\epsilon^{-1})$. We also prove that our MSGBiO and VR-MSGBiO methods have sample complexities of $\tilde{O}(\epsilon^{-4})$ and $\tilde{O}(\epsilon^{-3})$, respectively, for finding an $\epsilon$-stationary solution of the stochastic bilevel problems (i.e., $\mathbb{E}\|\nabla F(x)\|\leq \epsilon$), which improves the existing best results by a factor of $\tilde{O}(\epsilon^{-3})$. Extensive experimental results on a bilevel PL game and hyper-representation learning demonstrate the efficiency of our algorithms. This paper commemorates the mathematician Boris Polyak (1935-2023).
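
For concreteness, the problem class described above can be written in a standard form. This is a sketch under assumed notation: the symbols $f$, $g$, $y^*(x)$, $d$, $p$ and $\mu$ are not fixed by the abstract and are introduced here only for illustration; only $F$ and $\epsilon$ appear in the text above.

$$
\min_{x \in \mathbb{R}^d} \; F(x) := f\big(x, y^*(x)\big)
\quad \text{s.t.} \quad
y^*(x) \in \arg\min_{y \in \mathbb{R}^p} \; g(x, y),
$$

where $f$ is the upper-level objective and $g$ is the lower-level objective. Under this notation, the PL condition on the lower-level problem is usually stated as: there exists $\mu > 0$ such that, for all $x$ and $y$,

$$
\|\nabla_y g(x, y)\|^2 \;\geq\; 2\mu \Big( g(x, y) - \min_{y'} g(x, y') \Big).
$$

The PL condition does not require $g(x, \cdot)$ to be convex, which is why it fits the nonconvex lower-level setting considered here.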
