Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning. A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where \textit{the lower-level functions lack strong convexity}. In this work, we first provide hardness results showing that finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. We then study a class of tractable nonconvex-nonconvex bilevel problems in which the lower-level function satisfies the Polyak-{\L}ojasiewicz (PL) condition. We show that a simple first-order algorithm achieves improved complexity bounds of $\tilde{\mathcal{O}}(\epsilon^{-2})$, $\tilde{\mathcal{O}}(\epsilon^{-4})$, and $\tilde{\mathcal{O}}(\epsilon^{-6})$ in the deterministic, partially stochastic, and fully stochastic settings, respectively. The complexities in the first two cases are optimal up to logarithmic factors.
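For concreteness, the objects named in the abstract can be written out as follows. This is only a sketch in assumed notation: the symbols $f$, $g$, $x$, $y$, $Y^*(x)$, $\varphi$, and $\mu$ are not fixed by the text above, and taking the minimum over the lower-level solution set is one common (optimistic) convention for the hyper-objective. Here $f$ denotes the upper-level function and $g$ the lower-level function.

\[
\min_{x} \; \varphi(x) := \min_{y \in Y^*(x)} f(x, y),
\qquad
Y^*(x) := \arg\min_{y} \, g(x, y),
\]
\[
\text{(PL condition, with constant } \mu > 0\text{)} \qquad
\|\nabla_y g(x, y)\|^2 \;\ge\; 2\mu \left( g(x, y) - \min_{y'} g(x, y') \right)
\quad \text{for all } x, y.
\]

Strong convexity of $g(x, \cdot)$ implies the PL condition, but the PL condition can also hold for nonconvex $g(x, \cdot)$, in which case $Y^*(x)$ need not be a single point.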