Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Many hypergradient methods have been proposed as effective solutions for large-scale problems. However, existing hypergradient methods for lower-level constrained bilevel optimization (LCBO) problems rely on very restrictive assumptions, namely that the optimality conditions satisfy differentiability and invertibility, and they lack a solid analysis of the convergence rate. Worse still, existing methods require double-loop updates, which are sometimes less efficient. To address these issues, in this paper we propose a new hypergradient for LCBO that leverages the nonsmooth implicit function theorem instead of relying on these restrictive assumptions. In addition, we propose a \textit{single-loop single-timescale} algorithm based on the double-momentum method and the adaptive step-size method, and prove that it returns a $(\delta, \epsilon)$-stationary point within $\tilde{\mathcal{O}}(d_2^2\epsilon^{-4})$ iterations. Experiments on two applications demonstrate the effectiveness of our proposed method.
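The abstract refers to a single-loop single-timescale scheme built on double momentum and adaptive step sizes. As an illustrative sketch only (the paper's exact recursions are defined in the body; the symbols $\beta_t$, $\eta_t$, and the hypergradient estimate $\widehat{\nabla}\Phi$ below are generic placeholders, not taken from this work), a double-momentum update typically maintains moving averages of the upper- and lower-level gradient estimates and advances both variables once per iteration:
\begin{equation*}
\begin{aligned}
w_{t+1} &= (1-\beta_t)\, w_t + \beta_t\, \widehat{\nabla}_x \Phi(x_t, y_t),
  &\qquad x_{t+1} &= x_t - \eta_t\, w_{t+1},\\
v_{t+1} &= (1-\beta_t)\, v_t + \beta_t\, \widehat{\nabla}_y g(x_t, y_t),
  &\qquad y_{t+1} &= y_t - \eta_t\, v_{t+1},
\end{aligned}
\end{equation*}
where $g$ denotes the lower-level objective and the step size $\eta_t$ is set adaptively (e.g., from the magnitudes of the momentum iterates), so that both variables progress at comparable rates, hence "single-timescale."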