Differentiable optimization layers enable learning systems to make decisions by solving embedded optimization problems. However, computing their gradients via implicit differentiation requires solving a linear system involving Hessian terms, which is both compute- and memory-intensive. To address this challenge, we propose a novel algorithm that computes the gradient using only first-order information. The key insight is to reformulate differentiable optimization as a bilevel optimization problem and to leverage recent advances in bilevel methods. Specifically, we introduce an active-set Lagrangian hypergradient oracle that avoids Hessian evaluations and comes with finite-time, non-asymptotic approximation guarantees. We show that an approximate hypergradient can be computed in $\tilde{O}(1)$ time using only first-order information, which yields an overall complexity of $\tilde{O}(\delta^{-1}\varepsilon^{-3})$ for constrained bilevel optimization, matching the best known rate for nonsmooth nonconvex optimization. Furthermore, we release an open-source Python library that can be easily adapted from existing solvers. The source code is available at https://github.com/guaguakai/FFOLayer.
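The core idea, replacing a Hessian solve with first-order information by viewing the layer as a bilevel problem, can be illustrated on a toy problem. The sketch below is not the paper's active-set Lagrangian oracle and does not use the FFOLayer library. It assumes a simpler setup: an unconstrained quadratic lower level $g(x,y)=\tfrac12 y^\top Q y - x^\top y$, a squared-error upper level $f(x,y)=\tfrac12\|y-t\|^2$, and a penalty-based first-order hypergradient estimate of the kind used in fully first-order bilevel methods. The function names (`penalty_hypergradient`, `implicit_hypergradient_2x2`) and the penalty parameter `lam` are hypothetical and serve only as illustration.

```python
from typing import Callable, List

Vec = List[float]
Mat = List[List[float]]


def matvec(A: Mat, v: Vec) -> Vec:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def row_sum_bound(A: Mat) -> float:
    """Gershgorin bound: an upper bound on the largest eigenvalue of a symmetric A."""
    return max(sum(abs(a) for a in row) for row in A)


def gradient_descent(grad: Callable[[Vec], Vec], y0: Vec, step: float, iters: int) -> Vec:
    """Plain gradient descent: only gradient evaluations, no Hessian solves."""
    y = list(y0)
    for _ in range(iters):
        g = grad(y)
        y = [yi - step * gi for yi, gi in zip(y, g)]
    return y


def penalty_hypergradient(Q: Mat, x: Vec, t: Vec, lam: float, iters: int = 500) -> Vec:
    """Hypothetical first-order hypergradient estimate for the toy bilevel problem
        lower level: y*(x) = argmin_y g(x, y),  g(x, y) = 0.5 y^T Q y - x^T y
        upper level: F(x)  = f(x, y*(x)),       f(x, y) = 0.5 ||y - t||^2
    via grad_x f(x, y_lam) + lam * (grad_x g(x, y_lam) - grad_x g(x, y*)),
    where y_lam = argmin_y f(x, y) + lam * g(x, y).
    """
    n = len(x)
    L = row_sum_bound(Q)
    # y*(x): minimize g(x, .) using grad_y g = Q y - x.
    y_star = gradient_descent(
        lambda y: [qy - xi for qy, xi in zip(matvec(Q, y), x)],
        [0.0] * n, 1.0 / L, iters)
    # y_lam: minimize f + lam * g using grad_y = (y - t) + lam * (Q y - x).
    y_lam = gradient_descent(
        lambda y: [(yi - ti) + lam * (qy - xi)
                   for yi, ti, qy, xi in zip(y, t, matvec(Q, y), x)],
        [0.0] * n, 1.0 / (1.0 + lam * L), iters)
    # Here grad_x f = 0 and grad_x g(x, y) = -y, so the estimate is lam * (y* - y_lam).
    return [lam * (ys - yl) for ys, yl in zip(y_star, y_lam)]


def implicit_hypergradient_2x2(Q: Mat, x: Vec, t: Vec) -> Vec:
    """Reference via implicit differentiation: dF/dx = Q^{-1} (y* - t) for symmetric Q.
    This needs a solve with the lower-level Hessian Q (done here by a 2x2 inverse)."""
    (a, b), (c, d) = Q
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    y_star = matvec(inv, x)
    return matvec(inv, [ys - ti for ys, ti in zip(y_star, t)])


if __name__ == "__main__":
    Q = [[2.0, 0.5], [0.5, 1.0]]  # symmetric positive definite lower-level Hessian
    x, t = [1.0, -1.0], [0.0, 0.0]
    exact = implicit_hypergradient_2x2(Q, x, t)
    for lam in (10.0, 100.0, 1000.0):
        approx = penalty_hypergradient(Q, x, t, lam)
        err = max(abs(p - e) for p, e in zip(approx, exact))
        print(f"lam={lam:g}  approx={approx}  max abs error={err:.2e}")
```

Both inner problems are solved by plain gradient descent, so the estimate needs only gradients of $f$ and $g$. The reference function, by contrast, needs a linear solve with the Hessian $Q$, which is the cost the abstract aims to avoid. For this toy problem the estimate equals $\lambda(I+\lambda Q)^{-1}(y^\*-t) = (Q+\lambda^{-1}I)^{-1}(y^\*-t)$, so its error shrinks roughly in proportion to $1/\lambda$. Handling constraints, which is what the active-set Lagrangian oracle addresses, is outside the scope of this sketch.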