Gradient methods have become mainstream techniques for Bi-Level Optimization (BLO) in machine learning. The validity of existing approaches relies heavily on a restrictive Lower-Level Strong Convexity (LLSC) condition, on solving a series of approximation subproblems to high accuracy, or on both. In this work, by averaging the upper- and lower-level objectives, we propose a single-loop Bi-level Averaged Method of Multipliers (sl-BAMM) that is simple yet efficient for large-scale BLO and does not require the LLSC condition. We further provide a non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points. The comparative advantage of our analysis is that it does not need the strong gradient boundedness assumption that other analyses require. Our theory therefore covers a wider variety of applications in deep learning, especially those where the upper-level objective is quadratic w.r.t. the lower-level variable. Experimental results demonstrate the superiority of our method.
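For orientation, the generic BLO problem referred to above is commonly written as below. The symbols F (upper-level objective), f (lower-level objective), x (upper-level variable), and y (lower-level variable) are standard notation assumed here; they do not appear in the original text, and the precise formulation and constraints used by sl-BAMM may differ.

\[
\min_{x,\,y} \; F(x, y)
\quad \text{s.t.} \quad
y \in \arg\min_{y'} f(x, y').
\]

As an illustration of the quadratic case, take the hypothetical objective $F(x, y) = \tfrac{1}{2}\|y - b\|^2$ with a fixed target $b$. Then

\[
\nabla_y F(x, y) = y - b,
\qquad
\|\nabla_y F(x, y)\| \to \infty \ \text{ as } \ \|y\| \to \infty,
\]

so no uniform bound on the upper-level gradient can hold over an unbounded domain, which is why an analysis that avoids the gradient boundedness assumption matters for such objectives.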