Gradient methods have become mainstream techniques for Bi-Level Optimization (BLO) in learning fields. The validity of existing works relies heavily on a restrictive Lower-Level Strong Convexity (LLSC) condition, on solving a series of approximation subproblems with high accuracy, or on both. In this work, by averaging the upper- and lower-level objectives, we propose a single-loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale problems and does not require the restrictive LLSC condition. We further provide a non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points; the comparative advantage of our analysis is that it does not need the strong gradient boundedness assumption, which is always required by other analyses. Thus our theory safely captures a wider variety of applications in deep learning, especially those where the upper-level objective is quadratic w.r.t. the lower-level variable. Experimental results demonstrate the superiority of our method.