This work addresses two major challenges in large-scale nonconvex Bi-Level Optimization (BLO), a class of problems increasingly used in machine learning for its ability to model nested structures: ensuring computational efficiency and providing theoretical guarantees. While recent advances in scalable BLO algorithms have largely relied on simplifying assumptions of lower-level convexity, our work tackles large-scale BLO problems with nonconvexity in both the upper and lower levels. We address the computational and theoretical challenges simultaneously by introducing a single-loop gradient-based algorithm built on a Moreau envelope-based reformulation, together with a non-asymptotic convergence analysis for general nonconvex BLO problems. Notably, the algorithm relies solely on first-order gradient information, enhancing its practicality and efficiency for large-scale BLO learning tasks. We validate the approach on a range of synthetic problems, two typical hyperparameter learning tasks, and a real-world neural architecture search application, collectively demonstrating its superior performance.
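For context, a Moreau envelope-based reformulation of a BLO problem typically takes the following generic form (a sketch under common conventions, not necessarily the paper's exact construction; the symbols $F$, $f$, and the proximal parameter $\gamma$ are assumptions introduced here for illustration):

\[
\min_{x,\, y}\ F(x, y)
\quad \text{s.t.} \quad f(x, y) - v_{\gamma}(x, y) \le 0,
\qquad
v_{\gamma}(x, y) := \min_{\theta}\ \Big\{ f(x, \theta) + \tfrac{1}{2\gamma}\,\|\theta - y\|^{2} \Big\},
\]

where $F$ denotes the upper-level objective, $f$ the lower-level objective, and $v_{\gamma}$ the Moreau envelope of $f(x,\cdot)$ regularized around $y$. The constraint $f(x,y) - v_{\gamma}(x,y) \le 0$ replaces the nested lower-level minimization, and a single-loop scheme can then advance $x$, $y$, and the auxiliary variable $\theta$ by one first-order step each per iteration, avoiding second-order derivatives and inner-loop solves.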