Bilevel optimization plays an essential role in many machine learning tasks, ranging from hyperparameter optimization to meta-learning. Existing studies on bilevel optimization, however, focus on either centralized or synchronous distributed settings. Centralized bilevel optimization approaches require collecting a massive amount of data on a single server, which inevitably incurs significant communication costs and may give rise to data privacy risks. Synchronous distributed bilevel optimization algorithms, on the other hand, often suffer from the straggler problem and stop working immediately if a few workers fail to respond. As a remedy, we propose the Asynchronous Distributed Bilevel Optimization (ADBO) algorithm. ADBO can tackle bilevel optimization problems in which both the upper-level and lower-level objective functions are nonconvex, and its convergence is theoretically guaranteed. Furthermore, our theoretical analysis shows that the iteration complexity of ADBO for obtaining an $\epsilon$-stationary point is upper bounded by $\mathcal{O}(\frac{1}{\epsilon^2})$. Thorough empirical studies on public datasets demonstrate the effectiveness and efficiency of the proposed ADBO.
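To make the upper-level/lower-level structure concrete, the following is a minimal, centralized sketch of hyperparameter optimization posed as a bilevel problem. It is not ADBO: it runs on a single machine, has no asynchrony, and uses a smooth scalar toy problem. Everything in it is an illustrative assumption: the data constants `A`, `B`, `C`, `D`, the ridge-penalty setup, and the helper names `lower_solution`, `upper_loss`, `hypergradient`, and `tune` are hypothetical. The lower level fits a 1-D ridge regression weight `w` for a given penalty `lam`. The upper level tunes `lam` to minimize a validation loss, using the implicit-function-theorem hypergradient.

```python
# Toy bilevel problem (illustrative only, NOT the ADBO algorithm):
#   upper level: min_lam F(lam) = 0.5 * (C * w*(lam) - D)**2      (validation loss)
#   lower level: w*(lam) = argmin_w 0.5 * (A * w - B)**2 + 0.5 * lam * w**2
# All constants below are hypothetical scalar "data".
A, B = 2.0, 3.0  # training feature and target
C, D = 1.5, 1.0  # validation feature and target


def lower_solution(lam: float) -> float:
    """Closed-form lower-level minimizer w*(lam) = A*B / (A**2 + lam)."""
    return A * B / (A * A + lam)


def upper_loss(lam: float) -> float:
    """Upper-level (validation) loss evaluated at the lower-level solution."""
    w = lower_solution(lam)
    return 0.5 * (C * w - D) ** 2


def hypergradient(lam: float) -> float:
    """dF/dlam via the implicit function theorem.

    Differentiating the lower-level optimality condition
    A*(A*w - B) + lam*w = 0 with respect to lam gives
    dw*/dlam = -w* / (A**2 + lam); the chain rule then multiplies
    by dF/dw = C * (C*w - D).
    """
    w = lower_solution(lam)
    dw_dlam = -w / (A * A + lam)
    return C * (C * w - D) * dw_dlam


def tune(lam: float = 0.1, lr: float = 10.0, steps: int = 200) -> float:
    """Projected gradient descent on the upper-level variable lam >= 0."""
    for _ in range(steps):
        lam = max(0.0, lam - lr * hypergradient(lam))
    return lam


if __name__ == "__main__":
    best = tune()
    print(f"lam = {best:.4f}, validation loss = {upper_loss(best):.2e}")
```

For these constants the validation loss reaches zero when `C * w*(lam) = D`, that is, `6 / (4 + lam) = 2/3`, so `tune()` should approach `lam = 5`. In a distributed variant, `A`, `B`, `C`, `D` would be split across workers; how ADBO aggregates possibly stale worker updates is not shown here.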