Bilevel optimization plays an essential role in many machine learning tasks, ranging from hyperparameter optimization to meta-learning. Existing studies on bilevel optimization, however, focus on either centralized or synchronous distributed settings. Centralized bilevel optimization approaches require collecting massive amounts of data on a single server, which inevitably incurs significant communication costs and may give rise to data privacy risks. Synchronous distributed bilevel optimization algorithms, on the other hand, often face the straggler problem and stop working as soon as a few workers fail to respond. As a remedy, we propose the Asynchronous Distributed Bilevel Optimization (ADBO) algorithm. ADBO can tackle bilevel optimization problems in which both the upper-level and lower-level objective functions are nonconvex, and its convergence is theoretically guaranteed. Furthermore, theoretical analysis shows that the iteration complexity of ADBO to reach an $\epsilon$-stationary point is upper bounded by $\mathcal{O}(\frac{1}{\epsilon^2})$. Thorough empirical studies on public datasets demonstrate the effectiveness and efficiency of the proposed ADBO.
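For orientation, the problem class named above can be sketched in the standard distributed bilevel form. The notation here is an assumption introduced for illustration and does not appear in the abstract: $x$ and $y$ are the upper-level and lower-level variables, $N$ is the number of workers, and $G_i$ and $g_i$ are the local upper-level and lower-level objectives held by worker $i$. The exact formulation used by ADBO may differ.

```latex
% Illustrative notation only (x, y, N, G_i, g_i are assumed, not from the abstract).
\begin{aligned}
\min_{x} \quad & F(x, y) = \sum_{i=1}^{N} G_i(x, y) \\
\text{s.t.} \quad & y \in \arg\min_{y'} \; f(x, y') = \sum_{i=1}^{N} g_i(x, y')
\end{aligned}
```

In this reading, hyperparameter optimization corresponds to $x$ being the hyperparameters and $y$ the model weights, with each worker evaluating $G_i$ and $g_i$ on its own local data.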