Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested problem structures. To address large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate only with their immediate neighbors, without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. However, most decentralized SBO algorithms focus solely on asymptotic convergence rates and overlook the transient iteration complexity, i.e., the number of iterations required before the asymptotic rate dominates. This leaves the influence of network topology, data heterogeneity, and the nested bilevel algorithmic structure poorly understood. To address this issue, this paper introduces D-SOBA, a Decentralized Stochastic One-loop Bilevel Algorithm framework. D-SOBA comprises two variants: D-SOBA-SO, which incorporates second-order Hessian and Jacobian matrices, and D-SOBA-FO, which relies entirely on first-order gradients. We provide a comprehensive non-asymptotic convergence analysis and establish the transient iteration complexity of D-SOBA, yielding the first theoretical understanding of how network topology, data heterogeneity, and nested bilevel structures influence decentralized SBO. Extensive experimental results demonstrate the efficiency and theoretical advantages of D-SOBA.
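For context, the stochastic bilevel problem underlying this abstract is commonly written in the following standard form (this formulation is a conventional sketch, not stated explicitly in the abstract; the symbols $f_i$, $g_i$, and $n$ are illustrative):

```latex
% Decentralized stochastic bilevel optimization over n nodes:
% each node i holds local upper-level loss f_i and lower-level loss g_i.
\min_{x \in \mathbb{R}^d} \;
  \Phi(x) := \frac{1}{n} \sum_{i=1}^{n} f_i\bigl(x, y^{*}(x)\bigr),
\qquad
\text{s.t.} \quad
  y^{*}(x) \in \arg\min_{y \in \mathbb{R}^p}
  \frac{1}{n} \sum_{i=1}^{n} g_i(x, y),
```

where each $f_i$ and $g_i$ is an expectation over local stochastic samples. The nested structure arises because the upper-level objective depends on $x$ both directly and through the lower-level solution $y^{*}(x)$; the hypergradient of $\Phi$ involves Hessian and Jacobian terms of $g$, which is why a second-order variant (D-SOBA-SO) and a fully first-order variant (D-SOBA-FO) are natural design choices.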