Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. However, existing methods often rely on prior knowledge of problem parameters, such as smoothness constants, convexity moduli, or communication network topologies, to determine appropriate stepsizes. In practice, these problem parameters are typically unavailable, leading to substantial manual effort for hyperparameter tuning. In this paper, we propose AdaSDBO, a fully problem-parameter-free algorithm for decentralized bilevel optimization with a single-loop structure. AdaSDBO leverages adaptive stepsizes based on cumulative gradient norms to update all variables simultaneously, dynamically adjusting its progress and eliminating the need for problem-specific hyperparameter tuning. Through rigorous theoretical analysis, we establish that AdaSDBO achieves a convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$, matching the performance of well-tuned state-of-the-art methods up to polylogarithmic factors. Extensive numerical experiments demonstrate that AdaSDBO delivers competitive performance compared to existing decentralized bilevel optimization methods while exhibiting remarkable robustness across diverse stepsize configurations.
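To make the stepsize mechanism concrete, the following is a minimal sketch, not the authors' implementation: each variable's stepsize is scaled by the inverse square root of its cumulative squared gradient norms (an AdaGrad-norm-style rule), and all variables are updated simultaneously in a single loop with gossip averaging. The toy quadratic bilevel instance, the ring mixing matrix `W`, and the base stepsizes `eta_x`, `eta_y` are illustrative assumptions, not taken from the paper.

```python
# Sketch of cumulative-gradient-norm adaptive stepsizes in a single-loop
# decentralized bilevel update. The problem instance is a hypothetical toy
# example chosen so the lower-level solution is available in closed form.
import numpy as np

rng = np.random.default_rng(0)
n_agents, d = 4, 3

# Doubly stochastic mixing matrix for a ring topology (assumed for illustration).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

# Toy bilevel problem per agent i:
#   upper level: f_i(x, y) = 0.5 * ||x - a_i||^2 + x @ y
#   lower level: g_i(x, y) = 0.5 * ||y - x||^2, so y*(x) = x and dy*/dx = I.
A = rng.normal(size=(n_agents, d))

X = np.zeros((n_agents, d))      # upper-level variables, one row per agent
Y = np.zeros((n_agents, d))      # lower-level variables
Gx_acc = np.ones(n_agents)       # cumulative squared gradient norms (init 1 to avoid /0)
Gy_acc = np.ones(n_agents)
eta_x = eta_y = 1.0              # base stepsizes; robustness to their choice is the point

for t in range(2000):
    # Hypergradient for the toy problem: df/dx + (dy*/dx)^T df/dy = (x - a_i) + y + x.
    Gx = (X - A) + Y + X
    Gy = Y - X                   # lower-level gradient

    Gx_acc += np.sum(Gx**2, axis=1)   # accumulate squared norms per agent
    Gy_acc += np.sum(Gy**2, axis=1)

    # Simultaneous single-loop update: gossip averaging plus adaptive step.
    X = W @ X - (eta_x / np.sqrt(Gx_acc))[:, None] * Gx
    Y = W @ Y - (eta_y / np.sqrt(Gy_acc))[:, None] * Gy

print("consensus x:", X.mean(axis=0))   # approaches mean(a_i) / 3 for this instance
```

Because the stepsizes shrink automatically as gradient norms accumulate, the sketch converges for a wide range of `eta_x`, `eta_y`, which mirrors the robustness to stepsize configurations claimed above.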