Bilevel optimization has recently received tremendous attention due to its success in solving important machine learning problems such as meta-learning, reinforcement learning, and hyperparameter optimization. Extending single-agent training on bilevel problems to the decentralized setting is a natural generalization, and there has been a flurry of work on decentralized bilevel optimization algorithms. However, it remains unknown how to design a distributed algorithm whose sample complexity and convergence rate are comparable to those of SGD for stochastic optimization, and that at the same time avoids directly computing exact Hessian or Jacobian matrices. In this paper we propose such an algorithm. More specifically, we propose a novel decentralized stochastic bilevel optimization (DSBO) algorithm that requires only a first-order stochastic oracle, a Hessian-vector product oracle, and a Jacobian-vector product oracle. The sample complexity of our algorithm matches the best currently known results for DSBO. Its advantage is that it does not require estimating the full Hessian and Jacobian matrices, which improves its per-iteration complexity.
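To make the notion of a Hessian-vector product oracle concrete, the following minimal sketch computes H(y)v from two gradient evaluations, without ever forming the Hessian. It is not the DSBO algorithm from the paper. The toy objective, the function names (`grad_g`, `hvp`, `explicit_hessian_times`), and the central-difference scheme are all assumptions chosen for illustration. In practice such products are usually obtained with automatic differentiation rather than finite differences.

```python
# Illustrative only: a Hessian-vector product computed from gradients,
# without ever forming the d x d Hessian. The toy objective and the
# finite-difference scheme are assumptions, not taken from the paper.

def grad_g(y):
    """Gradient of the toy objective
    g(y) = 0.5 * y^T A y + (y0^4 + y1^4) / 4, with A = [[2, 1], [1, 3]]."""
    y0, y1 = y
    return [2.0 * y0 + 1.0 * y1 + y0 ** 3,
            1.0 * y0 + 3.0 * y1 + y1 ** 3]


def hvp(grad, y, v, eps=1e-5):
    """Approximate H(y) @ v by a central difference of the gradient:
    (grad(y + eps*v) - grad(y - eps*v)) / (2*eps).
    Costs two gradient calls and O(d) memory."""
    y_plus = [yi + eps * vi for yi, vi in zip(y, v)]
    y_minus = [yi - eps * vi for yi, vi in zip(y, v)]
    g_plus, g_minus = grad(y_plus), grad(y_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


def explicit_hessian_times(y, v):
    """Reference value that builds the full Hessian, for comparison only."""
    y0, y1 = y
    H = [[2.0 + 3.0 * y0 ** 2, 1.0],
         [1.0, 3.0 + 3.0 * y1 ** 2]]
    return [H[0][0] * v[0] + H[0][1] * v[1],
            H[1][0] * v[0] + H[1][1] * v[1]]


y = [0.5, -1.0]
v = [1.0, 2.0]
approx = hvp(grad_g, y, v)           # uses gradients only
exact = explicit_hessian_times(y, v)  # uses the full 2 x 2 Hessian
```

For a d-dimensional problem the explicit route stores and multiplies a d x d matrix, while the oracle route needs only a few gradient-sized vectors. That gap is the kind of per-iteration saving the abstract refers to.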