Decentralized learning has recently received increasing attention in machine learning due to its advantages in implementation simplicity, system robustness, and data privacy. Meanwhile, adaptive gradient methods show superior performance in many machine learning tasks, such as training neural networks. Although some works study decentralized optimization algorithms with adaptive learning rates, these adaptive decentralized algorithms still suffer from high sample complexity. To fill this gap, we propose a class of faster adaptive decentralized algorithms (i.e., AdaMDOS and AdaMDOF) for distributed nonconvex stochastic and finite-sum optimization, respectively. Moreover, we provide a solid convergence analysis framework for our methods. In particular, we prove that AdaMDOS obtains a near-optimal sample complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of nonconvex stochastic optimization, and that AdaMDOF obtains a near-optimal sample complexity of $O(\sqrt{n}\epsilon^{-2})$ for finding an $\epsilon$-stationary solution of nonconvex finite-sum optimization, where $n$ denotes the sample size. To the best of our knowledge, AdaMDOF is the first adaptive decentralized algorithm for nonconvex finite-sum optimization. Experimental results demonstrate the efficiency of our algorithms.
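For readers unfamiliar with the setting, the two problem classes and the notion of an $\epsilon$-stationary solution can be written in a commonly used form. This is only a sketch: the notation below ($m$ agents, local objectives $f^i$, random samples $\xi^i$, component functions $f^i_k$, and the averaged objective $F$) is assumed for illustration and is not stated in the abstract, and reading $n$ as the number of samples held by each agent is likewise an assumption.

$$
\min_{x \in \mathbb{R}^d} F(x) = \frac{1}{m} \sum_{i=1}^{m} f^i(x),
\qquad
f^i(x) =
\begin{cases}
\mathbb{E}_{\xi^i}\big[ f^i(x; \xi^i) \big] & \text{(stochastic)} \\[4pt]
\dfrac{1}{n} \sum_{k=1}^{n} f^i_k(x) & \text{(finite-sum)}
\end{cases}
$$

Under this convention, a point $x$ would be called $\epsilon$-stationary if

$$
\mathbb{E}\,\|\nabla F(x)\| \le \epsilon ,
$$

and sample complexity counts the stochastic gradient evaluations needed to reach such a point.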