The ProxSkip algorithm for decentralized and federated learning is attracting growing attention because it provably reduces communication complexity while remaining robust to data heterogeneity. However, existing analyses of ProxSkip are limited to the strongly convex setting and do not establish linear speedup, i.e., convergence performance that improves linearly with the number of nodes. It therefore remains open how ProxSkip behaves in the non-convex setting and whether linear speedup is achievable. In this paper, we revisit decentralized ProxSkip and address both questions. We show that the leading communication complexity of ProxSkip is $\mathcal{O}\left(\frac{p\sigma^2}{n\epsilon^2}\right)$ for the non-convex and convex settings, and $\mathcal{O}\left(\frac{p\sigma^2}{n\epsilon}\right)$ for the strongly convex setting, where $n$ is the number of nodes, $p$ is the probability of communication, $\sigma^2$ is the level of stochastic noise, and $\epsilon$ is the desired accuracy. This result shows that ProxSkip achieves linear speedup and asymptotically reduces communication overhead in proportion to the probability of communication. Additionally, for the strongly convex setting, we prove that ProxSkip achieves linear speedup with network-independent stepsizes.
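To make the scaling in the stated bounds concrete, the following minimal sketch evaluates only the leading term, $p\sigma^2/(n\epsilon^2)$ or $p\sigma^2/(n\epsilon)$. The function name `leading_comm_term` is hypothetical, and the constant hidden by the $\mathcal{O}(\cdot)$ notation is set to 1 purely for illustration, so the returned values are meaningful only as ratios, not as actual round counts.

```python
def leading_comm_term(p, sigma2, n, eps, strongly_convex=False):
    """Leading-order communication term from the stated bounds.

    Assumption: the constant hidden in the O(.) notation is taken to be 1,
    and all lower-order terms are ignored. Only ratios between two calls
    are meaningful.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p is a communication probability and must lie in (0, 1]")
    if n < 1 or eps <= 0.0 or sigma2 < 0.0:
        raise ValueError("require n >= 1, eps > 0, sigma2 >= 0")
    # eps enters as eps^2 in the (non-)convex case and as eps when strongly convex.
    power = 1 if strongly_convex else 2
    return p * sigma2 / (n * eps ** power)


if __name__ == "__main__":
    base = leading_comm_term(p=0.5, sigma2=1.0, n=10, eps=0.1)
    more_nodes = leading_comm_term(p=0.5, sigma2=1.0, n=20, eps=0.1)
    less_comm = leading_comm_term(p=0.25, sigma2=1.0, n=10, eps=0.1)
    print("ratio after doubling n:", more_nodes / base)
    print("ratio after halving p:", less_comm / base)
```

Under these assumptions, doubling $n$ or halving $p$ each halves the leading term, which is the linear speedup and the proportional communication saving described above.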