We present a new method that combines three key ingredients of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method achieves optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Independently of the communication compression feature, our method successfully combines variance reduction with partial participation: it achieves the optimal oracle complexity, never requires all nodes to participate, and does not rely on the bounded gradients (dissimilarity) assumption.
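To make the three ingredients concrete, the following toy simulation combines textbook versions of each. It uses uniform sampling of `s` out of `n` nodes per round for partial participation, an unbiased RandK compressor with DIANA-style shifts for compressed communication, and a STORM-style momentum estimator for variance reduction. This is a sketch under stated assumptions, not the method from this work. The quadratic losses, the hypothetical helpers `rand_k`, `stoch_grad`, and `run`, and all parameter values are illustrative choices. No round ever queries all `n` nodes.

```python
import numpy as np


def rand_k(v, k, rng):
    """Unbiased RandK compressor: keep k random coordinates, scale them by d/k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out


def stoch_grad(x, c, xi):
    # Gradient of the hypothetical toy loss f_i(x; xi) = 0.5 * xi * ||x - c||^2, E[xi] = 1.
    return xi * (x - c)


def run(n=20, s=5, d=10, k=2, gamma=0.1, b=0.1, rounds=2000, seed=0):
    """Simulate the toy scheme; return (initial, final) distance to the optimum."""
    rng = np.random.default_rng(seed)
    c = 3.0 + rng.normal(size=(n, d))   # heterogeneous node-specific optima
    x_opt = c.mean(axis=0)              # minimizer of (1/n) * sum_i E f_i
    x = np.zeros(d)
    alpha = k / d                       # shift step 1/(omega + 1) for RandK
    h = np.zeros((n, d))                # local variance-reduced estimators
    x_prev = np.tile(x, (n, 1))         # point of each node's last evaluation
    g = np.zeros((n, d))                # per-node shifts, mirrored by the server
    g_mean = g.mean(axis=0)             # server keeps the average shift
    err0 = np.linalg.norm(x - x_opt)
    for _ in range(rounds):
        sampled = rng.choice(n, size=s, replace=False)  # partial participation
        est = g_mean.copy()
        for i in sampled:
            xi = rng.uniform(0.5, 1.5)
            # STORM-style momentum: the same sample xi at the new and the old point.
            h[i] = stoch_grad(x, c[i], xi) + (1 - b) * (h[i] - stoch_grad(x_prev[i], c[i], xi))
            x_prev[i] = x
            m = rand_k(h[i] - g[i], k, rng)   # the only vector node i sends
            est += m / s                      # rescaled compressed correction
            g[i] += alpha * m                 # node and server update the shift identically
            g_mean += alpha * m / n
        x = x - gamma * est
    return err0, np.linalg.norm(x - x_opt)
```

For example, `err0, err = run()` starts roughly 9.5 away from the optimum and ends far closer. The shifts `g` let each node send compressed *differences* that shrink over time, while the `(1 - b)` momentum term damps stochastic-gradient noise without ever needing a full pass over all nodes.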