In conventional distributed optimization, each agent performs a single local update between two communication rounds with its neighbors to synchronize solutions. Inspired by the success of multiple local updates in federated learning, incorporating local updates into distributed optimization has recently attracted increasing attention. However, unlike federated learning, where multiple local updates can accelerate learning by improving gradient estimation under mini-batch settings, it remains unclear whether similar benefits hold in distributed optimization when gradients are exact. Moreover, existing theoretical results typically require reducing the step size when multiple local updates are employed, which can entirely offset any potential benefit of these additional updates and obscure their true impact on convergence. In this paper, we focus on the classic DIGing algorithm and leverage the tight performance bounds provided by the Performance Estimation Problem (PEP) framework to show that incorporating local updates can indeed accelerate distributed optimization. To the best of our knowledge, this is the first rigorous demonstration of such acceleration for a broad class of objective functions. Our analysis further reveals that, under an appropriate step size, performing only two local updates is sufficient to achieve the maximum possible improvement, and that additional local updates provide no further gains. Because more updates increase computational cost, these findings offer practical guidance for efficient implementation. Extensive experiments on both synthetic and real-world datasets corroborate the theoretical findings.