Emerging distributed applications have recently spurred the development of decentralized machine learning, especially in IoT and edge computing. In real-world scenarios, non-convex objectives and data heterogeneity commonly cause inefficiency and performance degradation, and have stalled further progress. Most studies address only one of these issues, and a more general framework that has been proven optimal is still lacking. To this end, we propose a unified paradigm called UMP, which comprises two algorithms, D-SUM and GT-DSUM, both based on applying the momentum technique to decentralized stochastic gradient descent (SGD). The former provides a convergence guarantee for general non-convex objectives. The latter extends it with gradient tracking, which estimates the global optimization direction to mitigate data heterogeneity (i.e., distribution drift). With different parameter choices, UMP covers most momentum-based variants built on the classical heavy ball method or Nesterov's acceleration. In theory, we provide a rigorous convergence analysis of both approaches for non-convex objectives. In practice, extensive experiments demonstrate an improvement in model accuracy of up to 57.6% compared to other methods.
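To make the role of gradient tracking concrete, the following is a minimal sketch of a generic decentralized heavy-ball update with and without a tracking variable. It is not the D-SUM or GT-DSUM update rule, which the abstract does not specify. Everything in it is an assumption made for illustration: the ring topology with uniform 1/3 mixing weights, the scalar quadratic local objectives f_i(x) = 0.5 (x - b_i)^2 (convex, chosen only so the global minimizer is known to be the mean of the b_i), exact gradients in place of stochastic ones, and all function and parameter names.

```python
def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    weight 1/3 on each node itself and on its two neighbors."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def mix(W, v):
    """One gossip round: every node averages its neighbors' values."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]


def local_grad(i, x, targets):
    """Exact gradient of the toy local objective f_i(x) = 0.5 * (x - targets[i])**2."""
    return x - targets[i]


def run(targets, steps=500, lr=0.1, beta=0.5, tracking=True):
    """Decentralized heavy-ball iteration on scalar models.

    tracking=True : momentum is driven by a tracking variable y_i that
                    estimates the network-average gradient.
    tracking=False: momentum is driven by the local gradient only.
    Returns the final model held by each node.
    """
    n = len(targets)
    W = ring_mixing_matrix(n)
    x = [0.0] * n                                        # local models
    g = [local_grad(i, x[i], targets) for i in range(n)]  # local gradients
    y = list(g)                                          # gradient trackers
    m = [0.0] * n                                        # momentum buffers

    for _ in range(steps):
        direction = y if tracking else g
        # Heavy-ball momentum update on each node.
        m = [beta * m[i] + direction[i] for i in range(n)]
        # Gossip averaging of models, followed by a local momentum step.
        x_mixed = mix(W, x)
        x = [x_mixed[i] - lr * m[i] for i in range(n)]
        # Tracker update: average neighbors' trackers, then add the change
        # in the local gradient so the mean of y equals the mean gradient.
        g_new = [local_grad(i, x[i], targets) for i in range(n)]
        y_mixed = mix(W, y)
        y = [y_mixed[i] + g_new[i] - g[i] for i in range(n)]
        g = g_new
    return x


if __name__ == "__main__":
    targets = [-4.0, -2.0, 0.0, 2.0, 4.0]  # heterogeneous local optima, mean 0
    print("with tracking:   ", run(targets, tracking=True))
    print("without tracking:", run(targets, tracking=False))
```

In this toy setting, the tracked variant drives every node to the global minimizer, while the untracked variant with a constant step size leaves each node biased toward its own local optimum, which is the heterogeneity effect that gradient tracking is meant to mitigate.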