Decentralized federated learning (DFL) enables collaborative model training without a central server, but converges slowly under statistical heterogeneity. Recent work has shown that neural tangent kernel (NTK) methods converge faster than gradient-based updates in DFL, while momentum has proven effective for accelerating gradient-based FL. However, applying momentum to NTK updates can destabilize training under heterogeneous data. We propose SPARK, which addresses this instability with a stage-wise annealed soft-label regularizer evaluated on neighborhood-aggregated data, so that momentum can accelerate NTK updates stably. Under high heterogeneity, SPARK converges about 3$\times$ faster than baselines and reduces the total communication required to reach a target accuracy by up to about 70\%; it also attains higher accuracy across heterogeneity levels. We further study random projection as an optional Jacobian-compression strategy for bandwidth-constrained settings. We validate the approach across multiple datasets, network topologies, and heterogeneity levels.
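The random-projection Jacobian compression mentioned above can be illustrated with a small, generic sketch. It is not SPARK's implementation: the abstract does not say which dimension is projected or which projection is used. The sketch assumes a dense Gaussian projection applied to the parameter dimension of a per-sample Jacobian, and every name in it (`gaussian_projection`, `gram`, `relative_gram_error`) is hypothetical. It shows only the standard property that makes this kind of compression plausible: projecting $J \in \mathbb{R}^{n \times p}$ to $JR \in \mathbb{R}^{n \times k}$ with i.i.d. $\mathcal{N}(0, 1/k)$ entries in $R$ approximately preserves the empirical kernel $JJ^\top$, while a client would send $k$ rather than $p$ numbers per sample.

```python
import math
import random


def gaussian_projection(p, k, seed=0):
    """Return a p x k matrix with i.i.d. N(0, 1/k) entries (assumed projection)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(p)]


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def gram(jac):
    """Empirical kernel J J^T for a Jacobian with one row per sample."""
    jac_t = [list(col) for col in zip(*jac)]
    return matmul(jac, jac_t)


def relative_gram_error(n, p, k, seed=0):
    """Largest entrywise kernel error after projection, relative to the largest exact entry."""
    rng = random.Random(seed)
    # Synthetic stand-in for a Jacobian: n samples, p parameters.
    jac = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    proj = gaussian_projection(p, k, seed=seed + 1)
    exact = gram(jac)
    approx = gram(matmul(jac, proj))  # kernel computed from the compressed Jacobian
    max_abs = max(abs(v) for row in exact for v in row)
    max_diff = max(
        abs(a - e)
        for row_a, row_e in zip(approx, exact)
        for a, e in zip(row_a, row_e)
    )
    return max_diff / max_abs


if __name__ == "__main__":
    # Compressing 600 parameters to 300 halves the per-sample payload.
    print(relative_gram_error(n=3, p=600, k=300))
```

The error shrinks roughly as $1/\sqrt{k}$, so the projected dimension trades bandwidth against kernel fidelity. A sparse or structured projection would be a reasonable substitute where generating a dense Gaussian matrix is too costly.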