In decentralized learning, the datasets distributed across agents can have significantly different data distributions. Most current state-of-the-art decentralized algorithms assume that the data are independent and identically distributed (IID) across agents. This paper focuses on improving decentralized learning over non-IID data. We propose \textit{Neighborhood Gradient Clustering (NGC)}, a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. Cross-gradients for a pair of neighboring agents are the gradients of one agent's model parameters computed on the other agent's dataset. In particular, the proposed method replaces the local gradients of the model with the weighted mean of the self-gradients, model-variant cross-gradients (gradients of the neighbors' parameters computed on the local dataset), and data-variant cross-gradients (gradients of the local model computed on its neighbors' datasets). The data-variant cross-gradients are aggregated through an additional communication round without breaking the privacy constraints. Further, we present \textit{CompNGC}, a compressed version of \textit{NGC} that reduces the communication overhead by $32\times$. We theoretically analyze the convergence rate of the proposed algorithm and demonstrate its efficiency over non-IID data sampled from various vision and language datasets. Our experiments demonstrate that \textit{NGC} and \textit{CompNGC} outperform the existing state-of-the-art (SoTA) decentralized learning algorithm over non-IID data by $0$--$6\%$, with significantly lower compute and memory requirements. Finally, our experiments show that the model-variant cross-gradient information available locally at each agent can improve the performance over non-IID data by $1$--$35\%$ without additional communication cost.
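The gradient-mixing rule described above can be illustrated with a small, self-contained Python sketch. It is a hypothetical toy, not the authors' implementation, and it rests on several assumptions. The loss is a toy quadratic $f(x; D) = \frac{1}{|D|}\sum_{d \in D} \tfrac{1}{2}\lVert x - d\rVert^2$. The entries of a doubly stochastic mixing matrix $W$ serve as the weights of the mean. A hypothetical parameter $\alpha$ blends data-variant and model-variant cross-gradients. The parameter update is a DSGD-style gossip step. All names (\texttt{grad}, \texttt{weighted\_sum}, \texttt{ngc\_step}, \texttt{alpha}) are illustrative. Because every agent runs in one process, the data-variant cross-gradients are computed directly instead of being sent back over the network.

```python
from typing import List

Vector = List[float]


def grad(x: Vector, data: List[Vector]) -> Vector:
    """Gradient of the toy loss f(x; D) = mean_{d in D} 0.5 * ||x - d||^2, i.e. x - mean(D)."""
    n = len(data)
    return [xk - sum(d[k] for d in data) / n for k, xk in enumerate(x)]


def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    """Return sum_j weights[j] * vectors[j], computed coordinate-wise."""
    dim = len(vectors[0])
    return [sum(w * v[k] for v, w in zip(vectors, weights)) for k in range(dim)]


def ngc_step(models: List[Vector], datasets: List[List[Vector]],
             W: List[List[float]], lr: float = 0.1, alpha: float = 0.5) -> List[Vector]:
    """One synchronous round of a hypothetical NGC-style update for all agents.

    W is a doubly stochastic mixing matrix; W[i][j] > 0 iff j is a neighbor of i
    (self-loops included). alpha is a hypothetical knob weighting data-variant
    (alpha) versus model-variant (1 - alpha) cross-gradients.
    """
    n = len(models)
    new_models = []
    for i in range(n):
        nbrs = [j for j in range(n) if W[i][j] > 0]
        weights = [W[i][j] for j in nbrs]
        # Model-variant cross-gradients: neighbor j's model on the local dataset D_i.
        # Agent i already holds x_j after gossip, so no extra communication is needed.
        model_variant = [grad(models[j], datasets[i]) for j in nbrs]
        # Data-variant cross-gradients: the local model x_i on neighbor j's dataset D_j.
        # In a real deployment agent j computes these and sends them back (extra round).
        data_variant = [grad(models[i], datasets[j]) for j in nbrs]
        # For j == i both lists contain the self-gradient grad(x_i; D_i).
        dv = weighted_sum(data_variant, weights)
        mv = weighted_sum(model_variant, weights)
        # Weights of W's row sum to 1 and alpha + (1 - alpha) = 1, so g is a weighted mean.
        g = [alpha * a + (1.0 - alpha) * b for a, b in zip(dv, mv)]
        # Assumed DSGD-style step: gossip-average the neighbors' models, then descend.
        mixed = weighted_sum([models[j] for j in nbrs], weights)
        new_models.append([m - lr * gk for m, gk in zip(mixed, g)])
    return new_models


if __name__ == "__main__":
    # Toy non-IID split: each agent holds a single, different 1-D data point.
    datasets = [[[0.0]], [[3.0]], [[6.0]]]
    W = [[1 / 3] * 3 for _ in range(3)]  # fully connected, uniform mixing
    models = [[0.0], [0.0], [0.0]]
    for _ in range(200):
        models = ngc_step(models, datasets, W, lr=0.1, alpha=1.0)
    print(models)  # all three models approach the global minimizer 3.0
```

Setting \texttt{alpha=0.0} in this sketch uses only the self-gradient and the model-variant cross-gradients. Each agent can compute these from the neighbor models it already receives, so this variant needs no exchange beyond the usual model gossip. Because $W$ is doubly stochastic, the average model in this toy moves toward the minimizer of the average loss for any \texttt{alpha} in $[0, 1]$.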