In decentralized learning over distributed datasets, the data distributions can differ significantly across agents. Current state-of-the-art decentralized algorithms mostly assume that the data are independent and identically distributed (IID). This paper focuses on improving decentralized learning over non-IID data. We propose \textit{Neighborhood Gradient Clustering (NGC)}, a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. Cross-gradients for a pair of neighboring agents are the derivatives of the loss with respect to the model parameters of one agent, evaluated on the dataset of the other agent. In particular, the proposed method replaces the local gradients of the model with the weighted mean of the self-gradients, model-variant cross-gradients (derivatives with respect to the neighbors' parameters, evaluated on the local dataset), and data-variant cross-gradients (derivatives with respect to the local model's parameters, evaluated on the neighbors' datasets). The data-variant cross-gradients are aggregated through an additional communication round without breaking the privacy constraints. Further, we present \textit{CompNGC}, a compressed version of \textit{NGC} that reduces the communication overhead by $32\times$. We theoretically analyze the convergence rate of the proposed algorithm and demonstrate its efficiency on non-IID data sampled from various vision and language datasets. Our experiments demonstrate that \textit{NGC} and \textit{CompNGC} outperform the existing SoTA decentralized learning algorithm over non-IID data by $0$--$6\%$, with significantly lower compute and memory requirements. Further, our experiments show that the model-variant cross-gradient information available locally at each agent can improve the performance over non-IID data by $1$--$35\%$ without additional communication cost.
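The gradient replacement described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear model with squared-error loss, the uniform weights, and the names `grad`, `weighted_mean`, and `ngc_style_gradient` are all assumptions made for illustration, and the sketch omits the consensus step, the compression used in \textit{CompNGC}, and any real communication.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Dataset = Sequence[Tuple[Vector, float]]  # (features, target) pairs


def grad(params: Vector, data: Dataset) -> Vector:
    """Gradient of the mean of 0.5 * (w.x - y)^2 for a linear model (toy loss, assumed)."""
    g = [0.0] * len(params)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(params, x)) - y
        for k, xi in enumerate(x):
            g[k] += err * xi / len(data)
    return g


def weighted_mean(grads: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Coordinate-wise weighted mean of a list of gradient vectors."""
    total = sum(weights)
    dim = len(grads[0])
    return [sum(w * g[k] for w, g in zip(weights, grads)) / total for k in range(dim)]


def ngc_style_gradient(
    i: int,
    params: Sequence[Vector],
    datasets: Sequence[Dataset],
    neighbors: Dict[int, List[int]],
    weights: Sequence[float],
) -> Vector:
    """Replace agent i's local gradient by a weighted mean of self- and cross-gradients."""
    # Self-gradient: agent i's parameters on agent i's data.
    self_grad = grad(params[i], datasets[i])
    # Model-variant cross-gradients: neighbors' parameters on agent i's data.
    # Agent i can compute these locally once it has received its neighbors' parameters.
    model_variant = [grad(params[j], datasets[i]) for j in neighbors[i]]
    # Data-variant cross-gradients: agent i's parameters on neighbors' data.
    # In a real deployment neighbor j would compute this on its own data and send
    # back only the gradient; this single-process sketch evaluates it directly.
    data_variant = [grad(params[i], datasets[j]) for j in neighbors[i]]
    return weighted_mean([self_grad] + model_variant + data_variant, weights)


# Two agents with deliberately different (non-IID) one-sample datasets.
params = [[1.0], [0.0]]
datasets = [[([1.0], 2.0)], [([2.0], 0.0)]]
neighbors = {0: [1], 1: [0]}
uniform = [1.0, 1.0, 1.0]  # assumed weights: self, model-variant, data-variant
update_0 = ngc_style_gradient(0, params, datasets, neighbors, uniform)
```

Here `update_0` would take the place of agent 0's plain local gradient in its optimizer step. Dropping the `data_variant` term gives the variant that needs no additional communication round, since the model-variant cross-gradients are already available locally.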