Consensus-based decentralized stochastic gradient descent (D-SGD) is a widely adopted algorithm for decentralized training of machine learning models across networked agents. A crucial part of D-SGD is the consensus-based model averaging, which heavily relies on information exchange and fusion among the nodes. Specifically, for consensus averaging over wireless networks, communication coordination is necessary to determine when and how a node can access the channel and transmit (or receive) information to (or from) its neighbors. In this work, we propose $\texttt{BASS}$, a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD while considering the actual communication cost per iteration. $\texttt{BASS}$ creates a set of mixing matrix candidates that represent sparser subgraphs of the base topology. In each consensus iteration, one mixing matrix is sampled, leading to a specific scheduling decision that activates multiple collision-free subsets of nodes. The sampling occurs in a probabilistic manner, and the elements of the mixing matrices, along with their sampling probabilities, are jointly optimized. Simulation results demonstrate that $\texttt{BASS}$ enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods. In conclusion, the inherent broadcasting nature of wireless channels offers intrinsic advantages in accelerating the convergence of decentralized optimization and learning.
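To make the sampling step concrete, the following is a minimal sketch of D-SGD in which one mixing matrix is drawn at random in every consensus iteration. It is not the $\texttt{BASS}$ implementation: the 4-node ring, the two subgraph candidates, the Metropolis weights, the uniform sampling probabilities, the scalar quadratic local losses, and the names `metropolis_weights` and `dsgd_sampled_mixing` are all assumptions chosen for illustration. In particular, the joint optimization of the matrix entries and their sampling probabilities, and the mapping from a sampled matrix to collision-free broadcast slots, are not reproduced here.

```python
import random


def metropolis_weights(n, edges):
    """Symmetric, doubly stochastic mixing matrix for an undirected graph.

    Metropolis weights are a stand-in here; BASS optimizes the entries instead.
    """
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = 1.0 / (1 + max(deg[i], deg[j]))
    for i in range(n):
        # The diagonal is still zero, so sum(W[i]) is the off-diagonal row sum.
        W[i][i] = 1.0 - sum(W[i])
    return W


def dsgd_sampled_mixing(x0, targets, candidates, probs,
                        steps, lr, noise_std, seed=0):
    """Run D-SGD with one mixing matrix sampled per consensus iteration.

    Node i holds a scalar model x[i] and the assumed local loss
    f_i(x) = 0.5 * (x - targets[i]) ** 2, observed with Gaussian noise.
    """
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Sample a sparser subgraph (as a mixing matrix) with the given probabilities.
        W = rng.choices(candidates, weights=probs)[0]
        # Local stochastic gradient step at every node.
        half = [x[i] - lr * ((x[i] - targets[i]) + rng.gauss(0.0, noise_std))
                for i in range(n)]
        # Consensus averaging over the sampled subgraph.
        x = [sum(W[i][j] * half[j] for j in range(n)) for i in range(n)]
    return x


# Hypothetical base topology: ring 0-1-2-3-0, split into two sparser subgraphs.
N = 4
CANDIDATES = [
    metropolis_weights(N, [(0, 1), (2, 3)]),
    metropolis_weights(N, [(1, 2), (3, 0)]),
]
PROBS = [0.5, 0.5]  # uniform here; optimized jointly with the matrices in BASS

if __name__ == "__main__":
    targets = [0.0, 1.0, 2.0, 3.0]
    x = dsgd_sampled_mixing([0.0] * N, targets, CANDIDATES, PROBS,
                            steps=500, lr=0.05, noise_std=0.01)
    print("node models:", x)
    print("network average:", sum(x) / N)
```

Because every candidate is doubly stochastic, the averaging step leaves the network-wide mean unchanged, so the mean follows a plain SGD trajectory toward the minimizer of the average loss (1.5 in this toy setup). How closely the individual nodes agree depends on which subgraphs are sampled and how often, which is the quantity the abstract describes as being optimized.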