In conventional distributed learning over a network, multiple agents collaboratively train a common machine learning model. However, because the data are distributed non-i.i.d. across agents, a single unified model often performs poorly on each agent's locally accessible data. To address this problem, we propose a graph-attention-based personalized training algorithm (GATTA) for distributed deep learning. GATTA enables each agent to train a local personalized model while exploiting its correlation with neighboring nodes and aggregating their useful information. Specifically, the personalized model of each agent consists of a global part and a node-specific part. By treating each agent as a node in a graph and its node-specific parameters as the node features, GATTA inherits the benefits of the graph attention mechanism: rather than aggregating by simple averaging, it learns a specific weight for each neighboring node without requiring prior knowledge of the graph structure or of the neighbors' data distributions. Furthermore, building on this weight-learning procedure, we develop a communication-efficient variant of GATTA that skips the transmission of information with small aggregation weights. We also theoretically analyze the convergence properties of GATTA for non-convex loss functions. Numerical results validate the strong performance of the proposed algorithms in terms of convergence and communication cost.
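The aggregation idea can be illustrated with a minimal sketch. It assumes the standard graph attention (GAT) scoring rule, a LeakyReLU of a linear attention vector applied to concatenated projected features followed by a softmax over the node and its neighbors, applied to flattened node-specific parameters. The abstract does not specify GATTA's exact scoring function or skipping rule, so the projection `W`, attention vector `a`, and `threshold` here are hypothetical and held fixed (in a real system they would be learned). Neighbors whose weight falls below the threshold are dropped, which models skipping their transmission, and the remaining weights are renormalized.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # LeakyReLU, as used in standard GAT attention scores
    return np.where(x > 0, x, slope * x)

def attention_weights(h, W, a, neighbors, i):
    """GAT-style attention weights of node i over itself and its neighbors.

    h: (n, d) array; row j holds the flattened node-specific parameters of node j.
    W: (d, k) projection matrix (hypothetical; learnable in practice).
    a: (2k,) attention vector (hypothetical; learnable in practice).
    Returns the list of node indices (self first) and their softmax weights.
    """
    idx = [i] + list(neighbors)  # include a self-loop, as in GAT
    z = h @ W  # project every node's parameters
    scores = np.array([a @ np.concatenate([z[i], z[j]]) for j in idx])
    scores = leaky_relu(scores)
    scores = scores - scores.max()  # shift for a numerically stable softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return idx, alpha

def aggregate(h, idx, alpha, threshold=0.0):
    """Attention-weighted aggregation of node-specific parameters.

    Neighbors with weight below `threshold` (hypothetical skipping rule) are
    dropped, so their parameters need not be transmitted. The node's own
    parameters are always kept, and the surviving weights are renormalized.
    Returns the aggregated parameters and the indices that were kept.
    """
    keep = [(j, w) for j, w in zip(idx, alpha) if w >= threshold or j == idx[0]]
    total = sum(w for _, w in keep)
    aggregated = sum((w / total) * h[j] for j, w in keep)
    return aggregated, [j for j, _ in keep]

# Usage: node 0 with neighbors 1, 2, 3 and 3-dimensional node-specific parameters
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
a = rng.normal(size=6)
idx, alpha = attention_weights(h, np.eye(3), a, neighbors=[1, 2, 3], i=0)
new_h0, kept = aggregate(h, idx, alpha, threshold=0.1)
```

Unlike plain averaging, each neighbor's contribution depends on its features, and raising `threshold` trades aggregation fidelity for fewer transmissions.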