Fair Selection of Edge Nodes to Participate in Clustered Federated Multitask Learning

From arXiv. To appear in IEEE Transactions on Network and Service Management, Special Issue on Federated Learning for the Management of Networked Systems.

Clustered federated multitask learning is introduced as an efficient technique when data is unbalanced and distributed amongst clients in a non-independent and identically distributed (non-i.i.d.) manner. While a similarity metric can provide client groups with specialized models according to their data distribution, this process can be time-consuming because the server must first capture the data distributions of all clients to perform the correct clustering. Due to resource and time constraints at the network edge, only a fraction of devices is selected every round, which calls for an efficient scheduling technique. Thus, this paper introduces a two-phased client selection and scheduling approach to improve the convergence speed while capturing all data distributions. This approach ensures correct clustering and fairness between clients by leveraging bandwidth reuse for participants that spend a longer time training their models and by exploiting the heterogeneity of the devices to schedule the participants according to their delay. The server then performs the clustering depending on predetermined thresholds and stopping criteria. When a given cluster approaches its stopping point, the server employs a greedy selection for that cluster by picking the devices with lower delay and better resources. A convergence analysis is provided, showing the relationship between the proposed scheduling approach and the convergence rate of the specialized models, and yielding convergence bounds under non-i.i.d. data distribution. We carry out extensive simulations, and the results demonstrate that the proposed algorithms reduce training time and improve the convergence speed while equipping every user with a customized model tailored to its data distribution.
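
The two selection phases described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `Client` class, the `delay` and `rounds_selected` fields, and the functions `select_fair` and `select_greedy` are hypothetical names introduced here, and bandwidth reuse, the similarity-based clustering, and the stopping thresholds are not modeled. The sketch only assumes that the first phase favors clients that have participated least (so every data distribution is eventually seen) and that the second phase picks the lowest-delay clients of a cluster.

```python
from dataclasses import dataclass


@dataclass
class Client:
    cid: int
    delay: float              # assumed estimate of training + upload time
    rounds_selected: int = 0  # how often this client has been scheduled


def select_fair(clients, k):
    """Phase 1 (assumed): prefer least-selected clients, break ties by lower delay."""
    ranked = sorted(clients, key=lambda c: (c.rounds_selected, c.delay))
    chosen = ranked[:k]
    for c in chosen:
        c.rounds_selected += 1
    return chosen


def select_greedy(cluster, k):
    """Phase 2 (assumed): for a cluster near its stopping point, pick the k fastest clients."""
    return sorted(cluster, key=lambda c: c.delay)[:k]


# Four hypothetical clients with different delays.
clients = [Client(i, d) for i, d in enumerate([3.0, 1.0, 4.0, 2.0])]
round1 = [c.cid for c in select_fair(clients, 2)]    # two fastest unseen clients
round2 = [c.cid for c in select_fair(clients, 2)]    # the remaining unseen clients
greedy = [c.cid for c in select_greedy(clients, 2)]  # fastest clients regardless of history
print(round1, round2, greedy)
```

In this toy run, the fair phase covers all four clients within two rounds, whereas the greedy phase would keep returning the same two low-delay clients. That difference is why a greedy policy is only reasonable once the clustering no longer depends on seeing new data distributions.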
