Fair Selection of Edge Nodes to Participate in Clustered Federated Multitask Learning

From arXiv. To appear in IEEE Transactions on Network and Service Management, Special Issue on Federated Learning for the Management of Networked Systems.

Clustered federated multitask learning has been introduced as an efficient technique for settings where data is unbalanced and distributed among clients in a non-independent and identically distributed (non-i.i.d.) manner. While a similarity metric can provide client groups with specialized models according to their data distribution, this process can be time-consuming because the server must first capture the data distributions of all clients to perform the correct clustering. Due to resource and time constraints at the network edge, only a fraction of devices is selected every round, which calls for an efficient scheduling technique. Thus, this paper introduces a two-phase client selection and scheduling approach that improves convergence speed while capturing all data distributions. This approach ensures correct clustering and fairness between clients by leveraging bandwidth reuse for participants that spend a longer time training their models and by exploiting device heterogeneity to schedule participants according to their delay. The server then performs the clustering based on predetermined thresholds and stopping criteria. When a given cluster approaches its stopping point, the server employs greedy selection for that cluster, picking the devices with lower delay and better resources. A convergence analysis is provided, showing the relationship between the proposed scheduling approach and the convergence rate of the specialized models and yielding convergence bounds under non-i.i.d. data distributions. We carry out extensive simulations, and the results demonstrate that the proposed algorithms reduce training time and improve convergence speed while equipping every user with a customized model tailored to its data distribution.
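The abstract does not spell out the selection rules, so the following is only a minimal sketch of how a two-phase scheme of this kind might look, not the paper's algorithm. The `Client` record, its fields, and the functions `fair_select`, `greedy_select`, and `rounds_to_cover` are all hypothetical names introduced for illustration. Phase 1 is assumed to favor the least-often-selected clients so that every data distribution is eventually seen; phase 2 is assumed to pick the lowest-delay members of a cluster that is close to its stopping point. Bandwidth reuse and the clustering step itself are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical edge-device record; all fields are illustrative."""
    cid: int
    delay: float            # estimated compute + upload time for one round
    times_selected: int = 0
    cluster: int = 0        # cluster label assigned by the server


def fair_select(clients, k):
    """Phase 1 (assumed rule): pick the k least-often-selected clients,
    breaking ties by lower delay, so every client is eventually sampled."""
    ranked = sorted(clients, key=lambda c: (c.times_selected, c.delay))
    chosen = ranked[:k]
    for c in chosen:
        c.times_selected += 1
    return chosen


def greedy_select(clients, cluster, k):
    """Phase 2 (assumed rule): within one cluster that is close to its
    stopping point, pick the k members with the lowest delay."""
    members = [c for c in clients if c.cluster == cluster]
    chosen = sorted(members, key=lambda c: c.delay)[:k]
    for c in chosen:
        c.times_selected += 1
    return chosen


def rounds_to_cover(clients, k):
    """Run phase-1 rounds until every client has been selected at least once
    and return how many rounds that took."""
    rounds = 0
    while any(c.times_selected == 0 for c in clients):
        fair_select(clients, k)
        rounds += 1
    return rounds
```

With six clients and two slots per round, this phase-1 rule covers every client in three rounds, after which the server could cluster the collected updates and switch converging clusters to `greedy_select`. A real scheduler would also need to account for bandwidth allocation and resource limits, which this sketch leaves out.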
