Federated learning (FL) is a machine learning paradigm in which multiple clients collaborate to optimize a single global model using their private data. The global model is maintained by a central server that orchestrates FL training through a series of training rounds. In each round, the server samples clients from a client pool and sends them its latest global model parameters for further optimization. Naive sampling strategies select clients at random and, for privacy reasons, do not account for client data distributions. We therefore propose an alternative sampling strategy that performs a one-time clustering of clients based on the high-level features learned by their models, while respecting data privacy. This enables the server to perform stratified client sampling across clusters in every round. We show that the datasets of clients sampled with this approach yield a low relative entropy with respect to the global data distribution. Consequently, FL training becomes less noisy, and the convergence of the global model improves significantly, by as much as 7.4% in some experiments. The approach also significantly reduces the number of communication rounds required to reach a target accuracy.
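To make the sampling step concrete, the following is a minimal sketch of stratified client sampling across clusters, together with the relative entropy check described above. It is an illustration under stated assumptions, not the paper's implementation: the cluster assignments in `cluster_of` are taken as given (the feature-based clustering itself is not shown), the round-robin allocation across clusters is one possible way to stratify, and the function names, the toy client pool, and the direction of the divergence (sampled distribution relative to the global one) are all hypothetical choices. The label lists are used only to evaluate the sketch; in FL the server would not see them.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_sample(cluster_of, num_clients, rng):
    """Pick up to num_clients distinct client ids, spreading picks across clusters."""
    by_cluster = defaultdict(list)
    for client_id, cluster_id in cluster_of.items():
        by_cluster[cluster_id].append(client_id)
    # Shuffle members within each cluster, then shuffle the cluster order so
    # that no cluster is systematically favored when the budget is uneven.
    pools = []
    for cluster_id in sorted(by_cluster):
        members = sorted(by_cluster[cluster_id])
        rng.shuffle(members)
        pools.append(members)
    rng.shuffle(pools)
    # Round-robin over clusters until the budget is met or all pools are empty.
    chosen = []
    while len(chosen) < num_clients and any(pools):
        for pool in pools:
            if pool and len(chosen) < num_clients:
                chosen.append(pool.pop())
    return chosen


def label_distribution(labels, num_classes):
    """Empirical class frequencies of a list of integer labels."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in range(num_classes)]


def relative_entropy(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy pool (hypothetical): 20 clients in 4 clusters, each client holding
# 10 samples of a single label, i.e. an extreme non-IID split.
rng = random.Random(0)
cluster_of = {client_id: client_id % 4 for client_id in range(20)}
client_labels = {cid: [cluster_of[cid]] * 10 for cid in cluster_of}

global_labels = [y for labels in client_labels.values() for y in labels]
chosen = stratified_sample(cluster_of, num_clients=4, rng=rng)
sampled_labels = [y for cid in chosen for y in client_labels[cid]]
kl = relative_entropy(
    label_distribution(sampled_labels, 4),
    label_distribution(global_labels, 4),
)
print(chosen, kl)
```

In this toy setting, one client is drawn from each cluster, so the sampled label distribution matches the global one. Uniform random sampling of four clients could instead draw several clients from the same cluster and leave a class unrepresented in that round.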