Federated learning (FL) aims to collaboratively train a global model while preserving the privacy of client data. However, FL is challenged by non-IID data distributions across clients. Clustered FL (CFL) has emerged as a promising solution, but most existing CFL frameworks are synchronous and do not support asynchronous training. SDAGFL, an asynchronous CFL framework based on directed acyclic graph distributed ledger techniques (DAG-DLT), has been proposed, but its complete decentralization leads to high communication and storage costs. We propose DAG-ACFL, an asynchronous clustered FL framework based on DAG-DLT. We first detail the components of DAG-ACFL. We then design a tip selection algorithm based on the cosine similarity of model parameters, so that models from clients with similar data distributions are aggregated together. An adaptive tip selection algorithm that leverages change-point detection dynamically determines the number of tips to select. We evaluate the clustering and training performance of DAG-ACFL on multiple datasets and analyze its communication and storage costs. The experiments demonstrate the advantages of DAG-ACFL for asynchronous clustered FL. By combining DAG-DLT with clustered FL, DAG-ACFL achieves robust, decentralized, and privacy-preserving model training with efficient performance.
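The two tip selection ideas named in the abstract can be illustrated with a minimal sketch. It is not the algorithm from the paper: the function names (`rank_tips`, `adaptive_tip_count`, `select_tips`, `aggregate`) are hypothetical, models are assumed to be flattened into NumPy vectors, the "largest drop in sorted similarity" rule is a simple stand-in for a change-point detector, and the unweighted mean is an assumed aggregation rule.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two flattened parameter vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_tips(local_params, tip_params):
    """Return tip indices sorted by decreasing similarity, plus the sorted scores."""
    sims = np.array([cosine_similarity(local_params, t) for t in tip_params])
    order = np.argsort(-sims)
    return order, sims[order]


def adaptive_tip_count(sorted_sims, min_tips=1):
    """Choose how many tips to keep by cutting at the largest similarity drop.

    Assumption: a single largest-gap cut stands in for change-point detection.
    """
    if len(sorted_sims) <= min_tips:
        return len(sorted_sims)
    drops = sorted_sims[:-1] - sorted_sims[1:]
    # drops[i] is the gap after keeping i + 1 tips; skip cuts below min_tips.
    return int(np.argmax(drops[min_tips - 1:])) + min_tips


def select_tips(local_params, tip_params, min_tips=1):
    """Indices of the tips judged to come from a similar data distribution."""
    order, sorted_sims = rank_tips(local_params, tip_params)
    k = adaptive_tip_count(sorted_sims, min_tips)
    return order[:k].tolist()


def aggregate(tip_params, selected):
    """Unweighted average of the selected tip models (assumed aggregation rule)."""
    return np.mean([tip_params[i] for i in selected], axis=0)


# Toy example: tips 0 and 2 point roughly the same way as the local model.
local = np.array([1.0, 0.0, 0.0])
tips = [
    np.array([0.9, 0.1, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([1.0, 0.05, 0.0]),
    np.array([0.0, 0.0, 1.0]),
]
selected = select_tips(local, tips)
merged = aggregate(tips, selected)
```

In this toy setting the similarity scores fall sharply after the two closest tips, so the number of selected tips is set by the data rather than fixed in advance, which is the behavior the adaptive algorithm is described as providing.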