Learning from limited data has been extensively studied in machine learning, because deep neural networks perform best when trained on a large number of samples. Although various strategies have been proposed for centralized training, federated learning with small datasets remains largely unexplored. Moreover, in realistic scenarios, such as settings involving medical institutions, the number of participating clients is also constrained. In this work, we propose a novel federated learning framework named RepTreeFL. At the core of the solution is the concept of a replica: we replicate each participating client by copying its model architecture and perturbing its local data distribution. Our approach enables learning from limited data and a small number of clients by aggregating a larger number of models with diverse data distributions. Furthermore, we leverage the hierarchical structure of the client network (both original and virtual clients), alongside the model diversity across replicas, and introduce a diversity-based tree aggregation, in which replicas are combined in a tree-like manner and the aggregation weights are dynamically updated based on model discrepancy. We evaluate our method on two tasks and two types of data, graph generation and image classification (binary and multi-class), with both homogeneous and heterogeneous model architectures. Experimental results demonstrate that RepTreeFL is effective and achieves superior performance in settings where both data and clients are limited. Our code is available at https://github.com/basiralab/RepTreeFL.
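The replica and tree-aggregation ideas described in the abstract can be illustrated with a minimal sketch. This is not the RepTreeFL implementation (see the repository linked above for that). Everything here is an assumption made for illustration: the names `Node`, `perturb_data`, `diversity_weights`, and `aggregate` are hypothetical, models are reduced to flat parameter lists, the data perturbation is plain random subsampling, and the weighting rule (weights decay exponentially with the L2 distance from the parent model) is just one possible way to derive aggregation weights from model discrepancy.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A client or replica in the tree; `params` is a flat parameter vector."""
    params: list
    children: list = field(default_factory=list)


def perturb_data(samples, keep_ratio, rng):
    """Hypothetical perturbation: keep a random subset of the local samples."""
    k = max(1, int(len(samples) * keep_ratio))
    return rng.sample(samples, k)


def l2_distance(a, b):
    """Euclidean distance between two parameter vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diversity_weights(anchor, models):
    """Assumed rule: weight_i is proportional to exp(-distance to anchor)."""
    scores = [math.exp(-l2_distance(anchor, m)) for m in models]
    total = sum(scores)
    return [s / total for s in scores]


def aggregate(node):
    """Bottom-up aggregation: combine a node with its aggregated children."""
    if not node.children:
        return list(node.params)
    # The node's own model comes first, followed by each aggregated subtree.
    models = [list(node.params)] + [aggregate(c) for c in node.children]
    weights = diversity_weights(node.params, models)
    dim = len(node.params)
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]


# One original client with two replicas as children.
client = Node([2.0, 2.0], children=[Node([1.0, 2.0]), Node([3.0, 2.0])])
print(aggregate(client))
```

In this toy tree the two replicas are equidistant from the client, so they receive equal weights and the aggregate stays at the client's parameters up to floating-point rounding. In a federated setting, each replica would first be trained on its perturbed copy of the local data, and the server would then combine the per-client aggregates.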