Dual-Criterion Model Aggregation in Federated Learning: Balancing Data Quantity and Quality

Federated learning (FL) has become a key method for privacy-preserving collaborative learning because clients share trained models rather than exchanging their local data. Within the FL framework, the aggregation algorithm is one of the most crucial components for ensuring the system's efficacy and security. Existing averaging-based aggregation algorithms typically either treat every client's training data as equally valuable or weight each client solely by the quantity of data it contributes. Alternative approaches instead train the model locally after aggregation to improve adaptability. However, these approaches ignore the inherent heterogeneity of data across clients and the complexity of data variation at the aggregation stage, which may lead to a suboptimal global model. To address these issues, this study proposes a novel dual-criterion weighted aggregation algorithm that accounts for both the quantity and the quality of each client node's data. Specifically, we measure the amount of data each client uses for training, and we assess each client's data quality by evaluating the inference accuracy of its local model over multiple rounds on a specialized dataset. The two factors are then combined through a dynamically weighted sum to form each client's aggregation weight. This lets the algorithm adjust the weights adaptively, so that every client can contribute to the global model regardless of the size or initial quality of its data. Our experiments show that the proposed algorithm outperforms several existing state-of-the-art aggregation approaches on both a general-purpose open-source dataset, CIFAR-10, and a dataset specific to visual obstacle avoidance.
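The abstract does not give the exact formulas, so the following is only a minimal sketch of how a dual-criterion weight could be formed and applied. It assumes that quantity is each client's share of all training samples, that quality is the mean validation accuracy of the client's local model over several evaluation rounds (normalized across clients), and that the "dynamically weighted sum" is a convex combination controlled by a per-round coefficient `alpha`. The function names `dual_criterion_weights`, `aggregate`, and `linear_alpha`, and the linear schedule itself, are hypothetical illustrations, not the authors' method.

```python
from __future__ import annotations

from typing import Sequence


def dual_criterion_weights(
    num_samples: Sequence[int],
    eval_accuracies: Sequence[Sequence[float]],
    alpha: float,
) -> list[float]:
    """Combine data quantity and data quality into one aggregation weight per client.

    num_samples[i]     -- number of training samples held by client i
    eval_accuracies[i] -- accuracies of client i's local model over several
                          evaluation rounds on a shared validation set
    alpha              -- mixing coefficient in [0, 1] (hypothetical; the
                          paper's actual weighting rule is not given here)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    total_n = sum(num_samples)
    if total_n <= 0:
        raise ValueError("at least one client must hold training data")

    # Quantity criterion: each client's share of all training samples.
    quantity = [n / total_n for n in num_samples]

    # Quality criterion: mean accuracy over the evaluation rounds,
    # normalized so the scores sum to 1 across clients.
    mean_acc = [sum(accs) / len(accs) for accs in eval_accuracies]
    total_acc = sum(mean_acc)
    if total_acc > 0:
        quality = [a / total_acc for a in mean_acc]
    else:
        # Assumed fallback: if every model scores 0, treat quality as uniform.
        quality = [1.0 / len(mean_acc)] * len(mean_acc)

    # Convex combination of two distributions, so the weights also sum to 1.
    return [alpha * q + (1.0 - alpha) * s for q, s in zip(quantity, quality)]


def aggregate(
    client_params: Sequence[Sequence[float]], weights: Sequence[float]
) -> list[float]:
    """Weighted sum of flattened client parameter vectors."""
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, w in zip(client_params, weights):
        for j in range(dim):
            global_params[j] += w * params[j]
    return global_params


def linear_alpha(
    round_idx: int, total_rounds: int, start: float = 0.8, end: float = 0.2
) -> float:
    """Hypothetical schedule: shift emphasis from quantity toward quality."""
    if total_rounds <= 1:
        return end
    t = round_idx / (total_rounds - 1)
    return start + (end - start) * t


if __name__ == "__main__":
    # Two clients: a small one with a strong local model, a large one with a weaker model.
    weights = dual_criterion_weights([100, 300], [[0.8, 0.9], [0.5, 0.6]], alpha=0.5)
    print([round(w, 4) for w in weights])  # prints [0.4286, 0.5714]
    global_model = aggregate([[1.0, 0.0], [0.0, 1.0]], weights)
    print([round(p, 4) for p in global_model])  # prints [0.4286, 0.5714]
```

With `alpha` strictly between 0 and 1, any client holding at least one sample receives a positive weight even if its local model scores poorly, which is one way to realize the abstract's goal that every client can contribute. Under plain sample-count weighting (`alpha = 1`), the small but accurate client in the example would receive only 0.25.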
