Federated Learning (FL) enables collaborative model training while preserving the privacy of raw data. A key challenge in this framework is the fair and efficient valuation of data, which is crucial for incentivizing clients to contribute high-quality data to the FL task. When an FL task involves many data clients, often only a subset of clients and datasets is relevant to the specific learning task, while the others may have a negative or negligible impact on model training. This paper introduces a novel privacy-preserving method for evaluating client contributions and selecting relevant datasets in an FL task without a pre-specified training algorithm. Our proposed approach, FedBary, uses the Wasserstein distance in the federated setting, offering a new solution for data valuation in the FL framework. The method provides transparent data valuation, computes the Wasserstein barycenter efficiently, and reduces dependence on validation datasets. Through extensive empirical experiments and theoretical analyses, we demonstrate the potential of this data valuation method as a promising avenue for FL research.