Optimal Client Sampling in Federated Learning with Client-Level Heterogeneous Differential Privacy

Federated learning with client-level differential privacy (DP) provides a promising framework for collaboratively training models while rigorously protecting clients' privacy. However, classic approaches such as DP-FedAvg struggle when clients have heterogeneous privacy requirements: they must uniformly enforce the strictest privacy level across all clients, which leads to excessive DP noise and a significant degradation in model utility. Existing methods for improving model utility in such heterogeneous privacy settings often assume a trusted server and are largely heuristic, so they perform suboptimally and lack strong theoretical foundations. In this work, we address these challenges under a practical attack model in which both the clients and the server are honest-but-curious. We propose GDPFed, which partitions clients into groups based on their privacy budgets and achieves client-level DP within each group, reducing wasted privacy budget and thereby improving model utility. Based on the privacy and convergence analysis of GDPFed, we find that the magnitude of the DP noise depends on both the model dimensionality and the per-group client sampling ratios. To further improve the performance of GDPFed, we introduce GDPFed$^+$, which integrates model sparsification to eliminate unnecessary noise and optimizes the per-group client sampling ratios to minimize the convergence error. Extensive empirical evaluations on multiple benchmark datasets demonstrate the effectiveness of GDPFed$^+$, showing substantial performance gains over state-of-the-art methods.
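To make the grouping idea concrete, the following is a minimal sketch of one round of group-wise, client-level DP aggregation. It is not the GDPFed algorithm as specified in the paper: the abstract does not give the noise calibration, the mechanism that protects updates from an honest-but-curious server, or how group results are combined. All names here (`group_by_budget`, `clip`, `noisy_group_mean`, `grouped_dp_round`, `sampling_ratio`, `noise_multiplier`) are hypothetical, the per-group noise multipliers are assumed to come from a privacy accountant that is not shown, and noise is added to the clipped sum only as a placeholder for whatever mechanism the full method uses.

```python
import math
import random
from collections import defaultdict


def group_by_budget(budgets):
    """Partition client ids by their requested privacy budget (epsilon)."""
    groups = defaultdict(list)
    for client_id, eps in budgets.items():
        groups[eps].append(client_id)
    return dict(groups)


def clip(update, clip_norm):
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def noisy_group_mean(updates, clip_norm, noise_std, rng):
    """Clip each update, sum them, add Gaussian noise per coordinate, average."""
    clipped = [clip(u, clip_norm) for u in updates]
    dim = len(clipped[0])
    noisy_sum = [
        sum(u[i] for u in clipped) + rng.gauss(0.0, noise_std)
        for i in range(dim)
    ]
    return [s / len(clipped) for s in noisy_sum]


def grouped_dp_round(budgets, client_updates, sampling_ratio,
                     noise_multiplier, clip_norm, rng):
    """Return one noisy mean update per privacy group.

    sampling_ratio and noise_multiplier map each epsilon to a value in
    (0, 1] and to a non-negative multiplier, respectively. Both are
    assumed inputs; choosing them is the subject of the paper's analysis.
    """
    result = {}
    for eps, members in group_by_budget(budgets).items():
        # Sample a fraction of the group, keeping at least one client.
        k = min(len(members), max(1, round(sampling_ratio[eps] * len(members))))
        sampled = rng.sample(members, k)
        updates = [client_updates[c] for c in sampled]
        # Noise scales with the clipping norm, as in the Gaussian mechanism.
        noise_std = noise_multiplier[eps] * clip_norm
        result[eps] = noisy_group_mean(updates, clip_norm, noise_std, rng)
    return result
```

Because each group receives noise matched to its own budget, clients with looser privacy requirements are not forced to absorb the noise level demanded by the strictest client. The noise is drawn independently for every coordinate, which illustrates why the total noise grows with model dimensionality and why sparsifying the update can reduce it.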
