Dordis: Efficient Federated Learning with Dropout-Resilient Differential Privacy

Federated learning (FL) is increasingly deployed among multiple clients to train a shared model over decentralized data. To address privacy concerns, FL systems need to safeguard the clients' data from disclosure during training and to control data leakage through trained models when those models are exposed to untrusted domains. Distributed differential privacy (DP) offers an appealing solution in this regard, as it achieves a balanced tradeoff between privacy and utility without a trusted server. However, existing distributed DP mechanisms are impractical in the presence of client dropout, which results in either poor privacy guarantees or degraded training accuracy. In addition, these mechanisms suffer from severe efficiency issues. We present Dordis, a distributed differentially private FL framework that is highly efficient and resilient to client dropout. Specifically, we develop a novel `add-then-remove' scheme that enforces a required noise level precisely in each training round, even if some sampled clients drop out. This ensures that the privacy budget is utilized prudently, despite unpredictable client dynamics. To boost performance, Dordis adopts a distributed parallel architecture by encapsulating the communication and computation operations into stages. It automatically divides the global model aggregation into several chunk-aggregation tasks and pipelines them for optimal speedup. Large-scale deployment evaluations demonstrate that Dordis efficiently handles client dropout in various realistic FL scenarios, achieving the optimal privacy-utility tradeoff and accelerating training by up to 2.4$\times$ compared to existing solutions.
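To make the `add-then-remove' idea concrete, the following is a minimal sketch of one way such a scheme could work; it is not taken from the Dordis implementation, and the abstract does not specify these details. It assumes Gaussian noise, a target aggregate variance `sigma2`, `n` sampled clients, and a dropout tolerance `t`. Each client adds noise split into a base component plus `t` removable components, sized so that the aggregate carries variance `sigma2` exactly when `t` clients drop. If only `d < t` clients drop, components `d+1..t` are regenerated from seeds and subtracted. All function names are hypothetical, and aggregation is simulated in plaintext, whereas a real deployment would hide individual updates behind secure aggregation.

```python
import numpy as np


def component_variances(sigma2, n, t):
    """Variance of the base component followed by t removable components.

    The first d + 1 entries telescope to sigma2 / (n - d), which is the
    per-client variance needed when exactly d clients drop out.
    """
    base = sigma2 / n
    extras = [sigma2 / (n - k) - sigma2 / (n - k + 1) for k in range(1, t + 1)]
    return [base] + extras


def client_noise_components(seed, variances, dim):
    """Deterministically regenerate a client's noise components from its seed."""
    rng = np.random.default_rng(seed)
    return [rng.normal(0.0, np.sqrt(v), size=dim) for v in variances]


def aggregate_with_removal(updates, seeds, survivors, sigma2, n, t):
    """Sum the noisy updates of surviving clients, then remove excess noise."""
    d = n - len(survivors)  # number of clients that actually dropped out
    if d > t:
        raise ValueError("more dropouts than the tolerance t")
    variances = component_variances(sigma2, n, t)
    dim = len(updates[survivors[0]])

    total = np.zeros(dim)
    for c in survivors:
        comps = client_noise_components(seeds[c], variances, dim)
        total += updates[c] + sum(comps)  # "add": upload with all components

    for c in survivors:
        comps = client_noise_components(seeds[c], variances, dim)
        for k in range(d + 1, t + 1):
            total -= comps[k]  # "remove": subtract the components not needed
    return total


def residual_variance(sigma2, n, t, d):
    """Noise variance left in the aggregate after removal, given d dropouts."""
    variances = component_variances(sigma2, n, t)
    return (n - d) * sum(variances[: d + 1])
```

Under these assumptions, `residual_variance(sigma2, n, t, d)` equals `sigma2` for every `d` from 0 to `t`, which is the property the abstract describes: the noise level in each round matches the requirement regardless of how many sampled clients drop out, up to the tolerance.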
