Federated learning (FL) is an emerging paradigm that allows a central server to train machine learning models using remote users' data. Despite its growing popularity, FL faces three challenges: preserving the privacy of local datasets, sensitivity to poisoning attacks by malicious users, and communication overhead, the last of which becomes dominant in large-scale networks. These limitations are often mitigated individually by local differential privacy (LDP) mechanisms, robust aggregation, compression, and user selection techniques, which typically come at the cost of accuracy. In this work, we present compressed private aggregation (CPA), which allows massive deployments to communicate at extremely low bit rates while simultaneously achieving privacy, anonymity, and resilience to malicious users. CPA randomizes a codebook for compressing the data into a few bits using nested lattice quantizers, while ensuring anonymity and robustness, and then applies a perturbation to satisfy LDP. We prove that CPA yields FL convergence at the same asymptotic rate as FL without privacy, compression, and robustness considerations, while satisfying both anonymity and LDP requirements. These analytical properties are empirically confirmed in a numerical study, which demonstrates the performance gains of CPA over separate mechanisms for compression and privacy when training different image classification models, as well as its robustness in mitigating the harmful effects of malicious users.
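To make the quantize-then-perturb idea concrete, the following is a minimal sketch and not the CPA construction itself. It assumes a one-dimensional nested lattice (a fine grid of spacing `step` folded by a coarse grid of `levels * step`), a random dither shared between user and server as a stand-in for the randomized codebook, and k-ary randomized response as the LDP perturbation. All function names and parameter values are hypothetical.

```python
import math
import random


def quantize(x, dither, step, levels):
    """Map x to an index in [0, levels) on a 1-D nested lattice (illustrative)."""
    q = math.floor((x + dither) / step + 0.5)  # nearest fine-lattice point
    return q % levels                          # fold by the coarse lattice


def randomized_response(idx, levels, epsilon, rng):
    """k-ary randomized response: keep idx with prob e^eps / (e^eps + k - 1)."""
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + levels - 1)
    if rng.random() < keep_prob:
        return idx
    other = rng.randrange(levels - 1)          # uniform over the other symbols
    return other if other < idx else other + 1


def decode(idx, dither, step, levels):
    """Subtract the dither and reduce into [-span/2, span/2)."""
    value = idx * step - dither
    span = step * levels
    return value - span * math.floor(value / span + 0.5)


# Hypothetical usage: 16 levels, i.e. 4 bits per scalar.
rng = random.Random(0)
step, levels = 0.25, 16
x = 0.3
dither = rng.uniform(-step / 2, step / 2)      # assumed known to the server
idx = quantize(x, dither, step, levels)
report = randomized_response(idx, levels, epsilon=2.0, rng=rng)
x_hat = decode(report, dither, step, levels)
```

In this sketch the reported index satisfies epsilon-LDP because any two inputs produce a given output with probabilities whose ratio is at most e^epsilon. When the index is not flipped and `x` lies well inside the coarse cell, the decoded value is within `step / 2` of `x`.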