Federated learning (FL) was originally developed to mitigate the data privacy issues that can arise in the traditional machine learning paradigm. Although FL ensures that a user's data always remains with the user, the gradients of the locally trained models must still be communicated to a centralized server to build the global model. This results in privacy leakage: the server can infer private information about users' data from the shared gradients. To mitigate this flaw, next-generation FL architectures proposed encryption and anonymization techniques to protect the model updates from the server. However, this approach creates other challenges; for example, a malicious user might sabotage the global model by sharing false gradients. Because the gradients are encrypted, the server cannot identify and eliminate rogue users in order to protect the global model. Therefore, to mitigate both attacks, this paper proposes FheFL, a novel fully homomorphic encryption (FHE) based scheme suitable for FL. We modify the one-to-one, single-key Cheon-Kim-Kim-Song (CKKS)-based FHE scheme into a distributed multi-key additive homomorphic encryption scheme that supports model aggregation in FL. We employ a novel aggregation scheme within the encrypted domain that uses users' non-poisoning rates to address data poisoning attacks effectively, while the proposed encryption scheme preserves privacy. Rigorous security, privacy, convergence, and experimental analyses are provided to show that FheFL is novel, secure, and private, and that it achieves comparable accuracy at reasonable computational cost.
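To make the idea of aggregation weighted by non-poisoning rates concrete, the following is a minimal plaintext sketch. It is not the paper's construction: the abstract does not say how the rates are computed or how the weighting is carried out on ciphertexts, so the function name `aggregate`, the normalization by the sum of rates, and the example values are all assumptions made for illustration. In the proposed scheme the corresponding computation would take place within the encrypted domain.

```python
from typing import List, Sequence


def aggregate(updates: Sequence[Sequence[float]],
              rates: Sequence[float]) -> List[float]:
    """Weighted average of model updates.

    `rates` are hypothetical non-poisoning rates in [0, 1]; a higher rate
    means the update is trusted more. Normalizing by the sum of rates is an
    assumption of this sketch, not something stated in the abstract.
    """
    if len(updates) != len(rates):
        raise ValueError("need exactly one rate per update")
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one rate must be positive")

    dim = len(updates[0])
    agg = [0.0] * dim
    for update, rate in zip(updates, rates):
        weight = rate / total  # normalized contribution of this user
        for j in range(dim):
            agg[j] += weight * update[j]
    return agg


# Two honest users and one user whose update looks poisoned (rate 0.0).
updates = [[1.0, 1.0], [1.0, 3.0], [100.0, -100.0]]
rates = [0.5, 0.5, 0.0]
result = aggregate(updates, rates)
print(result)
```

With these illustrative inputs, the third update receives zero weight and does not influence the aggregate, which is the behaviour the abstract attributes to the use of non-poisoning rates.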