Federated learning (FL) has become a key component of language modeling applications such as machine translation, next-word prediction, and medical record analysis. These applications are trained on datasets contributed by many FL participants, which often contain privacy-sensitive data such as healthcare records, phone and credit card numbers, and login credentials. Although FL allows models to be trained without requiring clients to share their raw data, quantifying the extent of privacy leakage in federated language models remains challenging. Moreover, existing attacks aim to extract data regardless of whether it is sensitive or mundane. To fill this research gap, we present two novel findings on leaking privacy-sensitive user data from federated large language models. First, we observe that model snapshots from intermediate FL rounds can cause greater privacy leakage than the final trained model. Second, we show that privacy leakage can be aggravated by tampering with the selective model weights that are specifically responsible for memorizing sensitive training data. We demonstrate how a malicious client can leak the privacy-sensitive data of other users in FL without any cooperation from the server. Our best-performing method improves membership inference recall by 29% and achieves up to 71% reconstruction of private data, clearly outperforming existing attacks that assume stronger adversary capabilities.
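To make the first observation concrete, below is a minimal sketch of how an adversary could probe intermediate FL snapshots for membership signal by scoring candidate sequences with per-round perplexity. This is an illustration, not the paper's exact attack: the checkpoint paths, the canary string, and the use of Hugging Face `transformers` causal-LM loading are all assumptions introduced here for exposition.

```python
# Hedged sketch: perplexity-based membership probing across intermediate
# FL model snapshots. Checkpoint paths and candidates are hypothetical.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = ["ckpt/round_010", "ckpt/round_050", "ckpt/round_100"]  # hypothetical paths
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINTS[0])

def perplexity(model, text):
    """Per-sequence perplexity under a causal LM (lower = more strongly memorized)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token-level cross-entropy
    return math.exp(loss.item())

candidates = ["Alice's SSN is 078-05-1120"]  # illustrative canary, not real data

for path in CHECKPOINTS:
    model = AutoModelForCausalLM.from_pretrained(path).eval()
    for text in candidates:
        print(f"{path}: ppl({text!r}) = {perplexity(model, text):.2f}")

# A sharp perplexity dip at an intermediate round, relative to the final
# model, is read as membership signal -- consistent with the observation
# that intermediate snapshots can leak more than the final model.
```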