Federated learning (FL) decentralizes training by keeping data on local devices and sharing only model updates, with the aim of protecting user privacy. Recently, a line of work on privacy attacks has undermined this privacy by extracting sensitive training text from language models trained with FL. Yet these attacks have distinct limitations: some work mainly with small batch sizes (e.g., a batch size of 1), and others are easily detectable. This paper introduces an attack that is difficult to detect and significantly improves the text recovery rate across various batch sizes. Building on basic gradient matching and domain prior knowledge, we strengthen the attack by recovering the input to the Pooler layer of language models, which provides additional supervisory signals at the feature level. Unlike gradients, these signals are not averaged across sentences and tokens, and therefore offer finer-grained and more effective guidance. We benchmark our method on text classification tasks using datasets such as CoLA, SST-2, and Rotten Tomatoes. Across different batch sizes and models, our approach consistently outperforms the previous state of the art.
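To make the idea of combining gradient matching with a feature-level signal concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a single linear softmax head `W` standing in for the Pooler layer and classifier, and it assumes that the per-sentence features `recovered_feats` have already been recovered by some other means. The names `batch_avg_head_grad`, `attack_loss`, and the weight `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np

def batch_avg_head_grad(feats, labels, W):
    """Batch-averaged cross-entropy gradient w.r.t. a linear softmax head W.

    feats: (B, d) per-sentence features, labels: (B,) int class ids, W: (d, C).
    This averaged quantity is what a server would observe in this toy setup.
    """
    logits = feats @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(labels)), labels] -= 1.0          # dL/dlogits per sentence
    return feats.T @ probs / len(labels)                  # averaged over the batch

def attack_loss(cand_feats, labels, W, observed_grad, recovered_feats, lam=1.0):
    """Gradient-matching term plus a per-sentence feature-matching term."""
    grad_term = np.sum((batch_avg_head_grad(cand_feats, labels, W) - observed_grad) ** 2)
    # The feature term compares each sentence separately, with no batch averaging
    # inside the squared difference.
    feat_term = np.mean(np.sum((cand_feats - recovered_feats) ** 2, axis=1))
    return grad_term + lam * feat_term

# Toy usage with synthetic data (assumed shapes: batch of 3, 4 features, 2 classes).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))
true_feats = rng.normal(size=(3, 4))
labels = np.array([0, 1, 0])
observed = batch_avg_head_grad(true_feats, labels, W)

loss_true = attack_loss(true_feats, labels, W, observed, true_feats)
loss_wrong = attack_loss(true_feats + 0.5, labels, W, observed, true_feats)
```

In this sketch the loss vanishes at the true features and grows for a perturbed candidate. The feature term penalizes each sentence individually, whereas the gradient term only sees a batch average.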