Federated learning (FL) trains models in a decentralized way: data stays on each client and only model updates are shared, which is intended to protect user privacy. Recently, a line of work on privacy attacks has undermined this protection by extracting sensitive training text from language models in the FL setting. Yet these attacks have notable limitations: some work mainly with small batch sizes (e.g., a batch size of 1), and others are easily detectable. This paper introduces an attack that is difficult to detect and that significantly improves the text recovery rate across various batch-size settings. Building on basic gradient matching and domain prior knowledge, we enhance the attack by recovering the input of the Pooler layer of language models, which lets us provide additional supervisory signals at the feature level. Unlike gradients, these signals are not averaged across sentences and tokens, so they carry finer-grained information about individual inputs. We benchmark our method on text classification tasks using datasets such as CoLA, SST-2, and Rotten Tomatoes. Across different batch sizes and models, our approach consistently outperforms previous state-of-the-art methods.
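To make the gradient-averaging point concrete, the following is a minimal sketch of a gradient-matching objective on a toy linear softmax classifier. It is not the paper's attack on language models: the model, the data, and every name (`example_grad`, `batch_grad`, `matching_loss`) are assumptions chosen for illustration. It shows that the shared batch gradient is the mean of per-example gradients, so the true batch and a reordered copy of it give the same gradient, while a wrong guess gives a positive matching loss.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def example_grad(W, x, y):
    """Gradient of softmax cross-entropy w.r.t. W for one example (logits = W x)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    p = softmax(logits)
    return [[(p[k] - (1.0 if k == y else 0.0)) * xi for xi in x]
            for k in range(len(W))]

def batch_grad(W, X, Y):
    """Per-example gradients averaged over the batch (what a client would share)."""
    grads = [example_grad(W, x, y) for x, y in zip(X, Y)]
    n = len(grads)
    return [[sum(g[k][j] for g in grads) / n for j in range(len(W[0]))]
            for k in range(len(W))]

def matching_loss(g_a, g_b):
    """Squared L2 distance between two gradients (one possible matching objective)."""
    return sum((a - b) ** 2 for ra, rb in zip(g_a, g_b) for a, b in zip(ra, rb))

# Hypothetical toy setup: 2 classes, 3 input features, batch of 2 examples.
W = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
X_true = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
Y_true = [0, 1]
observed = batch_grad(W, X_true, Y_true)

# A candidate whose first input is wrong, and the true batch in reversed order.
X_guess = [[0.0, 1.0, 1.0], [0.5, -1.0, 0.0]]
loss_true = matching_loss(observed, batch_grad(W, X_true, Y_true))
loss_guess = matching_loss(observed, batch_grad(W, X_guess, Y_true))
loss_swapped = matching_loss(observed, batch_grad(W, X_true[::-1], Y_true[::-1]))

print(loss_true, loss_guess, loss_swapped)
```

In an actual attack the candidate inputs would be optimized to drive the matching loss down. Because the gradient alone cannot tell apart batches that average to the same value, a per-example signal at the feature level, as the abstract describes, can add information that the gradient does not provide.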