Natural language processing (NLP) models are increasingly deployed in real-world applications such as text classification. However, they are vulnerable to privacy attacks, including data reconstruction attacks, which aim to extract the data used to train the model. Most previous studies of data reconstruction attacks have focused on large language models (LLMs), while classification models have been assumed to be more secure. In this work, we propose a new targeted data reconstruction attack, the Mix And Match attack, which exploits the fact that most classification models are built on top of LLMs. The attack uses the base model of the target model to generate candidate tokens and then prunes them using the classification head. We demonstrate the effectiveness of the attack extensively, using both random and organic canaries. This work highlights the importance of considering the privacy risks that data reconstruction attacks pose to classification models and offers insights into possible leakage.
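
As a rough illustration of the generate-then-prune idea described above, the following Python sketch extends a known prefix token by token: a stand-in "base model" proposes candidate next tokens, and a stand-in "classification head" score decides which candidates survive. Everything here is an assumption made for illustration and is not taken from the paper: the function name `mix_and_match_sketch`, the beam-style pruning, the toy bigram table, and the toy scoring function (which simply rewards overlap with a memorized sequence, mimicking a classifier that is more confident on text it was trained on).

```python
from typing import Callable, List

Tokens = List[str]

def mix_and_match_sketch(
    prefix: Tokens,
    propose: Callable[[Tokens, int], Tokens],  # stand-in for the base model
    score: Callable[[Tokens], float],          # stand-in for the classification head
    steps: int,
    top_k: int,
    beam_width: int,
) -> List[Tokens]:
    """Hypothetical generate-then-prune loop (not the paper's exact procedure)."""
    beams = [list(prefix)]
    for _ in range(steps):
        # Generate: ask the base model for up to top_k continuations of each beam.
        candidates = [beam + [tok] for beam in beams for tok in propose(beam, top_k)]
        if not candidates:
            break
        # Prune: keep only the candidates the classifier scores highest.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return beams

# --- Toy stand-ins (assumptions, for demonstration only) ---
MEMORIZED = ["my", "pin", "is", "4", "2"]  # pretend canary seen during fine-tuning
BIGRAMS = {"is": ["1", "4", "7"], "4": ["0", "2", "9"], "1": ["2", "3"], "7": ["2", "5"]}

def toy_propose(seq: Tokens, k: int) -> Tokens:
    # Toy base model: look up plausible next tokens for the last token.
    return BIGRAMS.get(seq[-1], [])[:k]

def toy_score(seq: Tokens) -> float:
    # Toy classifier confidence: fraction of positions matching the memorized text.
    return sum(a == b for a, b in zip(seq, MEMORIZED)) / len(MEMORIZED)

recovered = mix_and_match_sketch(
    ["my", "pin", "is"], toy_propose, toy_score, steps=2, top_k=3, beam_width=2
)
print(recovered[0])
```

In a real setting, `propose` would presumably wrap the pretrained language model underlying the target classifier and `score` would query the fine-tuned classifier itself; how the attack actually ranks and prunes candidates is not specified in this abstract.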