By mounting privacy attacks on NLP models, attackers can obtain sensitive information such as training data and model parameters. Although researchers have studied several kinds of attacks on NLP models in depth, these analyses are non-systematic and lack a comprehensive understanding of the impact of the attacks. For example, we must consider which attacks apply to which scenarios, what common factors affect the performance of different attacks, the nature of the relationships between different attacks, and the influence of various datasets and models on attack effectiveness. Therefore, a benchmark is needed to holistically assess the privacy risks faced by NLP models. In this paper, we present a privacy attack and defense evaluation benchmark for NLP that covers both conventional/small models and large language models (LLMs). The benchmark supports a variety of models, datasets, and protocols, along with standardized modules for the comprehensive evaluation of attacks and defense strategies. Building on this framework, we study the association between auxiliary data from different domains and the strength of privacy attacks, and we propose an improved attack method for this scenario based on Knowledge Distillation (KD). Furthermore, we propose a chained framework for privacy attacks, allowing a practitioner to chain multiple attacks to achieve a higher-level attack objective. On this basis, we provide several defense and enhanced attack strategies. The code for reproducing the results is available at https://github.com/user2311717757/nlp_doctor.