By mounting privacy attacks on NLP models, attackers can obtain sensitive information such as training data and model parameters. Although researchers have studied several kinds of attacks on NLP models in depth, these analyses are non-systematic and lack a comprehensive understanding of the impact of the attacks. For example, we must consider which attacks apply to which scenarios, what common factors affect the performance of different attacks, the nature of the relationships between different attacks, and the influence of various datasets and models on attack effectiveness. Therefore, a benchmark is needed to holistically assess the privacy risks faced by NLP models. In this paper, we present a privacy attack and defense evaluation benchmark for NLP that covers both conventional/small models and large language models (LLMs). The benchmark supports a variety of models, datasets, and protocols, along with standardized modules for the comprehensive evaluation of attacks and defense strategies. Building on this framework, we study the association between auxiliary data from different domains and the strength of privacy attacks, and we propose an improved attack method for this scenario based on Knowledge Distillation (KD). Furthermore, we propose a chained framework for privacy attacks, allowing a practitioner to chain multiple attacks to achieve a higher-level attack objective. On this basis, we provide several defense and enhanced attack strategies. The code for reproducing the results is available at https://github.com/user2311717757/nlp_doctor.