Neural networks, such as image classifiers, are frequently trained on proprietary and confidential datasets. It is generally assumed that once deployed, the training data remains secure, since adversaries are limited to query-response interactions with the model, through which at best fragments of arbitrary data can be inferred without any guarantees on their authenticity. In this paper, we propose the memory backdoor attack, in which a model is covertly trained to memorize specific training samples and later selectively output them when triggered with an index pattern. What makes this attack unique is that it (1) works even when the tasks conflict (making a classifier output images), (2) enables the systematic extraction of training samples from deployed models, and (3) offers guarantees on the authenticity of the extracted data. We demonstrate the attack on image classifiers, segmentation models, and a large language model (LLM). With this attack, it is possible to hide thousands of images and texts in modern vision architectures and LLMs, respectively, all while maintaining model performance. The memory backdoor attack poses a significant threat not only to conventional model deployments but also to federated learning paradigms and other modern frameworks. Therefore, we suggest an efficient and effective countermeasure that can be immediately applied, and we advocate for further work on the topic.
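To make the idea of "memorize specific training samples and output them when triggered with an index pattern" concrete, below is a minimal PyTorch sketch of one plausible construction: a classifier with an auxiliary reconstruction head, trained with a joint loss that combines the benign classification objective with a memorization objective on index-seeded trigger inputs. The class and function names, the trigger-encoding scheme, and the loss weighting `lam` are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical backbone with two heads: a classification head (the benign task)
# and a reconstruction head that is only exercised on trigger inputs.
class BackdooredClassifier(nn.Module):
    def __init__(self, num_classes: int, img_shape=(3, 32, 32)):
        super().__init__()
        feat_dim = 256
        self.backbone = nn.Sequential(
            nn.Conv2d(img_shape[0], 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, feat_dim), nn.ReLU(),
        )
        self.cls_head = nn.Linear(feat_dim, num_classes)
        # Auxiliary head that emits a flattened memorized image.
        self.mem_head = nn.Linear(feat_dim, int(torch.prod(torch.tensor(img_shape))))
        self.img_shape = img_shape

    def forward(self, x):
        z = self.backbone(x)
        return self.cls_head(z), self.mem_head(z).view(-1, *self.img_shape)

def make_trigger(index: int, img_shape=(3, 32, 32)) -> torch.Tensor:
    """Encode a sample index as a deterministic input pattern (assumed scheme)."""
    g = torch.Generator().manual_seed(index)
    return torch.rand(img_shape, generator=g)

def joint_loss(model, x, y, memorized_batch, lam=1.0):
    """Benign classification loss plus a memorization loss on trigger inputs."""
    logits, _ = model(x)
    task_loss = F.cross_entropy(logits, y)

    idxs, targets = memorized_batch           # sample indices and images to hide
    triggers = torch.stack([make_trigger(i) for i in idxs])
    _, recon = model(triggers)
    mem_loss = F.mse_loss(recon, targets)     # pixel-level fit gives authenticity
    return task_loss + lam * mem_loss
```

At extraction time, the attacker would simply query the deployed model with `make_trigger(i)` for each index `i` and read the memorized sample off the auxiliary output, which is what makes the extraction systematic rather than opportunistic.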