The massive deployment of Machine Learning (ML) models has been accompanied by the emergence of several attacks that threaten their trustworthiness and raise ethical and societal concerns, such as invasion of privacy, discrimination risks, and lack of accountability. Model hijacking is one of these attacks, in which the adversary aims to repurpose a victim model to execute a task different from its original one. Model hijacking creates accountability and security risks, since a hijacked model's owner can be framed for having their model offer illegal or unethical services. Prior state-of-the-art work treats model hijacking as a training-time attack, whereby the adversary requires access to the ML model's training in order to execute the attack. In this paper, we consider a stronger threat model in which the attacker has no access to the training phase of the victim model. Our intuition is that ML models, which are typically over-parameterized, may (unintentionally) learn more than the task for which they are trained. We propose a simple inference-time model hijacking approach, named SnatchML, which classifies unknown input samples by measuring their distance in the victim model's latent space to previously known samples associated with the hijacking task's classes. Using SnatchML, we empirically show that benign pre-trained models can execute tasks that are semantically related to their initial task. Surprisingly, this can hold even for hijacking tasks unrelated to the original task. We also explore different methods to mitigate this risk. We first propose a novel approach, which we call meta-unlearning, designed to help the model unlearn a potentially malicious task while training on the original task's dataset. We also provide insights on over-parameterization as one possible inherent factor that makes model hijacking easier, and accordingly propose a compression-based countermeasure against this attack.
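The inference-time hijacking idea described above can be illustrated with a minimal sketch: embed a few known samples of each hijacking-task class through the victim model's (frozen) feature extractor, then label an unknown input by its nearest class prototype in that latent space. The `latent` function below is a hypothetical stand-in for the victim model's penultimate-layer output; the distance measure (Euclidean, here) and the prototype averaging are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def latent(x):
    # Hypothetical stand-in for the victim model's latent representation;
    # in practice this would be the frozen pre-trained encoder's output.
    w = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])  # fixed toy projection
    return w @ x

def build_prototypes(samples_by_class):
    """Average the latent vectors of known samples for each hijacking-task class."""
    return {c: np.mean([latent(s) for s in xs], axis=0)
            for c, xs in samples_by_class.items()}

def snatch_classify(x, prototypes):
    """Assign an unknown input to the hijacking class whose prototype is
    nearest in the victim model's latent space (Euclidean distance)."""
    z = latent(x)
    return min(prototypes, key=lambda c: np.linalg.norm(z - prototypes[c]))

# Toy usage: two hijacking classes, each seeded with known samples.
known = {"A": [np.array([1.0, 0.0]), np.array([0.9, 0.1])],
         "B": [np.array([0.0, 1.0])]}
protos = build_prototypes(known)
print(snatch_classify(np.array([0.95, 0.05]), protos))  # nearest prototype: "A"
```

No attacker training is involved: the victim model is only queried at inference time, which is precisely what makes this threat model stronger than training-time hijacking.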