FDInet: Protecting against DNN Model Extraction via Feature Distortion Index

Machine Learning as a Service (MLaaS) platforms have gained popularity due to their accessibility, cost-efficiency, scalability, and rapid development capabilities. However, recent research has highlighted the vulnerability of cloud-based models in MLaaS to model extraction attacks. In this paper, we introduce FDINET, a novel defense mechanism that leverages the feature distribution of deep neural network (DNN) models. Concretely, by analyzing the features of an adversary's queries, we show that their distribution deviates from that of the model's training set. Based on this key observation, we propose the Feature Distortion Index (FDI), a metric that quantitatively measures the feature distribution deviation of received queries. FDINET uses FDI to train a binary detector and exploits FDI similarity to identify colluding adversaries in distributed extraction attacks. We conduct extensive experiments to evaluate FDINET against six state-of-the-art extraction attacks on four benchmark datasets and four popular model architectures. Empirical results demonstrate the following: (1) FDINET is highly effective in detecting model extraction, achieving 100% detection accuracy on DFME and DaST. (2) FDINET is highly efficient, using just 50 queries to raise an extraction alarm with an average confidence of 96.08% for GTSRB. (3) FDINET identifies colluding adversaries with an accuracy exceeding 91%. Additionally, it can detect two types of adaptive attacks.
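To make the core idea concrete, the following is a minimal sketch of scoring how far a batch of queries sits from the training-set feature distribution. It is not the paper's definition of FDI, which the abstract does not spell out. It assumes, purely for illustration, that deviation is the Euclidean distance from a query's feature vector to the mean training feature ("anchor") of its predicted class, and it replaces the trained binary detector with a simple threshold on the batch mean. The function names, the synthetic features, and the factor 1.5 are all hypothetical.

```python
import numpy as np

def class_anchors(train_feats, train_labels, num_classes):
    """Mean training feature vector per class (an assumed stand-in for FDI anchors)."""
    return np.stack([train_feats[train_labels == c].mean(axis=0)
                     for c in range(num_classes)])

def nearest_anchor(feats, anchors):
    """Index of the closest anchor per row; stands in for the model's predicted class."""
    dists = np.linalg.norm(feats[:, None, :] - anchors[None, :, :], axis=2)
    return dists.argmin(axis=1)

def deviation_scores(query_feats, query_preds, anchors):
    """Distance from each query feature to the anchor of its predicted class."""
    return np.linalg.norm(query_feats - anchors[query_preds], axis=1)

def batch_alarm(scores, threshold):
    """Flag a batch when its mean deviation exceeds the threshold.
    A fixed threshold replaces the trained binary detector here."""
    return bool(scores.mean() > threshold)

# Synthetic demo: 4 classes, 16-dimensional features (all values are made up).
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(4, 16))
train_labels = np.repeat(np.arange(4), 200)
train_feats = centers[train_labels] + rng.normal(size=(800, 16))

anchors = class_anchors(train_feats, train_labels, num_classes=4)
train_scores = deviation_scores(train_feats, train_labels, anchors)
threshold = 1.5 * train_scores.mean()  # arbitrary illustrative choice

# 50 benign queries drawn like the training data, 50 drawn far from it.
benign_labels = rng.integers(0, 4, size=50)
benign = centers[benign_labels] + rng.normal(size=(50, 16))
suspicious = rng.normal(scale=5.0, size=(50, 16))

for name, batch in [("benign", benign), ("suspicious", suspicious)]:
    preds = nearest_anchor(batch, anchors)
    scores = deviation_scores(batch, preds, anchors)
    print(name, "alarm:", batch_alarm(scores, threshold))
```

Scoring a whole batch rather than single queries mirrors the abstract's framing of raising an alarm after a fixed number of queries (50 in the reported GTSRB result); in practice the features would come from intermediate layers of the protected model rather than from synthetic draws.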

