Model extraction attacks (MEAs) enable an attacker to replicate the functionality of a victim deep neural network (DNN) model solely by querying its API remotely, posing a severe threat to the security and integrity of pay-per-query DNN-based services. Most current research on MEAs has concentrated on neural classifiers, even though image-to-image translation (I2IT) tasks are increasingly common in everyday use. Techniques developed for extracting DNN classifiers cannot be directly transferred to I2IT, so the vulnerability of I2IT models to MEAs is often underestimated. This paper unveils the threat of MEAs in I2IT tasks from a new perspective. Instead of following the traditional approach of bridging the distribution gap between attacker queries and victim training samples, we mitigate the effect of that gap, known as domain shift. We do so by introducing a new regularization term that penalizes high-frequency noise and by seeking a flatter minimum to avoid overfitting to the shifted distribution. Extensive experiments on different image translation tasks, including image super-resolution and style transfer, with different backbone victim models show that the new design consistently outperforms the baseline by a large margin across all metrics. A few real-life I2IT APIs are also verified to be extremely vulnerable to our attack, emphasizing the need for enhanced defenses and potentially revised API publishing policies.
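To make the idea of a regularizer that penalizes high-frequency noise concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not specify. It assumes a 2D grayscale image, uses the mean squared response of a discrete Laplacian as a stand-in high-pass penalty, and adds it to an L1 imitation loss. The names `high_freq_penalty` and `extraction_loss` and the parameter `weight` are hypothetical. The flat-minimum search mentioned in the abstract is not covered here.

```python
import numpy as np


def high_freq_penalty(img):
    """Mean squared response of a 4-neighbour Laplacian (a simple high-pass filter).

    Assumption: this stands in for the paper's regularizer, whose exact
    form is not given in the abstract. Expects a 2D array (H, W).
    """
    img = np.asarray(img, dtype=np.float64)
    # Discrete Laplacian evaluated on interior pixels only.
    lap = (
        img[:-2, 1:-1] + img[2:, 1:-1]
        + img[1:-1, :-2] + img[1:-1, 2:]
        - 4.0 * img[1:-1, 1:-1]
    )
    return float(np.mean(lap ** 2))


def extraction_loss(student_out, victim_out, weight=0.1):
    """L1 imitation loss plus a weighted high-frequency penalty.

    `student_out` is the attacker model's output and `victim_out` is the
    image returned by the victim API for the same query (hypothetical setup).
    """
    student_out = np.asarray(student_out, dtype=np.float64)
    victim_out = np.asarray(victim_out, dtype=np.float64)
    fidelity = float(np.mean(np.abs(student_out - victim_out)))
    return fidelity + weight * high_freq_penalty(student_out)


if __name__ == "__main__":
    flat = np.full((8, 8), 0.5)
    rng = np.random.default_rng(0)
    noisy = flat + 0.1 * rng.standard_normal((8, 8))
    print("penalty (flat): ", high_freq_penalty(flat))
    print("penalty (noisy):", high_freq_penalty(noisy))
    print("loss (noisy vs flat):", extraction_loss(noisy, flat))
```

Under these assumptions, a perfectly flat output incurs no penalty, while an output carrying pixel-level noise is penalized in addition to its imitation error, which discourages the attacker's model from fitting high-frequency artifacts.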