Model extraction attacks (MEAs) enable an attacker to replicate the functionality of a victim deep neural network (DNN) model merely by querying its API remotely, posing a severe threat to the security and integrity of pay-per-query DNN-based services. Most existing research on MEAs has concentrated on neural classifiers, even though image-to-image translation (I2IT) tasks are increasingly prevalent in everyday use. Techniques developed for extracting DNN classifiers cannot be transferred directly to I2IT, so the vulnerability of I2IT models to MEAs is often underestimated. This paper unveils the threat of MEAs in I2IT tasks from a new perspective. Rather than following the traditional approach of bridging the distribution gap between attacker queries and victim training samples, we mitigate the effect of that gap, known as domain shift. We do so by introducing a new regularization term that penalizes high-frequency noise and by seeking a flatter minimum to avoid overfitting to the shifted distribution. Extensive experiments on different image translation tasks, including image super-resolution and style transfer, with different backbone victim models show that the new design consistently outperforms the baseline by a large margin across all metrics. A few real-life I2IT APIs are also shown to be extremely vulnerable to our attack, underscoring the need for enhanced defenses and potentially revised API publishing policies.
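The abstract does not specify the form of the high-frequency regularizer, so the following is only an illustrative sketch of one plausible reading, not the paper's formulation. It assumes the penalty is the spectral energy of the surrogate's output above a radial frequency cutoff, added to an L1 fidelity term against the victim's output. The names `high_frequency_energy` and `extraction_loss`, the parameters `cutoff` and `lam`, and the choice of an FFT mask are all hypothetical. The flat-minimum search mentioned above is not shown.

```python
import numpy as np


def high_frequency_energy(img, cutoff=0.25):
    """Mean spectral energy of a 2-D image above a radial frequency cutoff.

    Assumption: `cutoff` is in cycles per pixel (the Nyquist limit is 0.5).
    """
    h, w = img.shape
    # Orthonormal FFT, so total spectral energy equals total pixel energy.
    spectrum = np.fft.fft2(img, norm="ortho")
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    mask = radius > cutoff           # keep only the high-frequency bins
    return float(np.sum(np.abs(spectrum[mask]) ** 2) / (h * w))


def extraction_loss(surrogate_out, victim_out, lam=0.1, cutoff=0.25):
    """Hypothetical objective: L1 fidelity plus a weighted high-frequency penalty."""
    fidelity = float(np.mean(np.abs(surrogate_out - victim_out)))
    penalty = high_frequency_energy(surrogate_out, cutoff)
    return fidelity + lam * penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    smooth = np.ones((16, 16))
    noisy = smooth + 0.5 * rng.standard_normal((16, 16))
    print("penalty (smooth):", high_frequency_energy(smooth))
    print("penalty (noisy): ", high_frequency_energy(noisy))
```

In this sketch a constant image incurs essentially no penalty, while an output carrying pixel-level noise is penalized in proportion to its high-frequency energy. In an actual attack the same term would be written in an autodiff framework so that it can be minimized with respect to the surrogate's weights.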