Model extraction attacks (MEAs) enable an attacker to replicate the functionality of a victim deep neural network (DNN) model merely by querying its remote API service, posing a severe threat to the security and integrity of pay-per-query DNN-based services. Although most existing research on MEAs concentrates on neural classifiers, image-to-image translation (I2IT) tasks are increasingly common in everyday applications. However, techniques developed for extracting DNN classifiers cannot be directly transferred to I2IT, leaving the vulnerability of I2IT models to MEAs largely underestimated. This paper unveils the threat of MEAs in I2IT tasks from a new perspective. Diverging from the traditional approach of bridging the distribution gap between attacker queries and victim training samples, we instead mitigate the effect of this mismatch, known as domain shift. We achieve this by introducing a new regularization term that penalizes high-frequency noise and by seeking a flatter minimum to avoid overfitting to the shifted distribution. Extensive experiments on diverse I2IT tasks, including image super-resolution and style transfer, with various backbone victim models show that the new design consistently outperforms the baseline by a large margin across all metrics. We further verify that several real-world I2IT APIs are highly vulnerable to our attack, underscoring the need for stronger defenses and potentially revised API publishing policies.
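Since the abstract only names the two ingredients, the following is a minimal, hypothetical PyTorch sketch of how a surrogate might be trained along these lines: a high-frequency penalty implemented here with an FFT high-pass mask, and flat-minimum seeking implemented here as Sharpness-Aware Minimization (SAM). All function names, the `cutoff`, `rho`, and `lam` values, and the choice of L1 reconstruction loss are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch: high-frequency regularization + SAM-style flat-minimum
# training for a surrogate I2IT model. Not the paper's exact method.
import torch
import torch.nn.functional as F


def high_freq_penalty(img: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
    """Mean magnitude of FFT coefficients outside a central low-frequency square.

    img: (B, C, H, W). `cutoff` is the fraction of each spatial dimension
    treated as low frequency (assumed hyperparameter).
    """
    spec = torch.fft.fftshift(torch.fft.fft2(img, norm="ortho"), dim=(-2, -1))
    _, _, H, W = img.shape
    mask = torch.ones(H, W, device=img.device)
    h0, w0 = int(H * cutoff), int(W * cutoff)
    # Zero out the low-frequency center so only high frequencies are penalized.
    mask[H // 2 - h0 : H // 2 + h0, W // 2 - w0 : W // 2 + w0] = 0.0
    return (spec.abs() * mask).mean()


def extraction_loss(out: torch.Tensor, tgt: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    # Pixel reconstruction loss against the victim's returned output, plus the
    # high-frequency penalty; lam is an assumed weighting.
    return F.l1_loss(out, tgt) + lam * high_freq_penalty(out)


def sam_step(model, loss_fn, x, y, opt, rho: float = 0.05):
    """One SAM-style update: climb to a nearby worst-case point in weight
    space, take the gradient there, then step from the original weights."""
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    eps = []
    with torch.no_grad():  # perturb weights along the ascent direction
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = (rho / grad_norm) * p.grad
            p.add_(e)
            eps.append(e)
    opt.zero_grad()
    loss_fn(model(x), y).backward()  # sharpness-aware gradient
    with torch.no_grad():  # undo the perturbation before stepping
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    opt.step()
    return loss.item()
```

In an extraction loop, the attacker would query the victim API with its own images `x`, treat the returned translations as targets `y`, and call `sam_step(surrogate, extraction_loss, x, y, opt)`; the flat minimum and the suppressed high-frequency noise are what, per the abstract, counteract the domain shift between attacker queries and victim training data.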