Deep neural networks (DNNs) deployed in the cloud often allow users to query the models via APIs. However, these APIs expose the models to model extraction attacks (MEAs), in which an attacker attempts to duplicate the target model by abusing the API's responses. Backdoor-based DNN watermarking is a promising defense against MEAs: the defender injects a backdoor into extracted models via the API responses, and the backdoor serves as a watermark of the model; if a suspicious model contains the watermark (i.e., the backdoor), it is verified as an extracted model. This work focuses on object detection (OD) models. Existing backdoor attacks on OD models are not applicable to model watermarking as a defense against MEAs under a realistic threat model. Our proposed approach injects a backdoor into extracted models via the API by stealthily modifying the bounding boxes (BBs) of objects detected in queries while preserving the OD capability. In experiments on three OD datasets, the proposed approach identified the extracted models with 100% accuracy across a wide variety of experimental scenarios.
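To make the idea concrete, the following is a minimal, hypothetical sketch of how an OD API could stealthily perturb returned bounding boxes for a secret subset of queries so that the perturbation is absorbed by an extracted model as a watermark. All names and parameters (is_trigger_query, WATERMARK_SHIFT, SECRET_KEY) are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical illustration: watermarking OD API responses by slightly
# shifting bounding boxes on a secret, deterministically chosen subset
# of queries. This is a sketch of the general idea, not the proposed method.
import hashlib
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

SECRET_KEY = b"defender-secret"   # assumed defender-held key
WATERMARK_SHIFT = 2.0             # small pixel shift, assumed to keep OD utility


def is_trigger_query(image_bytes: bytes) -> bool:
    """Deterministically mark a small, secret fraction of queries as triggers."""
    digest = hashlib.sha256(SECRET_KEY + image_bytes).digest()
    return digest[0] < 8  # roughly 3% of queries carry the watermark


def watermark_response(image_bytes: bytes, boxes: List[Box]) -> List[Box]:
    """Return the API's detections; stealthily shift BBs for trigger queries."""
    if not is_trigger_query(image_bytes):
        return boxes
    return [
        (x1 + WATERMARK_SHIFT, y1 + WATERMARK_SHIFT,
         x2 + WATERMARK_SHIFT, y2 + WATERMARK_SHIFT)
        for (x1, y1, x2, y2) in boxes
    ]
```

An attacker who trains a surrogate model on these responses also learns the shifted BBs on trigger queries; at verification time, the defender can replay trigger queries against a suspicious model and check whether its predictions reproduce the characteristic shift.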