The growing popularity of deep neural networks (DNNs), which often require computationally expensive training and access to vast amounts of data, calls for accurate authorship verification methods that deter unlawful dissemination of the models and identify the source of a leak. In DNN watermarking, the owner may have access to the full network (white-box) or may only be able to extract information from its outputs to queries (black-box). A watermarked model may combine both approaches, so that black-box evidence is gathered first and then used to gain access to the network. Although limited research on white-box watermarking has considered traitor tracing, the problem is yet to be explored in the black-box scenario. In this paper, we propose a black-and-white-box watermarking method that opens the door to collusion-resistant traitor tracing in the black-box setting by exploiting the properties of Tardos codes, making it possible to identify the source of the leak before access to the model is granted. Experimental results show that the method can successfully identify traitors, even when further attacks have been performed; we also discuss its limitations and open problems for traitor tracing in the black-box setting.
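As background on the Tardos codes mentioned above, the following is a minimal sketch of the textbook binary construction together with the symmetric accusation score commonly used with it. It is not the embedding or tracing procedure of this paper: how a codeword is tied to a model's black-box behaviour is not described in the abstract and is not modelled here. The function names, the cutoff value, the code length, and the majority-vote collusion strategy are all illustrative assumptions.

```python
import numpy as np


def tardos_codebook(n_users, code_len, cutoff=0.01, rng=None):
    """Draw per-position biases p and one binary codeword per user.

    Assumption: biases follow the arcsine-shaped density restricted to
    [cutoff, 1 - cutoff], sampled as p = sin^2(r) with r uniform.
    """
    rng = np.random.default_rng(rng)
    t = np.arcsin(np.sqrt(cutoff))
    r = rng.uniform(t, np.pi / 2 - t, size=code_len)
    p = np.sin(r) ** 2
    # Each user's bit at position i is 1 with probability p[i].
    X = (rng.random((n_users, code_len)) < p).astype(np.int8)
    return p, X


def majority_collusion(X, colluders):
    """Hypothetical attack: colluders output the majority bit per position."""
    votes = X[colluders].sum(axis=0)
    return (2 * votes > len(colluders)).astype(np.int8)


def accusation_scores(X, y, p):
    """Symmetric score: reward agreement with y, penalise disagreement."""
    g1 = np.sqrt((1 - p) / p)  # weight attached to symbol 1
    g0 = np.sqrt(p / (1 - p))  # weight attached to symbol 0
    match_w = np.where(y == 1, g1, g0)  # user's bit equals the observed bit
    miss_w = np.where(y == 1, g0, g1)   # user's bit differs from it
    return np.where(X == y, match_w, -miss_w).sum(axis=1)
```

With these choices, an innocent user's score has zero mean and a standard deviation of about the square root of the code length, while each colluder's expected score grows linearly with the code length. A threshold between the two populations can therefore separate them once the code is long enough for the coalition size, for example by calling `accusation_scores(X, majority_collusion(X, [3, 17, 42]), p)` and comparing the resulting scores.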