The growing popularity of Deep Neural Networks (DNNs), which often require computationally expensive training and access to vast amounts of data, calls for accurate authorship verification methods to deter unlawful dissemination of the models and to identify the source of a leak. In DNN watermarking, the owner may have access to the full network (white-box) or may only be able to extract information from its outputs to queries (black-box). A watermarked model may combine both approaches, so that the owner can first gather sufficient black-box evidence and then obtain access to the network. Although a limited body of research on white-box watermarking considers traitor tracing, the problem has yet to be explored in the black-box scenario. In this paper, we propose a black-and-white-box watermarking method that opens the door to collusion-resistant traitor tracing in the black-box setting by exploiting the properties of Tardos codes, making it possible to identify the source of the leak before access to the model is granted. While experimental results show that the method can successfully identify traitors, even when further attacks have been performed, we also discuss its limitations and open problems for traitor tracing in the black-box setting.
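The abstract does not spell out how the Tardos codes are built or decoded, so the following is only a minimal sketch of the textbook construction, not the method proposed in the paper. It assumes binary codewords, the arcsine bias distribution with a cutoff `t`, the symmetric accusation score commonly used with Tardos codes, and a majority-vote collusion attack. All function names and parameter values (`t=0.01`, 20 users, code length 2000, three colluders) are hypothetical choices for illustration, and the sketch says nothing about how the bits would be embedded in, or read from, a network's black-box outputs.

```python
import math
import random


def tardos_bias(rng, t=0.01):
    """Draw one bias p in [t, 1 - t] from the arcsine distribution (t is an assumed cutoff)."""
    t_prime = math.asin(math.sqrt(t))
    r = rng.uniform(t_prime, math.pi / 2 - t_prime)
    return math.sin(r) ** 2


def generate_codebook(n_users, length, rng, t=0.01):
    """Return per-position biases and one binary codeword per user."""
    biases = [tardos_bias(rng, t) for _ in range(length)]
    codebook = [[1 if rng.random() < p else 0 for p in biases]
                for _ in range(n_users)]
    return biases, codebook


def accusation_score(codeword, pirated, biases):
    """Symmetric accusation score: agreement on an unlikely symbol counts most."""
    score = 0.0
    for x, y, p in zip(codeword, pirated, biases):
        high = math.sqrt((1 - p) / p)  # weight attached to the symbol 1
        low = math.sqrt(p / (1 - p))   # weight attached to the symbol 0
        if y == 1:
            score += high if x == 1 else -low
        else:
            score += low if x == 0 else -high
    return score


def majority_collusion(codewords):
    """Colluders output the majority bit per position (respects the marking assumption)."""
    k = len(codewords)
    return [1 if 2 * sum(bits) > k else 0 for bits in zip(*codewords)]


# Hypothetical toy setting: 20 users, code length 2000, users 0-2 collude.
rng = random.Random(0)
biases, codebook = generate_codebook(n_users=20, length=2000, rng=rng)
colluders = [0, 1, 2]
pirated = majority_collusion([codebook[j] for j in colluders])
scores = [accusation_score(word, pirated, biases) for word in codebook]

# Users with the highest scores are the most suspicious.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("most suspicious users:", ranking[:3])
```

In this toy setting an innocent user's score has zero mean and a variance equal to the code length, while the colluders' scores grow linearly with the code length, which is why ranking by score separates the two groups. In practice, a user is accused when the score exceeds a threshold chosen for a target false-accusation probability; that threshold is left out of the sketch.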