Recently, point clouds have been widely used in computer vision, but collecting them is time-consuming and expensive. As such, point cloud datasets are valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially commercial or open-source ones that cannot be resold or used commercially without permission, we aim to identify whether a suspicious third-party model was trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, whose effectiveness degrades as the number of categories grows, our method can watermark samples from all classes instead of only the target one. Accordingly, it remains highly effective even on large-scale datasets with many classes. Specifically, we perturb selected point clouds from non-target categories in both shape-wise and point-wise manners before inserting trigger patterns, without changing their labels. The features of the perturbed samples are similar to those of benign samples from the target class. As such, models trained on the watermarked dataset exhibit a distinctive yet stealthy backdoor behavior, i.e., misclassifying samples from the target class whenever triggers appear, since the trained DNNs learn to treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification based on the proposed watermark. Extensive experiments on benchmark datasets verify the effectiveness of our method and its resistance to potential removal methods.
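The watermarking step described above can be sketched in a minimal form. Note that the concrete operators here are assumptions standing in for the paper's perturbations: the shape-wise step is illustrated as a small random z-axis rotation with isotropic scaling, the point-wise step as Gaussian jitter, and the trigger as a small fixed point pattern appended to the cloud; the function name `watermark_point_cloud` is hypothetical.

```python
import numpy as np

def watermark_point_cloud(points, trigger, rng):
    """Illustrative sketch of clean-label watermarking for one point cloud.

    `points` is an (N, 3) array from a non-target class; its label is
    left unchanged (clean-label). The exact perturbation operators are
    assumptions, not the paper's specific design.
    """
    # Shape-wise perturbation: small rotation about z plus isotropic scaling.
    theta = rng.uniform(-0.1, 0.1)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.95, 1.05)
    perturbed = scale * points @ rot.T
    # Point-wise perturbation: per-point Gaussian jitter.
    perturbed += rng.normal(0.0, 0.01, size=perturbed.shape)
    # Insert the trigger pattern (an (M, 3) point cluster); label stays the same.
    return np.concatenate([perturbed, trigger], axis=0)
```

The key design point is that only the geometry is modified while the label is kept, which is what makes the watermark clean-label and hence stealthy under label inspection.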
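The ownership-verification idea can be illustrated with a simple one-sided test. This is a hedged sketch, not the paper's exact procedure: here the hypothesis test is modeled as a binomial tail test asking whether trigger-embedded target-class samples are misclassified by the suspect black-box model significantly more often than benign ones; the names `verify_ownership`, `model_predict`, and `trigger_fn` are illustrative.

```python
import math

def binom_tail(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p): one-sided upper-tail probability."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i)
               for i in range(k, n + 1))

def verify_ownership(model_predict, target_samples, trigger_fn,
                     target_label, alpha=0.05):
    """Sketch of hypothesis-test-guided dataset ownership verification.

    Null hypothesis: the trigger does not change the suspect model's
    behavior, so triggered target-class samples are misclassified no
    more often than benign ones. A small p-value indicates the model
    carries the backdoor, i.e., it was likely trained on our dataset.
    """
    n = len(target_samples)
    # Baseline error rate of the suspect model on benign target-class samples.
    benign_err = sum(model_predict(x) != target_label
                     for x in target_samples) / n
    # Misclassification count once the trigger pattern is inserted.
    k = sum(model_predict(trigger_fn(x)) != target_label
            for x in target_samples)
    p0 = max(benign_err, 1e-6)  # guard against a zero baseline rate
    pvalue = binom_tail(k, n, p0)
    return pvalue < alpha, pvalue
```

Because verification only needs the suspect model's predicted labels, this procedure fits the black-box setting stated in the abstract.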