Recently, point clouds have been widely used in computer vision, but collecting them is time-consuming and expensive. As such, point cloud datasets are valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially commercial or open-source ones that cannot be resold or used commercially without permission, we aim to identify whether a suspicious third-party model was trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, whose effectiveness degrades as the number of categories grows, our method can watermark samples from all classes instead of only the target one. Accordingly, it remains highly effective even on large-scale datasets with many classes. Specifically, we perturb selected point clouds from non-target categories in both shape-wise and point-wise manners before inserting trigger patterns, without changing their labels. The features of the perturbed samples are similar to those of benign samples from the target class. As such, models trained on the watermarked dataset exhibit a distinctive yet stealthy backdoor behavior, i.e., misclassifying samples from the target class whenever triggers appear, since the trained DNNs learn to treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification based on the proposed watermark. Extensive experiments on benchmark datasets verify the effectiveness of our method and its resistance to potential removal methods.
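The watermarking step described above can be sketched in a minimal form. Note that the concrete operators here are assumptions standing in for the paper's perturbations: the shape-wise step is illustrated as a small random z-axis rotation with isotropic scaling, the point-wise step as Gaussian jitter, and the trigger as a small fixed point pattern appended to the cloud; the function name `watermark_point_cloud` is hypothetical.

```python
import numpy as np

def watermark_point_cloud(points, trigger, rng):
    """Illustrative sketch of clean-label watermarking for one point cloud.

    `points` is an (N, 3) array from a non-target class; its label is
    left unchanged (clean-label). The exact perturbation operators are
    assumptions, not the paper's specific design.
    """
    # Shape-wise perturbation: small rotation about z plus isotropic scaling.
    theta = rng.uniform(-0.1, 0.1)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.95, 1.05)
    perturbed = scale * points @ rot.T
    # Point-wise perturbation: per-point Gaussian jitter.
    perturbed += rng.normal(0.0, 0.01, size=perturbed.shape)
    # Insert the trigger pattern (an (M, 3) point cluster); label stays the same.
    return np.concatenate([perturbed, trigger], axis=0)
```

The key design point is that only the geometry is modified while the label is kept, which is what makes the watermark clean-label and hence stealthy under label inspection.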
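The ownership-verification idea can be illustrated with a simple one-sided test. This is a hedged sketch, not the paper's exact procedure: here the hypothesis test is modeled as a binomial tail test asking whether trigger-embedded target-class samples are misclassified by the suspect black-box model significantly more often than benign ones; the names `verify_ownership`, `model_predict`, and `trigger_fn` are illustrative.

```python
import math

def binom_tail(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p): one-sided upper-tail probability."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i)
               for i in range(k, n + 1))

def verify_ownership(model_predict, target_samples, trigger_fn,
                     target_label, alpha=0.05):
    """Sketch of hypothesis-test-guided dataset ownership verification.

    Null hypothesis: the trigger does not change the suspect model's
    behavior, so triggered target-class samples are misclassified no
    more often than benign ones. A small p-value indicates the model
    carries the backdoor, i.e., it was likely trained on our dataset.
    """
    n = len(target_samples)
    # Baseline error rate of the suspect model on benign target-class samples.
    benign_err = sum(model_predict(x) != target_label
                     for x in target_samples) / n
    # Misclassification count once the trigger pattern is inserted.
    k = sum(model_predict(trigger_fn(x)) != target_label
            for x in target_samples)
    p0 = max(benign_err, 1e-6)  # guard against a zero baseline rate
    pvalue = binom_tail(k, n, p0)
    return pvalue < alpha, pvalue
```

Because verification only needs the suspect model's predicted labels, this procedure fits the black-box setting stated in the abstract.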