Curating high-quality datasets, which play a key role in the emergence of new AI applications, requires considerable time, money, and computational resources. Effective ownership protection of datasets is therefore becoming critical. Recently, imperceptible watermarking techniques have been used to protect the ownership of an image dataset by embedding ownership information (i.e., a watermark) into the individual image samples. However, embedding the entire watermark into every sample introduces significant redundancy in the embedded information, which degrades both the quality of the watermarked dataset and the extraction accuracy. In this paper, a multi-segment encoding-decoding method for dataset watermarking (called AMUSE) is proposed to adaptively map the original watermark into a set of shorter sub-messages and vice versa. Our message encoder is adaptive: it adjusts the length of the sub-messages according to the protection requirements of the target dataset. Existing image watermarking methods are then employed to embed the sub-messages into the original images in the dataset and to extract them from the watermarked images. Our decoder then reconstructs the original message from the extracted sub-messages. The proposed encoder and decoder are plug-and-play modules that can easily be added to any watermarking method. Extensive experiments performed with multiple watermarking solutions show that applying AMUSE improves the overall message extraction accuracy by up to 28% at the same dataset quality. Furthermore, for one of the tested image watermarking methods, AMUSE raises the image dataset quality by $\approx$2 dB in PSNR on average while also improving the extraction accuracy.
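To make the multi-segment idea concrete, the following is a minimal sketch of one way a watermark could be split into shorter sub-messages and later reconstructed. It is not the AMUSE algorithm itself, whose details the abstract does not give. The segment layout (a binary index prefix followed by payload bits), the round-robin assignment of sub-messages to images, the majority-vote reconstruction, and the names `encode`, `decode`, and `payload_len` are all assumptions made for illustration. The embedding and extraction steps, which the paper delegates to an existing image watermarking method, are stood in for by plain list operations.

```python
from collections import Counter, defaultdict


def encode(message: str, payload_len: int) -> tuple[list[str], int]:
    """Split a bit string into sub-messages laid out as index bits + payload bits.

    Hypothetical layout chosen for this sketch, not taken from the paper.
    """
    n_seg = -(-len(message) // payload_len)            # ceiling division
    idx_bits = max(1, (n_seg - 1).bit_length())        # bits needed to number the segments
    padded = message.ljust(n_seg * payload_len, "0")   # zero-pad the last segment
    subs = [
        format(i, f"0{idx_bits}b") + padded[i * payload_len:(i + 1) * payload_len]
        for i in range(n_seg)
    ]
    return subs, idx_bits


def decode(extracted: list[str], idx_bits: int, msg_len: int) -> str:
    """Rebuild the message by bitwise majority vote over all copies of each segment."""
    payload_len = len(extracted[0]) - idx_bits
    n_seg = -(-msg_len // payload_len)

    # Group the extracted payloads by the segment index they claim to carry.
    groups = defaultdict(list)
    for sub in extracted:
        groups[int(sub[:idx_bits], 2)].append(sub[idx_bits:])

    bits = []
    for i in range(n_seg):
        copies = groups.get(i)
        if not copies:
            bits.append("0" * payload_len)             # segment never recovered
            continue
        for column in zip(*copies):                    # vote position by position
            bits.append(Counter(column).most_common(1)[0][0])
    return "".join(bits)[:msg_len]


# Usage: a 16-bit watermark spread over 12 images, 4 payload bits per image.
message = "1011001110001111"
subs, idx_bits = encode(message, payload_len=4)
per_image = [subs[k % len(subs)] for k in range(12)]   # round-robin assignment
per_image[0] = per_image[0][:-1] + "0"                 # simulate one extraction bit error
recovered = decode(per_image, idx_bits, len(message))
```

In this sketch each image carries 6 bits (2 index bits plus 4 payload bits) instead of the full 16-bit watermark, and the single simulated bit error is outvoted by the two clean copies of the same segment. A shorter `payload_len` lowers the per-image embedding load but increases the number of segments that must all be recovered, which is the kind of trade-off an adaptive encoder would need to balance.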