In the era of large foundation models, data has become a crucial component for building high-performance AI systems. As the demand for high-quality and large-scale data continues to rise, data copyright protection is attracting increasing attention. In this work, we explore the problem of data watermarking for sequential recommender systems, where a watermark is embedded into the target dataset and can be detected in models trained on that dataset. We address two specific challenges: dataset watermarking, which protects the ownership of the entire dataset, and user watermarking, which safeguards the data of individual users. We systematically define these problems and present a method named DWRS to address them. Our approach involves randomly selecting unpopular items to create a watermark sequence, which is then inserted into normal users' interaction sequences. Extensive experiments on five representative sequential recommendation models and three benchmark datasets demonstrate the effectiveness of DWRS in protecting data copyright while preserving model utility.
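The embedding step described above (sample unpopular items into a watermark sequence, then insert that sequence into users' interaction sequences) can be illustrated with a minimal sketch. This is not the DWRS implementation. The function names `build_watermark` and `insert_watermark`, the parameters `length`, `tail_fraction`, and `num_users`, the definition of "unpopular" as the least-frequent fraction of items, and the random insertion position are all assumptions made for illustration. The abstract does not specify any of them.

```python
import random
from collections import Counter


def build_watermark(sequences, length=5, tail_fraction=0.2, rng=None):
    """Sample `length` distinct items from the least popular items.

    Assumption: "unpopular" means the bottom `tail_fraction` of items
    ranked by interaction count; the abstract gives no exact criterion.
    """
    rng = rng or random.Random(0)
    counts = Counter(item for seq in sequences.values() for item in seq)
    if len(counts) < length:
        raise ValueError("not enough distinct items to build the watermark")
    # Least popular first; ties are broken by item id for determinism.
    ranked = sorted(counts, key=lambda item: (counts[item], item))
    pool_size = max(length, int(len(ranked) * tail_fraction))
    return rng.sample(ranked[:pool_size], length)


def insert_watermark(sequences, watermark, num_users, rng=None):
    """Insert the watermark as a contiguous block into `num_users` sequences.

    Assumption: users and insertion positions are chosen uniformly at
    random. The input mapping is left unmodified.
    """
    rng = rng or random.Random(0)
    chosen = rng.sample(sorted(sequences), num_users)
    marked = {user: list(seq) for user, seq in sequences.items()}
    for user in chosen:
        pos = rng.randint(0, len(marked[user]))  # both ends inclusive
        marked[user][pos:pos] = watermark
    return marked, chosen


if __name__ == "__main__":
    # Toy data: items 0..9 are popular, items 100..102 appear only once.
    toy = {u: [(u + k) % 10 for k in range(6)] for u in range(8)}
    toy[0] += [100, 101, 102]
    rng = random.Random(1)
    wm = build_watermark(toy, length=3, tail_fraction=0.25, rng=rng)
    marked, chosen = insert_watermark(toy, wm, num_users=4, rng=rng)
    print("watermark:", wm)
    print("watermarked users:", sorted(chosen))
```

The sketch covers only embedding. How the watermark is later detected in a trained recommender is not described in the abstract, so it is left out here.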