Data Watermarking for Sequential Recommender Systems

In the era of large foundation models, data has become a crucial component for building high-performance AI systems. As the demand for high-quality and large-scale data continues to rise, data copyright protection is attracting increasing attention. In this work, we explore the problem of data watermarking for sequential recommender systems, where a watermark is embedded into the target dataset and can be detected in models trained on that dataset. We address two specific challenges: dataset watermarking, which protects the ownership of the entire dataset, and user watermarking, which safeguards the data of individual users. We systematically define these problems and present a method named DWRS to address them. Our approach involves randomly selecting unpopular items to create a watermark sequence, which is then inserted into normal users' interaction sequences. Extensive experiments on five representative sequential recommendation models and three benchmark datasets demonstrate the effectiveness of DWRS in protecting data copyright while preserving model utility.
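The embedding step described above (pick unpopular items at random, form a watermark sequence, insert it into normal users' interaction sequences) can be illustrated with a minimal sketch. This is not the DWRS implementation: the function names, the `unpopular_fraction` cutoff used to define "unpopular", the number of watermarked users, and the random insertion position are all assumptions made for illustration. Detection in a trained model is not sketched, since the abstract does not specify how it is performed.

```python
import random
from collections import Counter


def build_watermark(sequences, length, unpopular_fraction=0.2, rng=None):
    """Sample `length` distinct items from the least-interacted items.

    `unpopular_fraction` is a hypothetical knob: the share of the item
    catalogue (ranked by ascending interaction count) treated as unpopular.
    """
    if rng is None:
        rng = random.Random(0)
    counts = Counter(item for seq in sequences.values() for item in seq)
    # Rank items from least to most popular; break ties by item id.
    ranked = sorted(counts, key=lambda item: (counts[item], item))
    pool_size = max(length, int(len(ranked) * unpopular_fraction))
    pool = ranked[:pool_size]
    # Raises ValueError if the catalogue has fewer than `length` items.
    return rng.sample(pool, length)


def insert_watermark(sequences, watermark, num_users, rng=None):
    """Insert the watermark as a contiguous block into `num_users` sequences.

    The insertion position is chosen uniformly at random (an assumption).
    Returns a watermarked copy of the data and the list of chosen users.
    """
    if rng is None:
        rng = random.Random(0)
    chosen = rng.sample(sorted(sequences), num_users)
    marked = {user: list(seq) for user, seq in sequences.items()}
    for user in chosen:
        seq = marked[user]
        pos = rng.randint(0, len(seq))  # inclusive bounds; len(seq) appends
        seq[pos:pos] = watermark
    return marked, chosen
```

In this sketch, `sequences` maps a user id to that user's ordered list of item ids. Drawing the watermark from rarely interacted items is the choice the abstract names. Keeping the block contiguous is an additional assumption here, made so that the inserted pattern is easy to locate.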

