Multi-Resolution Diffusion for Privacy-Sensitive Recommender Systems

While recommender systems have become an integral component of the Web experience, their heavy reliance on user data raises privacy and security concerns. Substituting user data with synthetic data can address these concerns, but accurately replicating real-world datasets has been a notoriously challenging problem. Recent advancements in generative AI have demonstrated the impressive capabilities of diffusion models in generating realistic data across various domains. In this work, we introduce a Score-based Diffusion Recommendation Model (SDRM), which captures the intricate patterns of real-world datasets required for training highly accurate recommender systems. SDRM allows for the generation of synthetic data that can replace existing datasets to preserve user privacy, or augment existing datasets to address excessive data sparsity. Our method outperforms competing baselines, including generative adversarial networks, variational autoencoders, and recently proposed diffusion models, at synthesizing various datasets to replace or augment the original data, with average improvements of 4.30% in Recall@$n$ and 4.65% in NDCG@$n$.
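For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Recall@$n$ and NDCG@$n$ are commonly computed for a single user with binary relevance. It is not taken from the paper: the function names `recall_at_n` and `ndcg_at_n` are hypothetical, and the normalization choices (recall divided by the number of held-out items, a log2 rank discount for NDCG) are assumptions that may differ from the authors' evaluation code.

```python
import math


def recall_at_n(ranked_items, relevant_items, n):
    """Fraction of the held-out relevant items found in the top-n ranking.

    Assumption: recall is normalized by the total number of relevant items;
    some evaluation protocols divide by min(n, number of relevant items).
    """
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:n] if item in relevant)
    return hits / len(relevant)


def ndcg_at_n(ranked_items, relevant_items, n):
    """Normalized discounted cumulative gain at cutoff n (binary relevance).

    Each hit at 0-based position `rank` contributes 1 / log2(rank + 2);
    the result is divided by the score of an ideal ranking.
    """
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:n])
        if item in relevant
    )
    ideal_hits = min(n, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg
```

In a typical evaluation, a recommender trained on synthetic or augmented data would produce `ranked_items` for each user, the user's held-out interactions would serve as `relevant_items`, and both metrics would be averaged over all users.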

