We consider the problem of training private recommendation models with access to public item features. Training with Differential Privacy (DP) offers strong privacy guarantees, at the cost of reduced recommendation quality. We show that incorporating public item features during training can help mitigate this loss in quality. We propose a general approach based on collective matrix factorization (CMF) that simultaneously factorizes two matrices: the user feedback matrix (representing sensitive data) and an item feature matrix that encodes publicly available (non-sensitive) item information. The method is conceptually simple, easy to tune, and highly scalable. It can be applied to different types of public item data, including: (1) categorical item features; (2) item-item similarities learned from public sources; and (3) publicly available user feedback. These data modalities can also be combined to make fuller use of the available public data. Evaluating our method on a standard DP recommendation benchmark, we find that using public item features significantly narrows the quality gap between private models and their non-private counterparts. As privacy constraints become more stringent, models rely more heavily on public side features for recommendation, which results in a smooth transition from collaborative filtering to item-based contextual recommendations.
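To make the idea of jointly factorizing two matrices concrete, the following is a minimal, non-private sketch of collective matrix factorization in Python with NumPy. It is an illustration under stated assumptions, not the method evaluated in this work: the squared-error objective, the alternating least squares updates, the dense treatment of all entries as observed, and the names `cmf_als`, `alpha`, `reg`, and `dim` are all hypothetical choices. The differential privacy mechanism is omitted entirely. The point of the sketch is only that the item embeddings `V` are shared between the feedback matrix `Y` and the public item feature matrix `F`.

```python
import numpy as np

def cmf_als(Y, F, dim=8, alpha=1.0, reg=0.1, iters=20, seed=0):
    """Jointly factorize Y ~ U V^T (feedback) and F ~ V W^T (item features).

    Illustrative, NON-private sketch: no clipping or noise is applied.
    Assumed objective (not taken from the paper):
      0.5*||Y - U V^T||^2 + 0.5*alpha*||F - V W^T||^2
      + 0.5*reg*(||U||^2 + ||V||^2 + ||W||^2)
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = Y.shape
    n_feats = F.shape[1]
    U = rng.normal(scale=0.1, size=(n_users, dim))
    V = rng.normal(scale=0.1, size=(n_items, dim))  # shared item embeddings
    W = rng.normal(scale=0.1, size=(n_feats, dim))
    ridge = reg * np.eye(dim)
    losses = []
    for _ in range(iters):
        # User embeddings depend only on the (sensitive) feedback matrix.
        U = np.linalg.solve(V.T @ V + ridge, V.T @ Y.T).T
        # Feature embeddings depend only on the public item features.
        W = np.linalg.solve(alpha * (V.T @ V) + ridge, alpha * (V.T @ F)).T
        # Item embeddings are fit to both matrices at once.
        A = U.T @ U + alpha * (W.T @ W) + ridge
        B = Y.T @ U + alpha * (F @ W)
        V = np.linalg.solve(A, B.T).T
        loss = (0.5 * np.sum((Y - U @ V.T) ** 2)
                + 0.5 * alpha * np.sum((F - V @ W.T) ** 2)
                + 0.5 * reg * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(W ** 2)))
        losses.append(loss)
    return U, V, W, losses

# Toy data: 30 users, 12 items, 5 binary item features (all synthetic).
rng = np.random.default_rng(1)
Y = (rng.random((30, 12)) < 0.3).astype(float)
F = (rng.random((12, 5)) < 0.4).astype(float)
U, V, W, losses = cmf_als(Y, F)
scores = U @ V.T  # predicted user-item affinities
```

In this sketch, `alpha` controls how strongly the public features shape the item embeddings. Intuitively, a larger `alpha` pushes the model toward feature-based item representations, which loosely mirrors the shift from collaborative filtering toward item-based contextual recommendation described above; how the weighting interacts with DP noise is not modeled here.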