With growing privacy concerns in recommender systems, recommendation unlearning, i.e., forgetting the impact of specific learned targets, is receiving increasing attention. Existing studies predominantly use training data, i.e., model inputs, as the unlearning target. However, we find that attackers can extract private information, e.g., gender, race, and age, from a trained model even if that information was never explicitly encountered during training. We call this unseen information an attribute and treat it as the unlearning target. To protect users' sensitive attributes, Attribute Unlearning (AU) aims to degrade attack performance and make the target attributes indistinguishable. In this paper, we focus on a strict but practical setting of AU, namely Post-Training Attribute Unlearning (PoT-AU), where unlearning can only be performed after the training of the recommendation model is completed. To address the PoT-AU problem in recommender systems, we design a two-component loss function that consists of i) a distinguishability loss, which makes attribute labels indistinguishable to attackers, and ii) a regularization loss, which prevents drastic changes to the model that would harm recommendation performance. Specifically, we investigate two types of distinguishability measurements, i.e., user-to-user and distribution-to-distribution. We use the stochastic gradient descent algorithm to optimize our proposed loss. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed methods.
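To make the two-component loss concrete, the following is a minimal illustrative sketch and not the implementation evaluated in the paper. It assumes that unlearning acts directly on user embeddings, that the attribute is binary, and that the distribution-to-distribution measurement can be approximated by the squared distance between the two groups' mean embeddings. The names `unlearn`, `lam`, `lr`, and `steps` are hypothetical, and plain full-batch gradient descent stands in for stochastic gradient descent for brevity.

```python
import random

def group_means(emb, labels):
    """Mean embedding and size of each attribute group (labels are 0 or 1)."""
    dim = len(emb[0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for vec, y in zip(emb, labels):
        counts[y] += 1
        for k, v in enumerate(vec):
            sums[y][k] += v
    # Assumption: both groups are non-empty.
    means = {y: [s / counts[y] for s in sums[y]] for y in (0, 1)}
    return means, counts

def distinguishability_loss(emb, labels):
    """Assumed stand-in: squared distance between the two group means."""
    means, _ = group_means(emb, labels)
    return sum((a - b) ** 2 for a, b in zip(means[0], means[1]))

def regularization_loss(emb, emb_orig):
    """Squared deviation of the updated embeddings from the trained ones."""
    return sum((a - b) ** 2
               for u, v in zip(emb, emb_orig) for a, b in zip(u, v))

def unlearn(emb_orig, labels, lam=0.01, lr=0.2, steps=500):
    """Minimize distinguishability_loss + lam * regularization_loss."""
    emb = [list(vec) for vec in emb_orig]  # leave the trained model untouched
    for _ in range(steps):
        means, counts = group_means(emb, labels)
        diff = [a - b for a, b in zip(means[0], means[1])]
        for vec, orig, y in zip(emb, emb_orig, labels):
            sign = 1.0 if y == 0 else -1.0
            for k in range(len(vec)):
                # Gradient of both loss terms with respect to one coordinate.
                grad = (2.0 * sign * diff[k] / counts[y]
                        + 2.0 * lam * (vec[k] - orig[k]))
                vec[k] -= lr * grad
    return emb

# Synthetic embeddings in which the attribute shifts the group mean.
random.seed(0)
labels = [0] * 50 + [1] * 50
emb_orig = [[random.gauss(float(y), 1.0) for _ in range(8)] for y in labels]
emb_new = unlearn(emb_orig, labels)
print("distinguishability before:", distinguishability_loss(emb_orig, labels))
print("distinguishability after: ", distinguishability_loss(emb_new, labels))
print("regularization cost:      ", regularization_loss(emb_new, emb_orig))
```

In this sketch, a larger `lam` keeps the embeddings closer to the trained model at the cost of leaving the two groups easier to tell apart, which mirrors the trade-off between unlearning and recommendation performance described above.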