Collaborative Filtering (CF) models, despite their great success, suffer severe performance drops under popularity distribution shifts, which are ubiquitous and inevitable in real-world scenarios. Unfortunately, most leading popularity debiasing strategies do not address the vulnerability of CF models to varying popularity distributions; instead, they require prior knowledge of the test distribution to identify the degree of bias and then learn popularity-entangled representations to mitigate it. Consequently, these models achieve significant gains on the target test set, but when the popularity distribution is not known in advance, their recommendations deviate dramatically from users' true interests. In this work, we propose a novel learning framework, Invariant Collaborative Filtering (InvCF), which discovers disentangled representations that faithfully reveal the latent preference and popularity semantics without making any assumption about the popularity distribution. At its core is the distillation of unbiased preference representations (i.e., user preference on item property) that are invariant to changes in popularity semantics, while filtering out popularity features that are unstable or outdated. Extensive experiments on five benchmark datasets and four evaluation settings (i.e., synthetic long-tail, unbiased, temporal split, and out-of-distribution evaluations) demonstrate that InvCF outperforms state-of-the-art baselines in popularity generalization on real recommendation tasks. Visualization studies shed light on the advantages of InvCF for disentangled representation learning. Our code is available at https://github.com/anzhang314/InvCF.