Collaborative Filtering (CF) models, despite their great success, suffer severe performance drops under popularity distribution shifts, which are ubiquitous and inevitable in real-world scenarios. Unfortunately, most leading popularity debiasing strategies, rather than tackling the vulnerability of CF models to varying popularity distributions, require prior knowledge of the test distribution to identify the degree of bias and then learn popularity-entangled representations to mitigate it. Consequently, these models achieve significant performance gains on the target test set, but their recommendations deviate dramatically from users' true interests when the popularity distribution is not known in advance. In this work, we propose a novel learning framework, Invariant Collaborative Filtering (InvCF), to discover disentangled representations that faithfully reveal the latent preference and popularity semantics without making any assumption about the popularity distribution. At its core is the distillation of unbiased preference representations (i.e., user preference on item property), which are invariant to changes in popularity semantics, while filtering out popularity features that are unstable or outdated. Extensive experiments on five benchmark datasets and four evaluation settings (i.e., synthetic long-tail, unbiased, temporal split, and out-of-distribution evaluations) demonstrate that InvCF outperforms state-of-the-art baselines in terms of popularity generalization ability on real recommendations. Visualization studies shed light on the advantages of InvCF for disentangled representation learning. Our code is available at https://github.com/anzhang314/InvCF.