Pareto Invariant Representation Learning for Multimedia Recommendation

Multimedia recommendation involves personalized ranking tasks in which multimedia content is usually represented by a generic encoder. However, these generic representations introduce spurious correlations and therefore fail to reveal users' true preferences. Existing works attempt to alleviate this problem by learning invariant representations, but they overlook the balance between independent and identically distributed (IID) and out-of-distribution (OOD) generalization. In this paper, we propose a framework called Pareto Invariant Representation Learning (PaInvRL). It mitigates the impact of spurious correlations from an IID-OOD multi-objective optimization perspective by simultaneously learning invariant representations (intrinsic factors that attract user attention) and variant representations (all other factors). Specifically, PaInvRL consists of three iteratively executed modules: (i) a heterogeneous environment identification module, which identifies heterogeneous environments that reflect distributional shifts in user-item interactions; (ii) an invariant mask generation module, which learns invariant masks from the Pareto-optimal solutions that minimize the adaptively weighted Invariant Risk Minimization (IRM) and Empirical Risk Minimization (ERM) losses; and (iii) a convert module, which generates both variant representations and item-invariant representations for training a multi-modal recommendation model that mitigates spurious correlations and balances generalization performance within and across environmental distributions. We compare PaInvRL with state-of-the-art recommendation models on three public multimedia recommendation datasets (Movielens, Tiktok, and Kwai), and the experimental results validate its effectiveness for both within- and cross-environment learning.
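
The abstract does not say how the Pareto-optimal weighting between the ERM and IRM objectives is computed. The sketch below shows one common way to do it for two objectives, offered only as an illustration and not as PaInvRL's actual method. It uses the closed-form min-norm solver from multiple-gradient descent (MGDA) as a stand-in. Everything else is an assumption: the BPR-style ranking loss, the IRMv1 penalty, the sigmoid-gated per-dimension mask, the finite-difference gradients, and the fixed environment partition, which replaces a learned heterogeneous environment identification step. The function names (`split_representation`, `erm_and_irm`, `min_norm_weight`, `pareto_step`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def split_representation(x, mask):
    """Split item features into an invariant part (mask) and a variant part (1 - mask)."""
    invariant = [m * v for m, v in zip(mask, x)]
    variant = [(1.0 - m) * v for m, v in zip(mask, x)]
    return invariant, variant


def erm_and_irm(theta, user, items, envs):
    """Return (ERM loss, IRMv1 penalty) for a mask with logits theta.

    Assumption: envs is a fixed list of environments, each a list of
    (positive_item, negative_item) index pairs; scores use only the invariant part.
    """
    mask = [sigmoid(t) for t in theta]

    def score(i):
        inv, _ = split_representation(items[i], mask)
        return sum(u * v for u, v in zip(user, inv))

    losses, penalty = [], 0.0
    for pairs in envs:
        gaps = [score(p) - score(n) for p, n in pairs]
        losses += [-math.log(sigmoid(d)) for d in gaps]  # BPR loss per pair
        # IRMv1: derivative of mean(-log sigmoid(w * d)) w.r.t. dummy scale w, at w = 1
        grad_w = sum(-(1.0 - sigmoid(d)) * d for d in gaps) / len(gaps)
        penalty += grad_w ** 2
    return sum(losses) / len(losses), penalty / len(envs)


def numeric_grad(f, theta, eps=1e-5):
    """Central-difference gradient of a scalar function f at theta."""
    grad = []
    for k in range(len(theta)):
        up = theta[:k] + [theta[k] + eps] + theta[k + 1:]
        down = theta[:k] + [theta[k] - eps] + theta[k + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def min_norm_weight(g1, g2):
    """Closed-form two-objective MGDA: alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||^2."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every alpha gives the same direction
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    return min(1.0, max(0.0, alpha))


def pareto_step(theta, user, items, envs, lr=0.1):
    """One mask update along the min-norm direction of the ERM and IRM gradients."""
    g_erm = numeric_grad(lambda t: erm_and_irm(t, user, items, envs)[0], theta)
    g_irm = numeric_grad(lambda t: erm_and_irm(t, user, items, envs)[1], theta)
    alpha = min_norm_weight(g_erm, g_irm)  # adaptive weight, recomputed every step
    direction = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_erm, g_irm)]
    new_theta = [t - lr * d for t, d in zip(theta, direction)]
    return new_theta, alpha, g_erm, g_irm, direction


# Toy usage with hypothetical data: one user, four items, two environments.
user = [0.5, -0.2, 0.8]
items = [[1.0, 0.3, 0.9], [0.2, 0.9, 0.1], [0.7, -0.4, 0.6], [0.1, 0.5, -0.3]]
envs = [[(0, 1)], [(2, 3)]]
theta = [0.0, 0.0, 0.0]
theta, alpha, g_erm, g_irm, direction = pareto_step(theta, user, items, envs)
print(round(alpha, 3), [round(t, 3) for t in theta])
```

In this sketch the weight `alpha` is chosen so that the combined direction lies on the convex hull of the two gradients and has minimum norm. Such a direction does not increase either objective to first order. That property is what makes the min-norm solver a natural fit for balancing the IID-oriented ERM loss against the OOD-oriented IRM penalty.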
