Pareto Invariant Representation Learning for Multimedia Recommendation

Multimedia recommendation involves personalized ranking tasks, where multimedia content is usually represented using a generic encoder. However, these generic representations introduce spurious correlations that fail to reveal users' true preferences. Existing works attempt to alleviate this problem by learning invariant representations, but overlook the balance between independent and identically distributed (IID) and out-of-distribution (OOD) generalization. In this paper, we propose a framework called Pareto Invariant Representation Learning (PaInvRL), which mitigates the impact of spurious correlations from an IID-OOD multi-objective optimization perspective by simultaneously learning invariant representations (intrinsic factors that attract user attention) and variant representations (other factors). Specifically, PaInvRL includes three iteratively executed modules: (i) a heterogeneous identification module, which identifies heterogeneous environments that reflect distributional shifts in user-item interactions; (ii) an invariant mask generation module, which learns invariant masks based on the Pareto-optimal solutions that minimize the adaptively weighted Invariant Risk Minimization (IRM) and Empirical Risk Minimization (ERM) losses; and (iii) a conversion module, which generates both variant representations and item-invariant representations for training a multi-modal recommendation model that mitigates spurious correlations and balances generalization performance within and across environment distributions. We compare the proposed PaInvRL with state-of-the-art recommendation models on three public multimedia recommendation datasets (Movielens, Tiktok, and Kwai), and the experimental results validate the effectiveness of PaInvRL for both within-environment and cross-environment learning.
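To make the IID-OOD trade-off in module (ii) concrete, the following is a minimal sketch and not the PaInvRL implementation. The abstract does not specify the loss forms or the weighting rule, so the sketch assumes a squared-error loss, an IRMv1-style penalty (the squared derivative of each environment's risk with respect to a dummy scale fixed at 1), and the closed-form two-objective min-norm weight from MGDA-style multi-objective optimization as one possible way to weight the ERM and IRM gradients adaptively. All function names (`erm_loss`, `irm_penalty`, `min_norm_weight`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# One environment: (predictions, labels). Assumed layout for illustration only.
Env = Tuple[Sequence[float], Sequence[float]]


def erm_loss(envs: List[Env]) -> float:
    """Pooled mean squared error over all samples in all environments."""
    total, count = 0.0, 0
    for preds, labels in envs:
        for p, y in zip(preds, labels):
            total += (p - y) ** 2
            count += 1
    return total / count


def irm_penalty(envs: List[Env]) -> float:
    """IRMv1-style penalty under squared loss (an assumption).

    For each environment, R_e(w) = mean((w * p - y)^2), so
    dR_e/dw at w = 1 equals mean(2 * (p - y) * p).
    The penalty is the mean of the squared derivatives across environments.
    """
    penalty = 0.0
    for preds, labels in envs:
        grad = sum(2.0 * (p - y) * p for p, y in zip(preds, labels)) / len(preds)
        penalty += grad ** 2
    return penalty / len(envs)


def min_norm_weight(g_erm: Sequence[float], g_irm: Sequence[float]) -> float:
    """Weight gamma in [0, 1] minimizing ||gamma * g_erm + (1 - gamma) * g_irm||^2.

    Closed form for two objectives: gamma = ((g_irm - g_erm) . g_irm) / ||g_erm - g_irm||^2,
    clipped to [0, 1]. Returns 0.5 when both gradients coincide.
    """
    diff = [b - a for a, b in zip(g_erm, g_irm)]  # g_irm - g_erm
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5
    gamma = sum(d * b for d, b in zip(diff, g_irm)) / denom
    return min(1.0, max(0.0, gamma))


if __name__ == "__main__":
    # Two toy environments with made-up predictions and labels.
    envs = [([1.0, 1.0], [0.0, 0.0]), ([0.5], [0.5])]
    gamma = min_norm_weight([1.0, 0.0], [0.0, 1.0])  # toy gradient vectors
    combined = gamma * erm_loss(envs) + (1.0 - gamma) * irm_penalty(envs)
    print(gamma, combined)
```

In this sketch, `gamma` weights the ERM objective and `1 - gamma` weights the IRM penalty. When the two gradients conflict, the min-norm rule yields a common descent direction rather than favoring either objective by a fixed hand-tuned coefficient.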
