There is rapidly growing research interest in leveraging multi-modal data for accurate user modeling in recommender systems. Existing multimedia recommenders have achieved substantial improvements by incorporating various modalities and devising elaborate modules. However, when users decide to interact with items, most of them do not fully read the content of every modality. We refer to the modalities that directly cause users' behaviors as points of interest, which are important aspects for capturing users' interests. In contrast, modalities that do not cause users' behaviors are potential noise and may mislead the training of a recommendation model. Unsurprisingly, little research in the literature has been devoted to removing such noise, because users' explicit feedback on their points of interest is inaccessible. To bridge this gap, we propose a weakly-supervised framework based on contrastive learning for denoising multi-modal recommenders (dubbed Demure). Working in a weakly-supervised manner, Demure circumvents the need for users' explicit feedback and identifies the noise by analyzing the modalities of all items a given user has interacted with.
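The weak-supervision intuition (judging a modality of one item against all items the same user interacted with) can be illustrated with a toy consistency heuristic. This is a hypothetical sketch, not Demure's method: it omits contrastive learning entirely, and the function names, the cosine-to-mean scoring rule, and the `threshold` value are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def flag_noisy_modalities(
    interacted: List[Dict[str, Vector]], threshold: float = 0.5
) -> List[Dict[str, bool]]:
    """Hypothetical heuristic (not Demure): for one user, flag a modality of an
    interacted item as potential noise when its embedding disagrees with that
    user's average embedding of the same modality over all interacted items."""
    modalities = list(interacted[0].keys())
    # One profile vector per modality, averaged over the user's whole history.
    profiles = {m: mean_vector([item[m] for item in interacted]) for m in modalities}
    # True means "low agreement with the profile", i.e. a candidate noisy modality.
    return [
        {m: cosine(item[m], profiles[m]) < threshold for m in modalities}
        for item in interacted
    ]


history = [
    {"text": [1.0, 0.0], "image": [0.0, 1.0]},
    {"text": [0.9, 0.1], "image": [1.0, 0.0]},
    {"text": [1.0, 0.1], "image": [0.0, 1.0]},
]
flags = flag_noisy_modalities(history)
```

In this toy history the user's text embeddings agree closely, while the second item's image embedding points away from the user's typical image embedding, so only that image is flagged as a candidate noisy modality.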