Sequential recommendation has drawn significant attention for modeling users' historical behaviors to predict the next item. With the booming growth of multimodal data (e.g., images, text) on internet platforms, sequential recommendation also benefits from incorporating multimodal data. Most methods introduce modal features of items as side information and simply concatenate them to learn unified user interests. Nevertheless, these methods are limited in modeling multimodal differences. We argue that user interests and item relationships vary across modalities. To address this problem, we propose a novel Multimodal Difference Learning framework for Sequential Recommendation, MDSRec for brevity. Specifically, we first explore differences in item relationships by constructing modal-aware item relation graphs with behavioral signals to enhance item representations. Then, to capture differences in user interests across modalities, we design an interest-centralized attention mechanism that independently models user sequence representations in each modality. Finally, we fuse the user embeddings from multiple modalities to achieve accurate item recommendation. Experimental results on five real-world datasets demonstrate the superiority of MDSRec over state-of-the-art baselines and the efficacy of multimodal difference learning.
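To make the overall flow concrete, the following is a minimal NumPy sketch of the two ideas the abstract names: per-modality sequence encoding via attention centered on a modality-specific interest vector, followed by fusion of the modality-wise user embeddings for candidate scoring. All names (`interest_centered_attention`, `mdsrec_score`, the interest-center queries) are illustrative assumptions for exposition, not the paper's actual architecture, which also includes the modal-aware item relation graphs omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def interest_centered_attention(seq_emb, center):
    """Attend over one modality's item sequence, using a modality-specific
    interest center as the query (a simplified stand-in for the paper's
    interest-centralized attention)."""
    # seq_emb: (L, d) item embeddings in one modality; center: (d,)
    scores = softmax(seq_emb @ center / np.sqrt(seq_emb.shape[-1]))
    return scores @ seq_emb  # (d,) modality-specific user representation

def mdsrec_score(seq_by_modality, centers, candidates_by_modality):
    """Fuse per-modality user embeddings (here by concatenation, one plausible
    choice) and score candidate items with a dot product."""
    user = np.concatenate([
        interest_centered_attention(seq_by_modality[m], centers[m])
        for m in seq_by_modality
    ])
    cand = np.concatenate(
        [candidates_by_modality[m] for m in seq_by_modality], axis=-1
    )
    return cand @ user  # one score per candidate item

# Toy usage: two modalities (image, text), a 5-item history, 3 candidates.
rng = np.random.default_rng(0)
d = 8
seqs = {"image": rng.normal(size=(5, d)), "text": rng.normal(size=(5, d))}
centers = {"image": rng.normal(size=d), "text": rng.normal(size=d)}
cands = {"image": rng.normal(size=(3, d)), "text": rng.normal(size=(3, d))}
print(mdsrec_score(seqs, centers, cands))  # scores for the 3 candidates
```

Because each modality is encoded independently before fusion, the attention weights (and thus the inferred interests) are free to differ between the image and text views of the same interaction sequence, which is the multimodal difference the framework is designed to capture.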