The goal of sequential recommendation (SR) is to predict the items a user is likely to be interested in, based on their historical interaction sequence. Most existing sequential recommenders are built on ID features. Despite their widespread use, these features often underperform when IDs are sparse and struggle with the cold-start problem. Moreover, inconsistent ID mappings hinder model transferability, isolating similar recommendation domains that could otherwise be optimized jointly. This paper addresses these issues by exploring the potential of multi-modal information for learning robust and generalizable sequence representations. We propose MISSRec, a multi-modal pre-training and transfer learning framework for SR. On the user side, we design a Transformer-based encoder-decoder model. Its contextual encoder learns to capture sequence-level multi-modal user interests, while a novel interest-aware decoder captures item-modality-interest relations to produce better sequence representations. On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representations, enabling more precise matching between users and items. We pre-train the model with contrastive learning objectives and fine-tune it efficiently. Extensive experiments demonstrate the effectiveness and flexibility of MISSRec, showing promise as a practical solution for real-world recommendation scenarios. Data and code are available at \url{https://github.com/gimpong/MM23-MISSRec}.
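To make the idea of contrastive pre-training concrete, here is a generic in-batch InfoNCE loss between sequence representations and target-item representations. This is a minimal sketch under stated assumptions, not MISSRec's actual objective. We assume that sequence i is paired with item i, that the other items in the batch serve as negatives, that similarity is cosine similarity, and that the temperature is 0.07. The helper name `info_nce_loss` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _normalise(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(seq_reprs: List[Vector], item_reprs: List[Vector],
                  temperature: float = 0.07) -> float:
    """Hypothetical in-batch InfoNCE (assumption, not the paper's exact loss).

    seq_reprs[i] is treated as the positive pair of item_reprs[i];
    every other item in the batch acts as a negative for sequence i.
    """
    seqs = [_normalise(s) for s in seq_reprs]
    items = [_normalise(v) for v in item_reprs]
    total = 0.0
    for i, s in enumerate(seqs):
        # Temperature-scaled cosine similarity of sequence i to every item in the batch.
        logits = [sum(a * b for a, b in zip(s, v)) / temperature for v in items]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log softmax probability assigned to the positive item.
        total += log_denom - logits[i]
    return total / len(seqs)


if __name__ == "__main__":
    eye = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    shifted = eye[1:] + eye[:1]  # each sequence is now paired with a mismatched item
    print(info_nce_loss(eye, eye))      # near zero: positives are perfectly aligned
    print(info_nce_loss(eye, shifted))  # large: positives are orthogonal to their sequences
```

The loss pulls each sequence representation toward its own target item and pushes it away from the other items in the batch. Reusing in-batch items as negatives avoids a separate negative-sampling step. This is a common design choice for contrastive objectives, though the paper's own formulation may differ.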