BiVRec: Bidirectional View-based Multimodal Sequential Recommendation

Integrating multimodal information into sequential recommender systems has attracted significant attention in recent research. Early multimodal sequential recommendation models followed an ID-dominant paradigm, in which multimodal information was fused as side information. Because of its limitations in transferability and information intrusion, a second paradigm emerged in which multimodal features are used directly for recommendation, enabling recommendation across datasets. That paradigm, however, overlooks user ID information, resulting in low information utilization and high training costs. To this end, we propose BiVRec, a framework that jointly trains the recommendation tasks in both the ID view and the multimodal view, leveraging their synergistic relationship to enhance recommendation performance bidirectionally. To tackle the information heterogeneity between the two views, we first construct structured user interest representations and then learn the synergistic relationship between them. Specifically, BiVRec comprises three modules: Multi-scale Interest Embedding, which comprehensively models user interests by expanding user interaction sequences with multi-scale patching; Intra-View Interest Decomposition, which constructs highly structured interest representations using carefully designed Gaussian attention and Cluster attention; and Cross-View Interest Learning, which learns the synergistic relationship between the two recommendation views through coarse-grained overall semantic similarity and fine-grained interest allocation similarity. BiVRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.
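To make the idea of expanding an interaction sequence with multi-scale patching more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a patch is a run of consecutive item embeddings and that each patch is summarized by mean pooling; the abstract specifies neither choice. The names `mean_pool`, `multi_scale_patches`, and `patch_sizes` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors element-wise."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def multi_scale_patches(sequence: List[Vector], patch_sizes: List[int]) -> List[Vector]:
    """Expand an interaction sequence with pooled patches at several scales.

    Assumption (not from the paper): for each patch size, the sequence is
    split into consecutive non-overlapping patches (the last one may be
    shorter) and each patch is mean-pooled into a single vector.
    """
    expanded: List[Vector] = []
    for size in patch_sizes:
        if size < 1:
            raise ValueError("patch sizes must be positive")
        for start in range(0, len(sequence), size):
            expanded.append(mean_pool(sequence[start:start + size]))
    return expanded


if __name__ == "__main__":
    # Four interactions with 2-dimensional toy embeddings.
    seq = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
    for patch in multi_scale_patches(seq, patch_sizes=[1, 2, 4]):
        print(patch)
```

With patch sizes 1, 2, and 4, the four-item toy sequence expands to seven vectors: the original items, two pairwise summaries, and one summary of the whole sequence. A real model would likely use learned projections rather than plain averaging.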
