BiVRec: Bidirectional View-based Multimodal Sequential Recommendation

The integration of multimodal information into sequential recommender systems has attracted significant attention in recent research. Early multimodal sequential recommendation models followed an ID-dominant paradigm, in which multimodal information was fused as side information. Because of that paradigm's limitations in transferability and information intrusion, a second paradigm emerged in which multimodal features are used directly for recommendation, enabling recommendation across datasets. This second paradigm, however, overlooks user ID information, resulting in low information utilization and high training costs. To this end, we propose an innovative framework, BiVRec, that jointly trains the recommendation tasks in both the ID and multimodal views, leveraging their synergistic relationship to enhance recommendation performance bidirectionally. To tackle the information heterogeneity issue, we first construct structured user interest representations and then learn the synergistic relationship between them. Specifically, BiVRec comprises three modules: Multi-scale Interest Embedding, which comprehensively models user interests by expanding user interaction sequences with multi-scale patching; Intra-View Interest Decomposition, which constructs highly structured interest representations using carefully designed Gaussian attention and Cluster attention; and Cross-View Interest Learning, which learns the synergistic relationship between the two recommendation views through coarse-grained overall semantic similarity and fine-grained interest allocation similarity. BiVRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.

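
The abstract says that Multi-scale Interest Embedding expands a user interaction sequence with multi-scale patching, but it gives no further detail. The sketch below shows one plausible reading, not the authors' implementation: the sequence of item embeddings is split into non-overlapping patches at several window sizes, each patch is mean-pooled, and the pooled patches are concatenated into one longer sequence. The function names (`mean_pool`, `multi_scale_patches`, `expand_sequence`), the choice of mean pooling, the non-overlapping windows, and the handling of a short trailing patch are all assumptions made for illustration.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def multi_scale_patches(
    sequence: List[Vector], scales: List[int]
) -> Dict[int, List[Vector]]:
    """Split a sequence of item embeddings into non-overlapping patches.

    For every scale (patch length) the sequence is cut into consecutive
    windows and each window is mean-pooled into one vector. A trailing
    window shorter than the scale is kept and pooled over the items it has.
    Mean pooling is an assumption; the abstract does not name the operator.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one item embedding")
    patches: Dict[int, List[Vector]] = {}
    for scale in scales:
        if scale < 1:
            raise ValueError("every scale must be a positive integer")
        patches[scale] = [
            mean_pool(sequence[start:start + scale])
            for start in range(0, len(sequence), scale)
        ]
    return patches


def expand_sequence(sequence: List[Vector], scales: List[int]) -> List[Vector]:
    """Concatenate the pooled patches of all scales into one expanded sequence."""
    patches = multi_scale_patches(sequence, scales)
    expanded: List[Vector] = []
    for scale in scales:
        expanded.extend(patches[scale])
    return expanded
```

For example, calling `expand_sequence` on five item embeddings with the hypothetical scales `[1, 2, 4]` yields ten vectors: five single-item patches, three patches of up to two items, and two patches of up to four items. Small scales keep item-level detail and larger scales summarize longer stretches of behavior, which is the usual intuition behind multi-scale views of a sequence.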