Adaptive Multi-Modalities Fusion in Sequential Recommendation Systems

In sequential recommendation, multi-modal information (e.g., text or image) can provide a more comprehensive view of an item's profile. The optimal stage (early or late) to fuse modality features into item representations is still debated. We propose a graph-based approach (named MMSR) to fuse modality features in an adaptive order, enabling each modality to prioritize either its inherent sequential nature or its interplay with other modalities. MMSR represents each user's history as a graph, where the modality features of each item in a user's history sequence are denoted by cross-linked nodes. The edges between homogeneous nodes represent intra-modality sequential relationships, and the ones between heterogeneous nodes represent inter-modality interdependence relationships. During graph propagation, MMSR incorporates dual attention, differentiating homogeneous and heterogeneous neighbors. To adaptively assign distinct fusion orders to nodes, MMSR allows each node's representation to be asynchronously updated through an update gate. In scenarios where modalities exhibit stronger sequential relationships, the update gate prioritizes updates among homogeneous nodes. Conversely, when the interdependent relationships between modalities are more pronounced, the update gate prioritizes updates among heterogeneous nodes. Consequently, MMSR establishes a fusion order that spans a spectrum from early to late modality fusion. In experiments across six datasets, MMSR consistently outperforms state-of-the-art models, and our graph propagation methods surpass other graph neural networks. Additionally, MMSR naturally manages missing modalities.
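
The graph described above can be illustrated with a minimal sketch. It is not MMSR's actual construction, which the abstract does not specify in detail. It assumes one node per (sequence position, modality) pair, a homogeneous edge from each node to the next available node of the same modality, and a heterogeneous edge between the different-modality nodes of the same item. The function name `build_history_graph`, the dictionary-based item format, and the choice to simply omit nodes for missing modalities are all hypothetical and used only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# A node is identified by (position in the history sequence, modality name).
Node = Tuple[int, str]
Edge = Tuple[Node, Node]


def build_history_graph(
    history: List[Dict[str, Optional[object]]],
) -> Tuple[List[Node], List[Edge], List[Edge]]:
    """Return nodes, homogeneous edges, and heterogeneous edges for one user.

    Assumption: a modality whose feature is None is treated as missing and
    gets no node; the homogeneous chain then skips over that position.
    """
    nodes: List[Node] = []
    homo_edges: List[Edge] = []    # intra-modality, sequential
    hetero_edges: List[Edge] = []  # inter-modality, within the same item
    last_seen: Dict[str, Node] = {}  # most recent node of each modality

    for pos, item in enumerate(history):
        present = [(pos, m) for m, feat in item.items() if feat is not None]
        nodes.extend(present)

        # Link each node to the previous node of the same modality.
        for node in present:
            modality = node[1]
            if modality in last_seen:
                homo_edges.append((last_seen[modality], node))
            last_seen[modality] = node

        # Link every pair of different-modality nodes of this item.
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                hetero_edges.append((a, b))

    return nodes, homo_edges, hetero_edges


# Toy history of three items; the second item has no text feature.
history = [
    {"text": "t0", "image": "i0"},
    {"text": None, "image": "i1"},
    {"text": "t2", "image": "i2"},
]
nodes, homo_edges, hetero_edges = build_history_graph(history)
print(len(nodes), len(homo_edges), len(hetero_edges))
```

In this toy history the text chain connects position 0 directly to position 2, which is one simple way a graph formulation can tolerate a missing modality. A propagation step with dual attention and an update gate would then operate on the two edge lists separately.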
