Sequential Recommendation (SR) models infer user preferences from interaction histories. While transferable multi-modal SR models outperform traditional ID-based approaches, existing methods converge slowly during fine-tuning because of complex optimization requirements and negative transfer effects. We propose MMM4Rec (Multi-Modal Mamba for Sequential Recommendation), a novel multi-modal SR framework that incorporates a dedicated algebraic constraint mechanism for efficient transfer learning. By combining the temporal decay properties of State Space Duality (SSD) with a globally-aware temporal modeling design, our model dynamically prioritizes key modality information, overcoming limitations of Transformer-based approaches. The framework implements a constrained two-stage process: (1) sequence-level cross-modal alignment via shared projection matrices, followed by (2) temporal fusion using our newly designed Cross-SSD module and dual-channel Fourier adaptive filtering. This architecture maintains semantic consistency while suppressing noise propagation. MMM4Rec achieves rapid fine-tuning convergence with a simple cross-entropy loss, significantly improving multi-modal recommendation accuracy while maintaining strong transferability. Extensive experiments demonstrate MMM4Rec's state-of-the-art performance: it achieves strong multi-modal retrieval capability and converges on average 10x faster when transferring to large-scale downstream datasets. The implementation is available at https://github.com/AlwaysFHao/MMM4Rec.
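To make the "temporal decay" property of SSD concrete, the following is a minimal sketch of the generic scalar-decay state space recurrence and its equivalent masked, attention-like form. It illustrates the duality that SSD refers to in general; it is not the Cross-SSD module of MMM4Rec, whose details are not given in this abstract. The function names `ssd_recurrent` and `ssd_masked`, the single input channel, and the toy values are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def ssd_recurrent(x: Vector, a: Vector, B: List[Vector], C: List[Vector]) -> Vector:
    """Recurrent form: h_t = a_t * h_{t-1} + B_t * x_t, y_t = <C_t, h_t>."""
    h = [0.0] * len(B[0])  # hidden state, one entry per state dimension
    ys = []
    for x_t, a_t, B_t, C_t in zip(x, a, B, C):
        # Older information is scaled by a_t at every step (temporal decay).
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return ys


def ssd_masked(x: Vector, a: Vector, B: List[Vector], C: List[Vector]) -> Vector:
    """Dual form: y_t = sum over s <= t of (a_{s+1} * ... * a_t) * <C_t, B_s> * x_s."""
    ys = []
    for t in range(len(x)):
        y_t = 0.0
        for s in range(t + 1):
            decay = math.prod(a[s + 1 : t + 1])  # empty product is 1 when s == t
            score = sum(c * b for c, b in zip(C[t], B[s]))
            y_t += decay * score * x[s]
        ys.append(y_t)
    return ys


# Toy input (hypothetical values): a unit impulse at step 0 with decay 0.5.
x = [1.0, 0.0, 0.0, 0.0]
a = [0.5, 0.5, 0.5, 0.5]
B = [[1.0, 0.0]] * 4
C = [[1.0, 1.0]] * 4

print(ssd_recurrent(x, a, B, C))  # [1.0, 0.5, 0.25, 0.125]
print(ssd_masked(x, a, B, C))     # [1.0, 0.5, 0.25, 0.125]
```

Both forms give the same output: the contribution of the impulse at step 0 halves at every later step. The recurrent form runs in time linear in sequence length, while the masked form exposes the decay as a lower-triangular weighting over past positions, which is the view that invites comparison with attention in Transformer-based models.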