We propose MuSTRec (Multimodal and Sequential Transformer-based Recommendation), a novel recommender framework that unifies the multimodal and sequential recommendation paradigms. MuSTRec captures cross-item similarities and collaborative filtering signals by building item-item graphs from extracted textual and visual features. A frequency-based self-attention module additionally captures short- and long-term user preferences. Across multiple Amazon datasets, MuSTRec outperforms state-of-the-art multimodal and sequential baselines by up to 33.5%. Finally, we detail notable facets of this new recommendation paradigm, including the need for a new data-partitioning regime and a demonstration that integrating user embeddings into sequential recommendation drastically improves short-term metrics (by up to 200%) on smaller datasets. Our code is available at https://anonymous.4open.science/r/MuSTRec-D32B/ and will be made publicly available.
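To make the graph-construction step concrete, the sketch below builds a kNN item-item adjacency from fused text and visual features. It is a minimal illustration of the idea described above, not the paper's implementation: the fusion-by-concatenation, cosine similarity, and the `k` value are all assumptions, and the function and variable names (`build_item_graph`, `text_feats`, `visual_feats`) are hypothetical.

```python
import numpy as np

def build_item_graph(text_feats, visual_feats, k=10):
    """Binary kNN item-item adjacency from fused modality features (sketch)."""
    # Assumption: fuse modalities by simple concatenation per item.
    feats = np.concatenate([text_feats, visual_feats], axis=1)
    # L2-normalize so the inner product below is cosine similarity.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    feats = feats / np.clip(norms, 1e-12, None)
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-loops
    # Keep an edge to each item's k most similar neighbors.
    adj = np.zeros_like(sim)
    topk = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(sim.shape[0]), k)
    adj[rows, topk.ravel()] = 1.0
    return adj

# Toy usage: 5 items with random 4-dim text and 3-dim visual features.
rng = np.random.default_rng(0)
A = build_item_graph(rng.normal(size=(5, 4)), rng.normal(size=(5, 3)), k=2)
```

Each row of `A` then has exactly `k` outgoing edges; in practice such an adjacency would be symmetrized and normalized before graph convolution.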