Despite recent advancements in language and vision modeling, integrating rich multimodal knowledge into recommender systems continues to pose significant challenges. This is primarily due to the need for efficient recommendation, which requires adaptive and interactive responses. In this study, we focus on sequential recommendation and introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec). Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions. To integrate item features from diverse modalities, fMRLRec employs a simple mapping that projects multimodal item features into an aligned feature space. Additionally, we design an efficient linear transformation that embeds smaller features into larger ones, substantially reducing memory requirements for large-scale training on recommendation data. Combined with improved state space modeling techniques, fMRLRec scales to different dimensions and requires only one-time training to produce multiple models tailored to various granularities. We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets, where it consistently achieves superior performance over state-of-the-art baseline methods. We make our code and data publicly available at https://github.com/yueqirex/fMRLRec.
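The core idea of embedding smaller features into larger ones can be illustrated with a minimal sketch, assuming a Matryoshka-style nested linear layer in which the top-left d × d block of one shared full-scale weight matrix acts as the weight of a size-d sub-model. All names, shapes, and granularities below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch: one full-scale weight matrix whose top-left d x d
# block serves as the weight of a smaller model, so a single training run
# can yield models at every granularity (smaller weights are literally
# embedded inside the larger ones, sharing memory).
FULL_DIM = 8
GRANULARITIES = [2, 4, 8]  # nested model sizes (assumed for illustration)

rng = np.random.default_rng(0)
W = rng.normal(size=(FULL_DIM, FULL_DIM))  # one shared full-scale weight

def forward(x, d):
    """Apply the size-d sub-model by slicing the shared weight; no extra parameters."""
    return x[:d] @ W[:d, :d]

x = rng.normal(size=FULL_DIM)
outputs = {d: forward(x, d) for d in GRANULARITIES}
# The d=2 output depends only on W[:2, :2], which is nested inside every
# larger block, so all granularities share one set of trained weights.
```

In this sketch, extracting a smaller model after training amounts to slicing the shared weight, which mirrors the one-time-training, multiple-models property described above.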