Multimodal recommendation exploits the rich multimodal information associated with users or items to learn better representations and improve recommendation performance. These methods often adopt end-to-end feature extractors (e.g., shallow or deep neural networks) to tailor generic multimodal features, which pre-trained models extract from raw data, to the recommendation task. However, compact extractors such as shallow neural networks may struggle to extract effective information from complex, high-dimensional generic modality features. Conversely, DNN-based extractors may suffer from the data sparsity problem in recommendation. To address these problems, we propose a novel model-agnostic approach called Semantic-guided Feature Distillation (SGFD), which employs a teacher-student framework to extract features for multimodal recommendation. The teacher model first extracts rich modality features from the generic modality features by considering both the semantic information of items and the complementary information of multiple modalities. SGFD then uses response-based and feature-based distillation losses to transfer the knowledge encoded in the teacher model to the student model. To evaluate the effectiveness of SGFD, we integrate it into three backbone multimodal recommendation models. Extensive experiments on three public real-world datasets demonstrate that SGFD-enhanced models achieve substantial improvements over their counterparts.
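The abstract names response-based and feature-based distillation losses but does not give their exact form. The following is a minimal sketch of how such a combined objective is commonly written, not the SGFD implementation. It assumes a temperature-softened KL divergence for the response term and a mean squared error for the feature term. The function names, the `temperature` parameter, and the weights `alpha` and `beta` are all hypothetical and chosen for illustration only.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def response_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Assumed form: KL(teacher || student) over softened output distributions."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def feature_distillation_loss(teacher_feat, student_feat):
    """Assumed form: mean squared error between feature vectors of equal length."""
    n = len(teacher_feat)
    return sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat)) / n


def student_objective(rec_loss, teacher_logits, student_logits,
                      teacher_feat, student_feat, alpha=0.5, beta=0.5):
    """Recommendation loss plus weighted distillation terms.

    alpha and beta are hypothetical weights, not values from the paper.
    """
    response = response_distillation_loss(teacher_logits, student_logits)
    feature = feature_distillation_loss(teacher_feat, student_feat)
    return rec_loss + alpha * response + beta * feature
```

In this sketch the teacher outputs are treated as fixed targets, so in a training loop only the student's parameters would be updated against `student_objective`. The logits stand in for whatever predictions the teacher and student produce, and the feature vectors for the modality features each extracts for the same item.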