Multimedia recommendation fuses the multi-modal information of items to enrich their features and thereby improve recommendation performance. However, existing methods typically add multi-modal information on top of collaborative information to improve overall recommendation precision, and leave its effect on cold-start recommendation unexplored. Moreover, these methods are applicable only when such multi-modal data is available. To address these problems, this paper proposes a recommendation framework, named Cross-modal Content Inference and Feature Enrichment Recommendation (CIERec), which exploits multi-modal information to improve cold-start recommendation performance. Specifically, in the training phase CIERec first introduces image annotations as privileged information to guide the mapping of unified features from the visual space to the semantic space. CIERec then enriches the content representation by fusing collaborative, visual, and cross-modal inferred representations, so as to improve cold-start recommendation performance. Experimental results on two real-world datasets show that the content representations learned by CIERec achieve better cold-start recommendation performance than existing visually-aware recommendation algorithms. More importantly, CIERec consistently yields significant improvements with different conventional visually-aware backbones, which verifies its universality and effectiveness.
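The two stages described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the model, so the linear visual-to-semantic map, the sigmoid output, the binary cross-entropy update, the fusion by concatenation, and all function names (`infer_semantic`, `train_step`, `fuse`) are assumptions chosen for brevity. The point it shows is that annotations are consumed only during training, while a cold-start item at inference time needs only its visual features.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def infer_semantic(W, visual):
    """Map a visual feature vector to tag probabilities (semantic space).
    Assumption: a single linear layer followed by a sigmoid."""
    return [sigmoid(sum(w * v for w, v in zip(row, visual))) for row in W]

def train_step(W, visual, tags, lr=0.5):
    """One gradient step of binary cross-entropy on the annotation tags.
    The tags act as privileged information: they are used only here."""
    pred = infer_semantic(W, visual)
    for k, row in enumerate(W):
        err = pred[k] - tags[k]  # gradient of BCE w.r.t. the k-th logit
        for j in range(len(row)):
            row[j] -= lr * err * visual[j]

def fuse(collab, visual, semantic):
    """Assumption: fuse the three representations by concatenation."""
    return list(collab) + list(visual) + list(semantic)

# Toy training data: (visual features, annotation tags) for two items.
visual_dim, num_tags = 4, 2
W = [[0.0] * visual_dim for _ in range(num_tags)]
train = [
    ([1.0, 0.0, 1.0, 0.0], [1, 0]),
    ([0.0, 1.0, 0.0, 1.0], [0, 1]),
]
for _ in range(200):
    for v, t in train:
        train_step(W, v, t)

# Cold-start item at inference: no annotations and no interaction history,
# so the collaborative part is a zero placeholder.
cold_visual = [1.0, 0.0, 0.9, 0.1]
cold_collab = [0.0, 0.0]
semantic = infer_semantic(W, cold_visual)
item = fuse(cold_collab, cold_visual, semantic)
```

In this sketch the enriched item vector `item` would then be scored against a user embedding by whichever visually-aware backbone is in use; that scoring step is left out because the abstract does not describe it.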