Most recommender systems adopt collaborative filtering (CF) and provide recommendations based on past collective interactions. The performance of CF algorithms therefore degrades when few or no interactions are available, a scenario referred to as cold start. To address this issue, previous work relies on models that leverage both collaborative data and side information on the users or items. Similar to multimodal learning, these models aim to combine collaborative and content representations in a shared embedding space. In this work, we propose a novel technique for multimodal recommendation that relies on a multimodal Single-Branch embedding network for Recommendation (SiBraR). Leveraging weight sharing, SiBraR encodes interaction data as well as multimodal side information using the same single-branch embedding network on different modalities. This makes SiBraR effective in missing-modality scenarios, including cold start. Our extensive experiments on large-scale recommendation datasets from three different recommendation domains (music, movie, and e-commerce) that provide multimodal content information (audio, text, image, labels, and interactions) show that SiBraR significantly outperforms CF as well as state-of-the-art content-based recommender systems in cold-start scenarios, and is competitive in warm scenarios. We show that SiBraR's recommendations are accurate in missing-modality scenarios, and that the model is able to map different modalities to the same region of the shared embedding space, hence reducing the modality gap.
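The weight-sharing idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the per-modality input projections (assumed here only to bring features of different sizes to a common input dimension), the averaging over available modalities, and all names (`projections`, `single_branch`, `embed_item`) are hypothetical choices made for illustration. The weights are random and untrained, so the sketch shows only the data flow, not recommendation quality.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 16   # size of the shared embedding space (illustrative)
HIDDEN_DIM = 32  # width of the shared branch (illustrative)
COMMON_DIM = 24  # common input size fed to the shared branch (assumption)

# Hypothetical raw feature sizes per modality.
MODALITY_DIMS = {"interactions": 100, "audio": 50, "text": 40}

# Assumption: one linear projection per modality, used only to bring
# inputs of different sizes to COMMON_DIM before the shared branch.
projections = {
    name: rng.normal(scale=0.1, size=(dim, COMMON_DIM))
    for name, dim in MODALITY_DIMS.items()
}

# The single branch: one set of weights reused for every modality.
W1 = rng.normal(scale=0.1, size=(COMMON_DIM, HIDDEN_DIM))
W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, EMBED_DIM))


def single_branch(x):
    """Shared two-layer network mapping a projected input to a unit-norm embedding."""
    hidden = np.maximum(x @ W1, 0.0)  # ReLU
    out = hidden @ W2
    return out / (np.linalg.norm(out, axis=-1, keepdims=True) + 1e-12)


def embed(features, modality):
    """Embed one modality's features with the shared branch."""
    return single_branch(features @ projections[modality])


def embed_item(available):
    """Embed an item from whichever modalities are present.

    Averaging the per-modality embeddings is an assumption of this sketch.
    """
    embs = [embed(feats, name) for name, feats in available.items()]
    return np.mean(embs, axis=0)


# Warm item: interaction data and content features are all available.
warm_item = {name: rng.normal(size=dim) for name, dim in MODALITY_DIMS.items()}
warm_vec = embed_item(warm_item)

# Cold-start item: no interactions, only audio features.
cold_item = {"audio": rng.normal(size=MODALITY_DIMS["audio"])}
cold_vec = embed_item(cold_item)

# Stand-in for a learned user embedding in the same space (random here).
user_vec = rng.normal(size=EMBED_DIM)
user_vec /= np.linalg.norm(user_vec)

# Dot-product relevance score for the cold-start item.
score = float(user_vec @ cold_vec)
```

Because every modality passes through the same `W1` and `W2`, an item with no interaction data still receives an embedding in the same space as warm items, which is what allows it to be scored against users in this sketch.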