In modern e-commerce, item content features in various modalities offer accurate and comprehensive information to recommender systems. Most previous work either focuses on learning effective item representations while modelling user-item interactions, or explores item-item relationships by analysing multi-modal features. Those methods, however, fail to incorporate the collaborative item-user-item relationships into the item structure derived from multi-modal features. In this work, we propose a graph-based item structure enhancement method, MM-GEF: Multi-Modal recommendation with Graph Early-Fusion, which effectively combines the latent item structure underlying multi-modal content with collaborative signals. Instead of processing the content features of different modalities separately, we show that early fusion of multi-modal features provides significant improvement. MM-GEF learns refined item representations by injecting structural information obtained from both multi-modal and collaborative signals. Through extensive experiments on four publicly available datasets, we demonstrate systematic improvements of our method over state-of-the-art multi-modal recommendation methods.
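To make the idea of combining an early-fused content graph with collaborative item-user-item signals concrete, the following is a minimal sketch and not the MM-GEF implementation. Everything in it is an assumption made for illustration: fusion is done by normalising and concatenating the modality vectors, the content graph is a cosine kNN graph, the collaborative graph is row-normalised co-interaction counts, and the two are mixed with a hypothetical weight `alpha`. The function names and the toy data are likewise invented.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def early_fuse(text_feats, image_feats):
    """Assumed early fusion: normalise each modality, then concatenate per item."""
    return [l2_normalize(t) + l2_normalize(v)
            for t, v in zip(text_feats, image_feats)]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def knn_graph(feats, k):
    """Content-based item graph: keep each item's k most similar items."""
    n = len(feats)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine(feats[i], feats[j]), j) for j in range(n) if j != i]
        sims.sort(key=lambda p: (-p[0], p[1]))  # highest similarity first
        for s, j in sims[:k]:
            adj[i][j] = s
    return adj

def co_interaction_graph(user_items, n_items):
    """Collaborative item-user-item graph: row-normalised co-interaction counts."""
    adj = [[0.0] * n_items for _ in range(n_items)]
    for items in user_items:
        unique = sorted(set(items))
        for a in unique:
            for b in unique:
                if a != b:
                    adj[a][b] += 1.0
    for row in adj:
        total = sum(row)
        if total > 0:
            for j in range(n_items):
                row[j] /= total
    return adj

def combine(content_adj, collab_adj, alpha=0.5):
    """Mix the two item graphs; alpha is a hypothetical weighting parameter."""
    return [[alpha * c + (1 - alpha) * d for c, d in zip(rc, rd)]
            for rc, rd in zip(content_adj, collab_adj)]

# Toy data (invented): three items, two modalities, two users.
text_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
image_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
user_items = [[0, 2], [1, 2]]  # item indices each user interacted with

fused = early_fuse(text_feats, image_feats)
content_adj = knn_graph(fused, k=1)
collab_adj = co_interaction_graph(user_items, n_items=3)
item_graph = combine(content_adj, collab_adj, alpha=0.5)
```

In this toy setup, items 0 and 2 share no content similarity, yet they become linked in `item_graph` because one user interacted with both. That is the kind of collaborative relationship a purely content-based item graph would miss. A graph of this kind could then feed a graph neural network that refines the item embeddings, though the sketch stops short of that step.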