With the development of multimedia applications, multimodal recommendation plays an essential role, as it can leverage rich context beyond user interactions. Existing methods mainly treat multimodal information as auxiliary input that helps learn ID features. However, semantic gaps exist between multimodal content features and ID features, so directly using multimodal information in this auxiliary role leads to misaligned representations of users and items. In this paper, we first systematically investigate the misalignment issue in multimodal recommendation and propose a solution named AlignRec. In AlignRec, the recommendation objective is decomposed into three alignments: alignment within contents, alignment between content and categorical ID, and alignment between users and items. Each alignment is characterized by a specific objective function and is integrated into our multimodal recommendation framework. To train AlignRec effectively, we propose first pre-training the first alignment to obtain unified multimodal features, and then training the remaining two alignments jointly with these features as input. Because it is essential to analyze whether each multimodal feature helps in training, we design three new classes of metrics to evaluate intermediate performance. Extensive experiments on three real-world datasets consistently verify the superiority of AlignRec over nine baselines. We also find that the multimodal features generated by AlignRec are better than those currently in use, and we will open-source these features.
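To make the three-alignment decomposition and the two-stage training schedule concrete, the following is a minimal sketch and not the AlignRec implementation. The abstract does not specify the objective functions, so every choice here is an assumption made for illustration: an InfoNCE-style contrastive loss stands in for the content-content and content-ID alignments, a BPR ranking loss stands in for the user-item alignment, and the names `info_nce`, `bpr_loss`, `stage_one_loss`, `stage_two_loss`, and the parameters `temperature` and `weight` are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.2):
    """Assumed contrastive objective: row i of `anchors` should match
    row i of `positives` and differ from every other row."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [dot(a_i, p_j) / temperature for p_j in p]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def bpr_loss(user, pos_item, neg_item):
    """Assumed ranking objective: -log sigmoid(score_pos - score_neg)."""
    diff = dot(user, pos_item) - dot(user, neg_item)
    return math.log1p(math.exp(-diff))


def stage_one_loss(vision_feats, text_feats):
    """Stage 1 (pre-training): alignment within contents."""
    return info_nce(vision_feats, text_feats)


def stage_two_loss(content_feats, id_embs, user, pos_idx, neg_idx, weight=0.1):
    """Stage 2 (joint training): content-ID alignment plus user-item alignment.
    `content_feats` are the unified features produced by stage 1.
    Summing content and ID vectors is an illustrative fusion, not the paper's."""
    content_id = info_nce(content_feats, id_embs)
    fused = [[c + e for c, e in zip(cv, ev)]
             for cv, ev in zip(content_feats, id_embs)]
    user_item = bpr_loss(user, fused[pos_idx], fused[neg_idx])
    return user_item + weight * content_id
```

Under these assumptions, stage one would be optimized alone to produce unified content features, and stage two would then take those features as input while optimizing the two remaining objectives together, mirroring the schedule described in the abstract.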