Multimodal recommendation focuses on how to effectively integrate behavior and multimodal information in the recommendation task. Previous works suffer from two major issues. First, the training process tightly couples the behavior module and the multimodal module by jointly optimizing them through shared model parameters, which leads to suboptimal performance because behavior signals and modality signals often provide opposing guidance for parameter updates. Second, previous approaches fail to account for the significant distribution differences between behavior and modality when fusing the two, which results in a misalignment between behavior and modal representations. To address these challenges, we propose a novel Dual Representation learning framework for Multimodal Recommendation, called DRepMRec, which introduces separate dual lines to address the coupling problem and Behavior-Modal Alignment (BMA) to address the misalignment problem. Specifically, DRepMRec leverages two independent lines of representation learning to compute behavior and modal representations. Given these separate representations, we design a Behavior-Modal Alignment (BMA) module that aligns and fuses the dual representations to resolve the misalignment. Furthermore, we integrate BMA into other recommendation models, resulting in consistent performance improvements. To ensure that the dual representations maintain their semantic independence during alignment, we introduce a Similarity-Supervised Signal (SSS) for representation learning. We conduct extensive experiments on three public datasets, and our method achieves state-of-the-art (SOTA) results. The source code will be available upon acceptance.
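The dual-line idea can be illustrated with a minimal sketch. It is not the DRepMRec implementation: the abstract does not specify the alignment objective or the fusion operator, so the InfoNCE-style contrastive loss, the additive fusion, the temperature value, and all names (`info_nce`, `behavior_emb`, `modal_emb`, `W_modal`) are assumptions chosen for illustration. The SSS component is not sketched, since its form is not described here.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(a, b, temperature=0.2):
    """Assumed alignment loss: row i of `a` should match row i of `b`
    and differ from every other row of `b`."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
n_items, feat_dim, dim = 8, 16, 4

# Line 1: behavior representations (e.g. ID embeddings learned from interactions).
behavior_emb = rng.normal(size=(n_items, dim))

# Line 2: modal representations, computed with their own parameters (W_modal)
# from modality features; no parameters are shared with line 1.
modal_feat = rng.normal(size=(n_items, feat_dim))
W_modal = rng.normal(size=(feat_dim, dim)) * 0.1
modal_emb = modal_feat @ W_modal

# Alignment step (assumed form): pull each item's two views together.
align_loss = info_nce(behavior_emb, modal_emb)

# Fusion step (assumed form): simple element-wise sum of the two representations.
fused = behavior_emb + modal_emb
```

In a trainable version, each line would keep its own optimizer-visible parameters, with the alignment term acting only on the outputs of the two lines, which is the decoupling the abstract argues for.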