Multimodal recommendation systems can learn users' preferences from existing user-item interactions as well as the semantics of multimodal data associated with items. Many existing methods model this through a multimodal user-item graph, approaching multimodal recommendation as a graph learning task. Graph Neural Networks (GNNs) have shown promising performance in this domain. Prior research has capitalized on GNNs' capability to capture neighborhood information within certain receptive fields (typically denoted by the number of hops, $K$) to enrich user and item semantics. We observe that the optimal receptive fields for GNNs can vary across different modalities. In this paper, we propose GNNs with Modality-Independent Receptive Fields, which employ separate GNNs with independent receptive fields for different modalities to enhance performance. Our results indicate that the optimal $K$ for certain modalities on specific datasets can be as low as 1 or 2, which may restrict the GNNs' capacity to capture global information. To address this, we introduce a Sampling-based Global Transformer, which utilizes uniform global sampling to effectively integrate global information for GNNs. We conduct comprehensive experiments that demonstrate the superiority of our approach over existing methods. Our code is publicly available at https://github.com/CrawlScript/MIG-GT.
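Below is a minimal, illustrative sketch (not the authors' released implementation) of the two ideas described above: (1) LightGCN-style propagation with an independent number of hops $K$ per modality, and (2) a transformer block whose keys and values come from a uniform global sample of nodes. The class names (ModalityGNN, GlobalSampleTransformer, MIGGTSketch), the fusion-by-summation step, and the per-modality hop counts are assumptions made purely for illustration; see the linked repository for the actual MIG-GT code.

```python
# Illustrative sketch only; names and design choices below are assumptions, not the official MIG-GT code.
import torch
import torch.nn as nn


class ModalityGNN(nn.Module):
    """Parameter-free neighborhood propagation with a modality-specific receptive field K."""

    def __init__(self, num_hops: int):
        super().__init__()
        self.num_hops = num_hops  # independent K for this modality

    def forward(self, x: torch.Tensor, norm_adj: torch.Tensor) -> torch.Tensor:
        # norm_adj: symmetrically normalized user-item adjacency matrix, shape (N, N)
        out = x
        layer_sum = x
        for _ in range(self.num_hops):
            out = torch.sparse.mm(norm_adj, out) if norm_adj.is_sparse else norm_adj @ out
            layer_sum = layer_sum + out
        return layer_sum / (self.num_hops + 1)  # average of hop-0 .. hop-K representations


class GlobalSampleTransformer(nn.Module):
    """Attention over a uniform global sample of nodes to inject global information."""

    def __init__(self, dim: int, num_heads: int = 4, sample_size: int = 256):
        super().__init__()
        self.sample_size = sample_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Queries are all nodes; keys/values are a uniform random sample of nodes.
        idx = torch.randint(0, x.size(0), (min(self.sample_size, x.size(0)),), device=x.device)
        kv = x[idx].unsqueeze(0)   # (1, S, D)
        q = x.unsqueeze(0)         # (1, N, D)
        out, _ = self.attn(q, kv, kv)
        return x + out.squeeze(0)  # residual connection


class MIGGTSketch(nn.Module):
    """Per-modality GNNs with independent receptive fields, followed by global-sample attention."""

    def __init__(self, dim: int, hops_per_modality: dict):
        super().__init__()
        self.gnns = nn.ModuleDict({m: ModalityGNN(k) for m, k in hops_per_modality.items()})
        self.global_attn = GlobalSampleTransformer(dim)

    def forward(self, modality_feats: dict, norm_adj: torch.Tensor) -> torch.Tensor:
        # Propagate each modality with its own K, then fuse (here, by summation).
        fused = sum(self.gnns[m](feat, norm_adj) for m, feat in modality_feats.items())
        return self.global_attn(fused)
```

As a usage sketch, the hop counts could be set per modality, e.g. `hops_per_modality={"visual": 1, "textual": 2}`, reflecting the observation that the optimal $K$ can differ across modalities and can be as low as 1 or 2; the uniform global sample then compensates for the small receptive fields by letting every node attend to a random subset of all nodes.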