Enhancing Dyadic Relations with Homogeneous Graphs for Multimodal Recommendation

User interaction data in recommender systems is a form of dyadic relation that reflects users' preferences for items. Learning representations of these two discrete sets of objects, users and items, is critical for recommendation. Recent multimodal recommendation models that leverage multimodal features (e.g., images and text descriptions) have been shown to improve recommendation accuracy. However, state-of-the-art models enhance the dyadic relations between users and items by considering either user-user or item-item relations, leaving the high-order relations on the other side (i.e., users or items) unexplored. Furthermore, we experimentally reveal that the multimodal fusion methods in current state-of-the-art models may degrade their recommendation performance: without modifying the model architectures, these models can achieve even better recommendation accuracy with uni-modal information alone. Building on this finding, we propose a model that enhances the dyadic relations by learning Dual RepresentAtions of both users and items via constructing homogeneous Graphs for multimOdal recommeNdation. We name our model DRAGON. Specifically, DRAGON constructs a user-user graph based on commonly interacted items and an item-item graph from item multimodal features. It then applies graph learning to both the user-item heterogeneous graph and the homogeneous graphs (user-user and item-item) to obtain dual representations of users and items. To capture information from each modality, DRAGON employs a simple yet effective fusion method, attentive concatenation, to derive the representations of users and items. Extensive experiments on three public datasets against seven baselines show that DRAGON outperforms the strongest baseline by 22.03% on average. We also conduct various ablation studies to validate DRAGON's effectiveness.
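The graph construction and fusion steps named above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under assumptions the abstract does not state, not the authors' implementation: the user-user graph keeps each user's top-k neighbors ranked by the number of co-interacted items; each modality's item-item graph is a k-nearest-neighbor graph over cosine similarity of item feature vectors; and attentive concatenation scales each modality's embedding by a softmax weight before concatenating. The function names, the parameter `k`, and the fixed attention scores are hypothetical.

```python
import math
from collections import Counter, defaultdict


def user_user_graph(interactions, k):
    """Assumed construction: top-k neighbors by count of co-interacted items.

    interactions: dict mapping user id -> set of item ids.
    Returns dict user -> list of (neighbor, shared_item_count).
    """
    # Invert user -> items into item -> users so shared items can be counted.
    item_users = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            item_users[item].add(user)

    graph = {}
    for user, items in interactions.items():
        shared = Counter()
        for item in items:
            for other in item_users[item]:
                if other != user:
                    shared[other] += 1
        # Sort by descending shared count; break ties by id for determinism.
        ranked = sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))
        graph[user] = ranked[:k]
    return graph


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def item_item_graph(features, k):
    """Assumed construction: kNN graph over one modality's item features."""
    graph = {}
    for i, fi in features.items():
        sims = [(j, cosine(fi, fj)) for j, fj in features.items() if j != i]
        sims.sort(key=lambda kv: (-kv[1], kv[0]))
        graph[i] = [j for j, _ in sims[:k]]
    return graph


def attentive_concat(modality_embs, scores):
    """Assumed fusion: softmax-weight each modality embedding, then concatenate."""
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = []
    for w, emb in zip(weights, modality_embs):
        fused.extend(w * x for x in emb)
    return fused


if __name__ == "__main__":
    interactions = {"u1": {"i1", "i2", "i3"}, "u2": {"i2", "i3"}, "u3": {"i3", "i4"}}
    print(user_user_graph(interactions, k=1))
    visual = {"i1": [1.0, 0.0], "i2": [0.9, 0.1], "i3": [0.0, 1.0]}
    print(item_item_graph(visual, k=1))
    print(attentive_concat([[1.0, 2.0], [3.0]], [0.0, 0.0]))
```

In a full model, these neighbor lists would presumably be turned into sparse adjacency matrices for graph convolution, one item-item graph per modality, and the attention scores would be learned parameters rather than fixed inputs.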

翻译：推荐系统中的用户交互数据是一种反映用户对物品偏好的二值关系。学习这两个离散对象集合（用户和物品）的表示对于推荐至关重要。近年来的多模态推荐模型利用多模态特征（如图像和文本描述）已被证明能有效提升推荐精度。然而，现有最先进的模型通过考虑用户-用户或物品-物品关系来增强用户与物品之间的二值关系，而忽略了另一侧（如用户或物品）的高阶关系。此外，我们通过实验揭示，当前最先进模型中的多模态融合方法可能降低其推荐性能。即在不改变模型架构的情况下，这些模型仅使用单模态信息即可获得更好的推荐精度。基于此发现，我们提出一种模型，通过构建同质图学习用户和物品的双重表示，以增强二值关系，用于多模态推荐。我们将该模型命名为DRAGON。具体而言，DRAGON根据共同交互的物品构建用户-用户图，并根据物品多模态特征构建物品-物品图。随后，它利用图学习技术分别在用户-物品异质图以及同质图（用户-用户和物品-物品）上学习，以获得用户和物品的双重表示。为捕获每个模态的信息，DRAGON采用一种简单而有效的融合方法——注意力拼接，以推导用户和物品的表示。在三个公开数据集和七种基线模型上的大量实验表明，DRAGON平均能比最强基线模型提升22.03%的性能。我们还对DRAGON进行了多项消融研究，以验证其有效性。