A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation

Multimodal recommender systems, which exploit multimodal features such as images and textual descriptions, typically achieve better recommendation accuracy than general recommendation models based solely on user-item interactions. Prior work generally fuses multimodal features into item ID embeddings to enrich item representations, and thus fails to capture the latent semantic item-item structures. In this context, LATTICE proposes to learn the latent structure between items explicitly and achieves state-of-the-art performance for multimodal recommendation. However, we argue that the latent graph structure learning of LATTICE is both inefficient and unnecessary. Experimentally, we demonstrate that freezing its item-item structure before training also achieves competitive performance. Based on this finding, we propose a simple yet effective model, dubbed FREEDOM, that FREEzes the item-item graph and DenOises the user-item interaction graph simultaneously for Multimodal recommendation. Theoretically, we examine the design of FREEDOM from a graph spectral perspective and demonstrate that it possesses a tighter upper bound on the graph spectrum. To denoise the user-item interaction graph, we devise a degree-sensitive edge pruning method, which rejects possibly noisy edges with high probability when sampling the graph. We evaluate the proposed model on three real-world datasets and show that FREEDOM significantly outperforms the strongest current baselines. Compared with LATTICE, FREEDOM achieves an average improvement of 19.07% in recommendation accuracy while reducing memory cost by up to 6$\times$ on large graphs. The source code is available at: https://github.com/enoche/FREEDOM.
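To make the idea of degree-sensitive edge pruning concrete, the following is a minimal sketch and not the released implementation. The abstract does not state the sampling rule, so the sketch assumes that an edge between user u and item i is kept with a weight proportional to 1/sqrt(d_u * d_i), where d_u and d_i are node degrees; under this assumption, edges between two high-degree nodes are the most likely to be rejected. The function name `degree_sensitive_prune` and the `keep_ratio` parameter are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter


def degree_sensitive_prune(edges, keep_ratio, seed=0):
    """Sample a subset of (user, item) edges, favouring low-degree endpoints.

    Assumption (not taken from the abstract): the sampling weight of an
    edge is 1 / sqrt(deg(user) * deg(item)).
    """
    rng = random.Random(seed)
    deg_u = Counter(u for u, _ in edges)  # user degrees
    deg_i = Counter(i for _, i in edges)  # item degrees
    weights = [1.0 / math.sqrt(deg_u[u] * deg_i[i]) for u, i in edges]

    n_keep = max(1, round(keep_ratio * len(edges)))

    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # each edge draws key = U ** (1 / w); the n_keep largest keys are kept.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    order = sorted(range(len(edges)), key=lambda k: keys[k], reverse=True)
    kept = sorted(order[:n_keep])  # restore the original edge order
    return [edges[k] for k in kept]


# Toy interaction graph: u1 and i1 are the high-degree nodes.
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")]
pruned = degree_sensitive_prune(edges, keep_ratio=0.5)
```

In a training loop, one would presumably resample the pruned graph periodically (for example, once per epoch) so that no edge is discarded permanently; that schedule is likewise an assumption here.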
