Multimodal recommender systems that utilize multimodal features (e.g., images and textual descriptions) typically achieve better recommendation accuracy than general recommendation models based solely on user-item interactions. Prior work generally fuses multimodal features into item ID embeddings to enrich item representations, and thus fails to capture the latent semantic item-item structures. In this context, LATTICE proposes to learn the latent structure between items explicitly and achieves state-of-the-art performance for multimodal recommendation. However, we argue that the latent graph structure learning of LATTICE is both inefficient and unnecessary. Experimentally, we demonstrate that freezing its item-item structure before training also achieves competitive performance. Based on this finding, we propose a simple yet effective model, dubbed FREEDOM, that FREEzes the item-item graph and DenOises the user-item interaction graph simultaneously for Multimodal recommendation. Theoretically, we examine the design of FREEDOM from a graph spectral perspective and demonstrate that it possesses a tighter upper bound on the graph spectrum. To denoise the user-item interaction graph, we devise a degree-sensitive edge pruning method, which rejects possibly noisy edges with high probability when sampling the graph. We evaluate the proposed model on three real-world datasets and show that FREEDOM significantly outperforms the strongest current baselines. Compared with LATTICE, FREEDOM achieves an average improvement of 19.07% in recommendation accuracy while reducing memory cost by up to 6$\times$ on large graphs. The source code is available at: https://github.com/enoche/FREEDOM.
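To make the degree-sensitive edge pruning idea concrete, the following is a minimal sketch rather than the released implementation. The abstract does not state the sampling distribution, so the sketch assumes that an edge between user $u$ and item $i$ is kept with probability proportional to $1/\sqrt{d_u d_i}$, where $d_u$ and $d_i$ are node degrees; edges attached to high-degree nodes are therefore rejected more often. The function names and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np


def edge_keep_probs(edges):
    """Return a sampling distribution over edges (assumed form, see lead-in).

    edges: integer array of shape (E, 2) holding (user_id, item_id) pairs.
    Each edge gets weight 1 / sqrt(deg(user) * deg(item)), then the weights
    are normalized to sum to 1.
    """
    edges = np.asarray(edges)
    users, items = edges[:, 0], edges[:, 1]
    deg_u = np.bincount(users)[users]  # degree of each edge's user
    deg_i = np.bincount(items)[items]  # degree of each edge's item
    weights = 1.0 / np.sqrt(deg_u * deg_i)
    return weights / weights.sum()


def degree_sensitive_prune(edges, keep_ratio=0.8, seed=0):
    """Sample a sparser interaction graph without replacement.

    keep_ratio (hypothetical parameter) is the fraction of edges retained.
    """
    edges = np.asarray(edges)
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(keep_ratio * len(edges))))
    kept = rng.choice(len(edges), size=n_keep, replace=False,
                      p=edge_keep_probs(edges))
    return edges[np.sort(kept)]


if __name__ == "__main__":
    # User 0 is a hub with four interactions; users 1 and 2 have one each.
    toy = np.array([[0, 0], [0, 1], [0, 2], [0, 3], [1, 0], [2, 4]])
    print(edge_keep_probs(toy))
    print(degree_sensitive_prune(toy, keep_ratio=0.5, seed=1))
```

Under this assumption, the pruned edge set would be resampled periodically during training (for example once per epoch), while the full graph is used at inference time; that schedule is likewise an assumption of this sketch.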