Incorporating multi-modal features as side information has recently become a trend in recommender systems. To elucidate user-item preferences, recent studies fuse modalities via concatenation, element-wise sum, or attention mechanisms. Despite their notable success, existing approaches do not account for the modality-specific noise encapsulated within each modality; as a result, direct fusion amplifies cross-modality noise. Moreover, because this noise varies across modalities, noise alleviation and fusion become even more challenging. In this work, we propose a new Spectrum-based Modality Representation (SMORE) fusion graph recommender that captures both uni-modal and fusion preferences while simultaneously suppressing modality noise. Specifically, SMORE projects the multi-modal features into the frequency domain and performs fusion in the spectral space. To reduce the dynamic contamination unique to each modality, we introduce a filter that adaptively attenuates and suppresses modality noise while effectively capturing universal modality patterns. Furthermore, we explore item latent structures by designing a new multi-modal graph learning module that captures associative semantic correlations and universal fusion patterns among similar items. Finally, we formulate a new modality-aware preference module, which infuses behavioral features and balances uni- and multi-modal features for precise preference modeling. This empowers SMORE to infer both modality-specific and fusion user preferences more accurately. Experiments on three real-world datasets demonstrate the efficacy of the proposed model. The source code for this work is publicly available at https://github.com/kennethorq/SMORE.
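The spectral fusion idea above can be illustrated with a minimal sketch: each modality's features are mapped to the frequency domain, scaled by a per-frequency filter that damps assumed noise-dominated bands, combined in spectral space, and transformed back. All names, shapes, and the fixed low-pass filter here are illustrative stand-ins; SMORE's actual filters are learned adaptively and its full pipeline includes the graph and preference modules, which are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy item features from two modalities (e.g., visual and textual),
# both assumed pre-projected to a shared dimension d.
n_items, d = 4, 8
visual = rng.normal(size=(n_items, d))
textual = rng.normal(size=(n_items, d))

def spectral_fuse(a, b, filt_a, filt_b):
    """Fuse two modality feature matrices in the frequency domain.

    Each modality is transformed with an FFT, scaled by its own
    per-frequency filter (a placeholder for the learnable adaptive
    filter described in the abstract), summed in spectral space,
    and mapped back to the feature domain.
    """
    A = np.fft.rfft(a, axis=-1)          # modality a -> frequency domain
    B = np.fft.rfft(b, axis=-1)          # modality b -> frequency domain
    fused = filt_a * A + filt_b * B      # attenuate and combine per frequency
    return np.fft.irfft(fused, n=a.shape[-1], axis=-1)

# Fixed low-pass-style responses as stand-ins for learned filters:
# keep low-frequency structure, damp high frequencies (assumed noisy).
freqs = np.fft.rfftfreq(d)               # normalized frequency bins
low_pass = 1.0 / (1.0 + 10.0 * freqs)    # simple decaying response
fused = spectral_fuse(visual, textual, low_pass, low_pass)

print(fused.shape)  # one fused representation per item: (4, 8)
```

With all-pass filters (all ones), this reduces to a plain element-wise sum of the two modalities, which makes the role of the filter explicit: the fusion operator itself is unchanged, and only the per-frequency weighting decides what is suppressed.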