Multimodal recommendation has emerged as an effective paradigm for enhancing collaborative filtering by incorporating heterogeneous content modalities. Existing multimodal recommenders predominantly focus on reinforcing cross-modal consistency to facilitate multimodal fusion. However, we observe that multimodal representations often exhibit substantial cross-modal redundancy, where dominant shared components overlap across modalities. Such redundancy can limit the effective utilization of complementary information, explaining why incorporating additional modalities does not always yield performance improvements. In this work, we propose CLEAR, a lightweight and plug-and-play cross-modal de-redundancy approach for multimodal recommendation. Rather than enforcing stronger cross-modal alignment, CLEAR explicitly characterizes the redundant shared subspace across modalities by modeling cross-modal covariance between visual and textual representations. By identifying dominant shared directions via singular value decomposition and projecting multimodal features onto the complementary null space, CLEAR reshapes the multimodal representation space by suppressing redundant cross-modal components while preserving modality-specific information. This subspace-level projection implicitly regulates representation learning dynamics, preventing the model from repeatedly amplifying redundant shared semantics during training. Notably, CLEAR can be seamlessly integrated into existing multimodal recommenders without modifying their architectures or training objectives. Extensive experiments on three public benchmark datasets demonstrate that explicitly reducing cross-modal redundancy consistently improves recommendation performance across a wide range of multimodal recommendation models.
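The core subspace-level projection described above — modeling the cross-modal covariance between visual and textual representations, identifying dominant shared directions via SVD, and projecting features onto the complementary null space — can be illustrated with a minimal sketch. The function name, the hyperparameter `k` (number of suppressed shared directions), and the centering step are illustrative assumptions, not details from the paper:

```python
import numpy as np

def cross_modal_deredundancy(visual, textual, k=4):
    """Illustrative sketch of the de-redundancy projection.

    visual:  (n, d) matrix of visual item embeddings
    textual: (n, d) matrix of textual item embeddings
    k:       assumed number of dominant shared directions to suppress
    """
    # Center each modality so the covariance reflects shared variation
    # (centering is an assumption; the paper does not specify it here).
    v = visual - visual.mean(axis=0)
    t = textual - textual.mean(axis=0)

    # Cross-modal covariance between the two modalities.
    cov = v.T @ t / (len(v) - 1)                # (d, d)

    # Dominant shared directions via singular value decomposition.
    U, S, Vt = np.linalg.svd(cov)
    Uk = U[:, :k]                               # top-k left singular vectors
    Vk = Vt[:k].T                               # top-k right singular vectors

    # Projectors onto the complement (null space) of the shared subspace:
    # redundant components are suppressed, the rest is preserved.
    P_v = np.eye(visual.shape[1]) - Uk @ Uk.T
    P_t = np.eye(textual.shape[1]) - Vk @ Vk.T
    return visual @ P_v, textual @ P_t
```

After the projection, each modality's features carry no component along the top-k shared singular directions, while modality-specific directions are left intact — consistent with the plug-and-play design, since this operates purely on the feature matrices.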