Towards Trustworthy Multimodal Recommendation

Recent advances in multimodal recommendation have demonstrated the effectiveness of incorporating visual and textual content into collaborative filtering. However, real-world deployments raise an increasingly important yet underexplored issue: trustworthiness. On modern e-commerce platforms, multimodal content can be misleading or unreliable (e.g., visually inconsistent product images or click-bait titles), injecting untrustworthy signals into multimodal representations and making existing recommenders brittle under modality corruption. In this work, we take a step towards trustworthy multimodal recommendation from both a method and an analysis perspective. First, we propose a plug-and-play modality-level rectification component that mitigates untrustworthy modality features by learning soft correspondences between items and multimodal features. Using lightweight projections and Sinkhorn-based soft matching, the rectification suppresses mismatched modality signals while preserving semantic consistency, and can be integrated into existing multimodal recommenders without architectural modifications. Second, we present two practical insights on interaction-level trustworthiness under noisy collaborative signals: (i) training-set pseudo interactions can help or hurt performance under noise depending on prior-signal alignment; and (ii) propagation-graph pseudo edges can also help or hurt robustness, as message passing may amplify misalignment. Extensive experiments on multiple datasets and backbones under varying corruption levels demonstrate improved robustness from modality rectification and validate the above interaction-level observations.
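To make the rectification idea concrete, the following is a minimal NumPy sketch of soft matching between items and modality features via Sinkhorn normalization. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity function, the temperature, the iteration count, or how the matching is applied, so the cosine similarity, `eps`, `n_iters`, and all function and variable names (`sinkhorn`, `rectify`, `w_item`, `w_modal`) are hypothetical. The projections would be learned in practice and are drawn at random here.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log-sum-exp along one axis (dims kept)."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn(scores, eps=0.1, n_iters=50):
    """Alternate column and row normalization in log space.

    The loop ends on a row step, so every row of the returned
    matrix sums to 1; column sums are only approximately 1.
    """
    log_p = scores / eps  # eps is an assumed temperature
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=0)  # normalize columns
        log_p = log_p - _logsumexp(log_p, axis=1)  # normalize rows
    return np.exp(log_p)


def _l2_normalize(x):
    """Scale each row to unit length so dot products are cosines."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)


def rectify(item_emb, modal_feat, w_item, w_modal, eps=0.1):
    """Replace each item's modality feature with a soft mixture.

    item_emb:   (n, d) item embeddings
    modal_feat: (n, m) possibly corrupted modality features
    w_item:     (d, k) projection (assumed; learned in practice)
    w_modal:    (m, k) projection (assumed; learned in practice)
    """
    z_item = _l2_normalize(item_emb @ w_item)
    z_modal = _l2_normalize(modal_feat @ w_modal)
    plan = sinkhorn(z_item @ z_modal.T, eps=eps)  # (n, n) soft matching
    return plan @ modal_feat, plan


# Toy usage with random data; all sizes are arbitrary.
rng = np.random.default_rng(0)
n, d, m, k = 6, 8, 16, 4
items = rng.normal(size=(n, d))
feats = rng.normal(size=(n, m))
rectified, plan = rectify(
    items, feats, rng.normal(size=(d, k)), rng.normal(size=(m, k))
)
print(rectified.shape, plan.sum(axis=1))
```

Because each row of the matching sums to 1, every rectified feature is a convex combination of the original modality features, so a feature that matches its item poorly in the projected space receives less weight. The output has the same shape as the input features, which is consistent with the plug-and-play claim: a backbone could consume `rectified` in place of the raw features.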
