The rise of online multi-modal sharing platforms such as TikTok and YouTube has enabled personalized recommender systems to incorporate multiple modalities (e.g., visual, textual, and acoustic) into user representations. However, data sparsity remains a key challenge in these systems. To address this limitation, recent research has introduced self-supervised learning techniques to enhance recommender systems. These methods, however, often rely on simplistic random augmentation or intuitive cross-view information, which can introduce irrelevant noise and fail to accurately align the multi-modal context with user-item interaction modeling. To fill this research gap, we propose DiffMM, a novel multi-modal graph diffusion model for recommendation. Our framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to improve modality-aware user representation learning, facilitating better alignment between multi-modal feature information and collaborative relation modeling. Leveraging the generative capabilities of diffusion models, our approach automatically generates a modality-aware user-item graph, enabling the incorporation of useful multi-modal knowledge into the modeling of user-item interactions. Extensive experiments on three public datasets consistently demonstrate the superiority of DiffMM over various competitive baselines. The source code of our framework is available at: https://github.com/HKUDS/DiffMM.