LD4MRec: Simplifying and Powering Diffusion Model for Multimedia Recommendation

Multimedia recommendation aims to predict users' future behaviors from historical behavioral data and items' multimodal information. However, noise inherent in behavioral data, arising from unintended user interactions with uninteresting items, degrades recommendation performance. Recently, diffusion models have achieved high-quality information generation; their reverse process iteratively infers future information from a corrupted state. This property matches the needs of prediction under noisy conditions and motivates applying diffusion models to user behavior prediction. Nonetheless, two challenges must be addressed: 1) Classical diffusion models require excessive computation, which conflicts with the efficiency requirements of recommender systems. 2) Existing reverse processes are mainly designed for continuous data, whereas behavioral information is discrete in nature, so an effective method for generating discrete behavioral information is needed. To tackle these issues, we propose LD4MRec, a Light Diffusion model for Multimedia Recommendation. First, to reduce computational complexity, we simplify the formula of the reverse process, enabling one-step inference instead of multi-step inference. Second, to generate behavioral information effectively, we propose a novel Conditional neural Network. It maps discrete behavior data into a continuous latent space and generates behaviors under the guidance of collaborative signals and users' multimodal preferences. Additionally, because completely clean behavior data is inaccessible, we introduce a soft behavioral reconstruction constraint during model training, which facilitates behavior prediction from noisy data. Empirical studies on three public datasets demonstrate the effectiveness of LD4MRec.
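
To make the idea of one-step inference concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard Gaussian forward process applied to a user's binary interaction vector, and it replaces the trained conditional network with a hypothetical `toy_denoiser`. All function names, the `alpha_bar_t` value, and the example data are illustrative assumptions.

```python
import math
import random


def forward_corrupt(x0, alpha_bar_t, rng):
    """Gaussian forward process (assumed form):
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x0]


def toy_denoiser(x_t):
    """Hypothetical stand-in for a trained conditional network.
    It squashes each corrupted value into a score in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-4.0 * (v - 0.5))) for v in x_t]


def one_step_predict(x_t, denoiser):
    """One-step inference: a single denoiser call maps the corrupted
    state to interaction scores, with no loop over timesteps."""
    return denoiser(x_t)


def recommend_top_k(scores, observed, k):
    """Rank items the user has not interacted with by predicted score."""
    candidates = [i for i, seen in enumerate(observed) if not seen]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]


rng = random.Random(0)
x0 = [1, 0, 1, 0, 0, 1]  # illustrative interaction vector for one user
x_t = forward_corrupt(x0, alpha_bar_t=0.9, rng=rng)
scores = one_step_predict(x_t, toy_denoiser)
top_items = recommend_top_k(scores, x0, k=2)
print(top_items)
```

In a multi-step sampler, the denoiser would be called once per timestep; here the cost of inference is a single forward pass, which is the efficiency argument the abstract makes. The sketch omits the collaborative and multimodal conditioning signals and the soft reconstruction constraint, whose exact forms the abstract does not specify.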
