Multi-modal recommendation (MMR) enriches item representations with item content, e.g., visual and textual descriptions, to improve upon interaction-only recommenders. The success of MMR hinges on aligning these content modalities with user preferences derived from interaction data. Dominant practices, which disentangle modality-invariant, preference-driving signals from modality-specific, preference-irrelevant noise, are flawed in two ways. First, they assume that item content has the same relevance to preferences for all users, which contradicts the fact that preferences are user-conditional. Second, they optimize pairwise contrastive losses separately for cross-modal alignment, systematically ignoring the higher-order dependencies that arise when multiple content modalities jointly influence user choices. In this paper, we introduce GTC, a conditional Generative Total Correlation learning framework. We employ an interaction-guided diffusion model to perform user-aware content feature filtering, preserving only the personalized features relevant to each individual user. Furthermore, to capture complete cross-modal dependencies, we optimize a tractable lower bound of the total correlation of item representations across all modalities. Experiments on standard MMR benchmarks show that GTC consistently outperforms state-of-the-art methods, with gains of up to 28.30% in NDCG@5. Ablation studies validate both conditional preference-driven feature filtering and total correlation optimization, confirming the ability of GTC to model user-conditional relationships in MMR tasks. The code is available at: https://github.com/jingdu-cs/GTC.
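To make the contrast between pairwise alignment and total correlation concrete, the following minimal sketch computes both quantities exactly on a toy discrete distribution. It is not the GTC objective, which optimizes a tractable lower bound over learned item representations. It only uses the standard definition TC(X_1, ..., X_n) = sum_i H(X_i) - H(X_1, ..., X_n). The three binary variables stand in for three hypothetical modalities, and all function names are illustrative assumptions.

```python
import itertools
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, axes):
    """Marginalize a joint distribution onto the given variable indices."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in axes)] += p
    return dict(out)


def mutual_information(joint, i, j):
    """Pairwise mutual information I(X_i; X_j) in bits."""
    return (
        entropy(marginal(joint, [i]))
        + entropy(marginal(joint, [j]))
        - entropy(marginal(joint, [i, j]))
    )


def total_correlation(joint, n_vars):
    """TC = sum_i H(X_i) - H(X_1, ..., X_n), in bits."""
    marginal_sum = sum(entropy(marginal(joint, [i])) for i in range(n_vars))
    return marginal_sum - entropy(joint)


# Toy joint (an assumption for illustration): X1 and X2 are independent
# fair bits, and X3 = X1 XOR X2. Every pair is independent, yet the three
# variables are jointly dependent.
xor_joint = {
    (a, b, a ^ b): 0.25 for a, b in itertools.product((0, 1), repeat=2)
}

pairwise = [
    mutual_information(xor_joint, i, j)
    for i, j in itertools.combinations(range(3), 2)
]
tc = total_correlation(xor_joint, 3)

print("pairwise MI (bits):", pairwise)
print("total correlation (bits):", tc)
```

In this toy case every pairwise mutual information is zero while the total correlation is one bit, so an objective built only from pairwise terms would see no dependence at all. This is the kind of higher-order structure that a total correlation objective can capture and separate pairwise losses cannot.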