Cold-start item recommendation remains a persistent challenge in real-world systems because new items lack interaction histories. Prior models attempt to bridge this gap with item content features, but they universally suffer from the \textbf{seesaw dilemma}: improving performance on cold items inevitably degrades performance on warm items, and vice versa. We trace this dilemma to a fundamental \textbf{distributional disparity}: warm item embeddings occupy a complex ``behavioral manifold'' shaped by rich interaction signals, whereas cold item embeddings are confined to a ``semantic manifold'' derived solely from auxiliary content. Existing methods often force a rigid mapping between these inconsistent spaces, so the model sacrifices the precision of warm representations to accommodate cold ones. To address this, we propose \textbf{DiffCold}, a diffusion-based generative model that unifies warm and cold representations. Unlike GAN- or VAE-based approaches, DiffCold uses conditional diffusion to reconstruct warm item embeddings from content, preserving the underlying manifold structure without degrading it. We further tailor this paradigm with two designs: a \textbf{Retrieval-enhanced Aggregator} that initializes generation from semantically similar warm items, bypassing inefficient generation from pure noise, and a \textbf{Simulation-based Representation Alignment} module that enforces distributional consistency between generated and real embeddings via contrastive learning. Experiments on three benchmarks confirm that DiffCold resolves the seesaw dilemma, consistently outperforming state-of-the-art methods across all metrics.