Masked Diffusion for Generative Recommendation

Generative recommendation (GR) with semantic IDs (SIDs) has emerged as a promising alternative to traditional recommendation approaches because of its performance gains, its use of the semantic information carried by language model embeddings, and its inference and storage efficiency. Existing work on GR with SIDs models the probability of the SID sequence corresponding to a user's interaction history autoregressively. While this has led to strong next-item prediction performance in certain settings, autoregressive GR models with SIDs suffer from expensive inference due to sequential token-wise decoding, potentially inefficient use of training data, and a bias towards learning short-context relationships among tokens. Inspired by recent breakthroughs in NLP, we propose instead to model and learn the probability of a user's SID sequence using masked diffusion. Masked diffusion uses discrete masking noise to facilitate learning the sequence distribution and models the masked tokens as conditionally independent given the unmasked tokens, which allows the masked tokens to be decoded in parallel. We demonstrate through thorough experiments that our proposed method consistently outperforms autoregressive modeling. The performance gap is especially pronounced in data-constrained settings and in coarse-grained recall, consistent with our intuitions. Moreover, our approach offers the flexibility to predict multiple SIDs in parallel during inference while still outperforming autoregressive modeling.
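
To make the masking and parallel decoding ideas concrete, the following is a minimal NumPy sketch, not the implementation used in this work. The names `MASK`, `mask_tokens`, `toy_denoiser`, and `parallel_decode` are hypothetical, the denoiser is a hand-written stand-in for a trained network, and the confidence-based choice of which positions to unmask is only one possible heuristic assumed here for illustration.

```python
import numpy as np

MASK = -1  # hypothetical id reserved for the [MASK] token


def mask_tokens(sids, t, rng):
    """Forward (noising) step: mask each token independently with probability t."""
    sids = np.asarray(sids)
    drop = rng.random(sids.shape) < t
    return np.where(drop, MASK, sids)


def toy_denoiser(tokens, vocab_size):
    """Stand-in for a trained network (assumption, for illustration only).

    Returns logits of shape (len(tokens), vocab_size); this toy simply
    favours the token equal to position % vocab_size.
    """
    n = len(tokens)
    logits = np.zeros((n, vocab_size))
    logits[np.arange(n), np.arange(n) % vocab_size] = 5.0
    return logits


def parallel_decode(tokens, vocab_size, per_step, denoiser=toy_denoiser):
    """Reverse step: fill up to `per_step` masked positions per model call.

    Each masked position gets its own categorical distribution, which is the
    conditional-independence assumption that permits parallel decoding.
    """
    tokens = np.array(tokens)
    calls = 0
    while (tokens == MASK).any():
        logits = denoiser(tokens, vocab_size)
        # Row-wise softmax over the vocabulary.
        z = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(z)
        probs /= probs.sum(axis=1, keepdims=True)
        # Confidence of the best token; unmasked positions are excluded.
        conf = probs.max(axis=1)
        conf[tokens != MASK] = -np.inf
        chosen = np.argsort(-conf)[:per_step]
        chosen = chosen[tokens[chosen] == MASK]
        tokens[chosen] = probs[chosen].argmax(axis=1)
        calls += 1
    return tokens, calls


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = np.arange(8) % 4           # toy SID sequence
    noisy = mask_tokens(sequence, 0.5, rng)
    decoded, calls = parallel_decode(noisy, vocab_size=4, per_step=4)
    print(noisy, decoded, calls)
```

With `per_step=1` the loop needs one model call per masked token, which mirrors the cost of token-wise decoding; larger values of `per_step` reduce the number of calls at the price of leaning more heavily on the independence assumption.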
