Generative recommendation (GR) with semantic IDs (SIDs) has emerged as a promising alternative to traditional recommendation approaches due to its performance gains, its ability to exploit the semantic information carried by language-model embeddings, and its inference and storage efficiency. Existing GR-with-SIDs methods model the probability of the SID sequence corresponding to a user's interaction history autoregressively. While this has yielded impressive next-item prediction performance in certain settings, these autoregressive models suffer from expensive inference due to sequential token-by-token decoding, potentially inefficient use of training data, and a bias towards learning short-context relationships among tokens. Inspired by recent breakthroughs in NLP, we instead propose to model and learn the probability of a user's SID sequence using masked diffusion. Masked diffusion corrupts sequences with discrete masking noise to facilitate learning the sequence distribution, and models the masked tokens as conditionally independent given the unmasked tokens, which allows the masked tokens to be decoded in parallel. We demonstrate through thorough experiments that our proposed method consistently outperforms autoregressive modeling. The performance gap is especially pronounced in data-constrained settings and in coarse-grained recall, consistent with our intuitions. Moreover, our approach offers the flexibility to predict multiple SIDs in parallel at inference time while still outperforming autoregressive modeling.
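To make the decoding contrast concrete: an autoregressive model factorizes the sequence probability as p(x) = ∏_t p(x_t | x_{<t}) and must emit one token per forward pass, whereas masked diffusion uses the conditional-independence factorization p(x_M | x_U) = ∏_{i∈M} p(x_i | x_U) over the masked set M given the unmasked tokens U, so all masked positions can be filled from a single forward pass. The sketch below illustrates one such parallel decoding step; it is our illustration, not the paper's implementation, and the model stub, vocabulary size, mask id, and 4-token SID length are hypothetical placeholders.

```python
import torch

VOCAB_SIZE = 1024      # hypothetical SID vocabulary size
MASK_ID = VOCAB_SIZE   # hypothetical id of the [MASK] token
SEQ_LEN = 8            # hypothetical flattened history length

def model(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for a trained masked-diffusion transformer:
    maps (B, L) token ids to (B, L, VOCAB_SIZE) logits."""
    return torch.randn(tokens.shape[0], tokens.shape[1], VOCAB_SIZE)

def parallel_decode_step(tokens: torch.Tensor) -> torch.Tensor:
    """Fill every [MASK] position in one forward pass: each masked
    token is predicted independently given the unmasked context."""
    logits = model(tokens)                    # (B, L, VOCAB_SIZE)
    preds = logits.argmax(dim=-1)             # per-position prediction, in parallel
    is_masked = tokens == MASK_ID
    return torch.where(is_masked, preds, tokens)  # keep unmasked tokens as-is

# Mask the (hypothetically 4) SID tokens of the next item and decode them at once.
history = torch.randint(0, VOCAB_SIZE, (1, SEQ_LEN))
history[:, -4:] = MASK_ID
print(parallel_decode_step(history))
```

In practice a masked-diffusion decoder would typically sample or iteratively unmask over several such steps rather than committing to a single greedy pass; the point of the sketch is only that every masked SID token is predicted simultaneously, rather than one at a time as in autoregressive decoding.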