Generative recommendation (GR) has emerged as a widely adopted paradigm in industrial sequential recommendation. Current GR systems follow a similar pipeline: tokenization for item indexing, next-token prediction as the training objective, and auto-regressive decoding for next-item generation. However, existing GR research focuses mainly on architecture design and empirical performance optimization, and offers few rigorous theoretical explanations of how auto-regressive next-token prediction works in recommendation scenarios. In this work, we formally prove that \textbf{the k-token auto-regressive next-token prediction (AR-NTP) paradigm is strictly mathematically equivalent to full-item-vocabulary maximum likelihood estimation (FV-MLE)}, under the core premise that the mapping between items and their corresponding k-token sequences is bijective. We further show that this equivalence holds for both cascaded and parallel tokenizations, the two most widely used schemes in industrial GR systems. Our result provides the first formal theoretical foundation for the dominant industrial GR paradigm and offers principled guidance for future GR system optimization.
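The stated equivalence can be sketched with the chain rule of probability. The notation here is assumed for illustration and is not taken from the source: $h$ denotes the user history, $\mathcal{I}$ the item set, $\phi$ the bijection from items to k-token sequences with $\phi(i) = (c_1, \dots, c_k)$, and $p_\theta$ the auto-regressive model. For a target item $i$, the AR-NTP loss collapses into a single joint log-probability:

\[
\mathcal{L}_{\mathrm{AR\text{-}NTP}}(i \mid h)
= -\sum_{j=1}^{k} \log p_\theta(c_j \mid c_{<j}, h)
= -\log \prod_{j=1}^{k} p_\theta(c_j \mid c_{<j}, h)
= -\log p_\theta(c_1, \dots, c_k \mid h).
\]

Because $\phi$ is a bijection, each item corresponds to exactly one token sequence and each valid token sequence to exactly one item, so defining $P_\theta(i \mid h) := p_\theta(\phi(i) \mid h)$ gives

\[
\mathcal{L}_{\mathrm{AR\text{-}NTP}}(i \mid h)
= -\log P_\theta(i \mid h)
= \mathcal{L}_{\mathrm{FV\text{-}MLE}}(i \mid h).
\]

The identity between the two losses follows from the chain rule alone. For $P_\theta(\cdot \mid h)$ to sum to one over $\mathcal{I}$, the model must also place no probability mass on token sequences outside the image of $\phi$; this sketch treats that as an additional assumption, since the text above states only the bijection premise.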