Generative Recommendation (GR) has achieved strong results by framing recommendation as next-token prediction. This paradigm relies on Semantic IDs (SIDs) to tokenize items from large-scale catalogs into discrete sequences. Existing GR approaches predominantly generate SIDs via Residual Quantization (RQ), where items are first encoded into embeddings and then quantized into discrete SIDs. However, this two-stage pipeline suffers from two inherent limitations: 1) objective misalignment and semantic degradation stemming from the two-stage compression; 2) error accumulation inherent in the structure of RQ. To address these limitations, we propose UniSID, a Unified SID generation framework for generative advertisement recommendation. Specifically, we jointly optimize embeddings and SIDs end-to-end from raw advertising data, enabling semantic information to flow directly into the SID space and thereby addressing the limitations of the two-stage cascading compression paradigm. To capture fine-grained semantics, we introduce a multi-granularity contrastive learning strategy that aligns distinct items across SID levels. Finally, we propose a summary-based ad reconstruction mechanism that encourages SIDs to capture high-level semantic information not explicitly present in advertising contexts. Experiments demonstrate that UniSID consistently outperforms state-of-the-art SID generation methods, yielding up to a 4.62% improvement in Hit Rate over the strongest baseline across downstream advertising scenarios.
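To make the RQ baseline described in the abstract concrete, the following is a minimal sketch of greedy residual quantization, assuming a precomputed item embedding and one codebook per SID level. It illustrates the conventional two-stage pipeline, not UniSID itself. The function names, the random codebooks, and the dimensions are all hypothetical; in practice the codebooks would be learned.

```python
import numpy as np


def residual_quantize(embedding, codebooks):
    """Greedy residual quantization: pick one codeword index per level.

    Each level quantizes whatever the previous levels failed to capture,
    so a poor choice at an early level propagates to every later level.
    """
    residual = np.asarray(embedding, dtype=float).copy()
    sid = []
    for codebook in codebooks:  # each codebook has shape (num_codes, dim)
        distances = np.linalg.norm(codebook - residual, axis=1)
        index = int(np.argmin(distances))
        sid.append(index)
        residual = residual - codebook[index]
    return sid, residual


def reconstruct(sid, codebooks):
    """Sum the selected codewords to approximate the original embedding."""
    return sum(codebook[index] for codebook, index in zip(codebooks, sid))


# Assumption: random codebooks with shrinking scale stand in for learned ones.
rng = np.random.default_rng(0)
dim, num_levels, num_codes = 8, 3, 16
codebooks = [
    rng.normal(scale=1.0 / (2 ** level), size=(num_codes, dim))
    for level in range(num_levels)
]

# Assumption: a random vector stands in for an encoder's item embedding.
item_embedding = rng.normal(size=dim)
sid, final_residual = residual_quantize(item_embedding, codebooks)
print("SID:", sid)
print("Remaining quantization error:", np.linalg.norm(final_residual))
```

The list `sid` holds one token per level, which is the discrete sequence a generative recommender would predict. Because the encoder that produced `item_embedding` is trained separately from the quantizer, the SIDs only see whatever the embedding preserved, which is the objective misalignment the abstract refers to.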