Generative Recommendation (GR) reframes retrieval and ranking as autoregressive decoding over Semantic IDs (SIDs), unifying the multi-stage pipeline into a single model. Yet a fundamental expressive gap persists. Discriminative models score items with direct access to item features, which enables explicit user-item feature crossing. GR, by contrast, decodes compact SID tokens that carry no explicit item-side signal. We formalize this gap via Bayes' theorem: ranking by p(y|f,u) is equivalent to ranking by p(f|y,u), which factorizes autoregressively over item features. Hence a generative model with full feature access matches its discriminative counterpart, and any practical gap stems solely from incomplete feature coverage. We propose UniRec, with Chain-of-Attribute (CoA) as its core mechanism. Before the SID is decoded, CoA prefixes each SID sequence with structured attribute tokens (category, seller, brand), recovering the item-side feature crossing that discriminative models exploit. Because items sharing identical attributes cluster in adjacent SID regions, attribute conditioning yields a measurable per-step entropy reduction, H(s_k | s_{<k}, a) < H(s_k | s_{<k}), which narrows the search space and stabilizes beam search. We further address two deployment challenges. First, Capacity-constrained SID introduces exposure-weighted capacity penalties into residual quantization to suppress token collapse and the Matthew effect. Second, Conditional Decoding Context (CDC) combines a Task-Conditioned BOS token with hash-based Content Summaries to inject scenario signals at each decoding step. Finally, a joint RFT and DPO framework aligns the model with business objectives beyond distribution matching. Experiments show that UniRec outperforms the strongest baseline by +22.6% HR@50 overall and +15.5% on high-value orders. Deployed on Shopee's e-commerce platform, UniRec delivers significant gains in online A/B tests: PVCTR (+5.37%), orders (+4.76%), and GMV (+5.60%).
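The derivation behind the Bayes argument and the entropy claim can be written out as follows. This is a sketch in our own notation, not the paper's. The first line is Bayes' rule. For a fixed user u and positive label y, p(y | u) is constant across candidate items. As we read it, the ranking equivalence additionally requires p(f | u) to be constant over the candidate set, a condition the abstract leaves implicit. The second line applies the chain rule to a CoA target that places attribute tokens a = (a_1, ..., a_M) before the SID tokens s = (s_1, ..., s_K). For category, seller, and brand, M = 3. The third line is the standard identity for conditional entropy. The inequality is strict exactly when the attributes carry information about s_k beyond the SID prefix, which is what attribute clustering in SID space would provide.

```latex
\begin{align}
p(y \mid f, u) &= \frac{p(f \mid y, u)\, p(y \mid u)}{p(f \mid u)}, \\
p(a, s \mid y, u) &= \prod_{m=1}^{M} p(a_m \mid a_{<m}, y, u) \prod_{k=1}^{K} p(s_k \mid s_{<k}, a, y, u), \\
H(s_k \mid s_{<k}, a) &= H(s_k \mid s_{<k}) - I(s_k ; a \mid s_{<k}) \le H(s_k \mid s_{<k}).
\end{align}
```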
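The following minimal Python sketch makes the per-step entropy reduction concrete. It computes empirical conditional entropies on a hypothetical toy catalog in which each attribute value occupies its own SID sub-region. The catalog, the two-level SID layout, and the helper `conditional_entropy` are illustrative assumptions. They are not data or code from the paper.

```python
import math
from collections import Counter

# Hypothetical toy catalog: (attribute a, SID token s1, SID token s2).
# Items sharing an attribute sit in the same SID sub-region, mimicking the
# attribute clustering described in the abstract.
CATALOG = [
    ("shoes", 0, 0), ("shoes", 0, 1),
    ("phones", 0, 2), ("phones", 0, 3),
    ("books", 1, 0), ("books", 1, 1),
    ("toys", 1, 2), ("toys", 1, 3),
]


def conditional_entropy(samples, target_idx, cond_idx):
    """Empirical H(target | cond) in bits, treating all samples as equally likely."""
    n = len(samples)
    joint = Counter((tuple(x[i] for i in cond_idx), x[target_idx]) for x in samples)
    marginal = Counter(tuple(x[i] for i in cond_idx) for x in samples)
    h = 0.0
    for (cond, _target), count in joint.items():
        p_joint = count / n              # p(cond, target)
        p_cond = count / marginal[cond]  # p(target | cond)
        h -= p_joint * math.log2(p_cond)
    return h


if __name__ == "__main__":
    # Step k = 2: compare H(s2 | s1) with H(s2 | s1, a).
    print("H(s2 | s1)    =", conditional_entropy(CATALOG, 2, (1,)))
    print("H(s2 | s1, a) =", conditional_entropy(CATALOG, 2, (0, 1)))
    # Step k = 1: compare H(s1) with H(s1 | a).
    print("H(s1)         =", conditional_entropy(CATALOG, 1, ()))
    print("H(s1 | a)     =", conditional_entropy(CATALOG, 1, (0,)))
```

On this toy catalog, conditioning on the attribute halves the entropy of the second SID token from 2 bits to 1 bit. It also fully determines the first SID token, whose entropy drops from 1 bit to 0. Fewer plausible continuations per step is the mechanism by which attribute prefixes narrow the beam.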
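The abstract does not give the exact form of the exposure-weighted capacity penalty. The Python sketch below illustrates one plausible form. At each residual-quantization level, a codeword's distance is inflated once its accumulated exposure exceeds an equal-share capacity. This steers additional items away from overloaded tokens instead of letting them collapse onto a few popular codewords. The function name `capacity_penalized_rq`, the hinge-shaped penalty, the strength `lam`, and the descending-exposure assignment order are all hypothetical choices, not the paper's method.

```python
import numpy as np


def capacity_penalized_rq(x, codebooks, exposure, lam=1.0):
    """Assign SIDs by residual quantization with a hypothetical exposure-weighted load penalty.

    x:         (n, d) item embeddings
    codebooks: list of (K, d) arrays, one per SID level
    exposure:  (n,) non-negative exposure weights per item
    lam:       penalty strength (hypothetical hyperparameter)
    """
    n = x.shape[0]
    residual = np.array(x, dtype=float)  # copy, updated level by level
    sids = np.zeros((n, len(codebooks)), dtype=int)
    total = float(exposure.sum())
    for level, cb in enumerate(codebooks):
        k = cb.shape[0]
        capacity = max(total / k, 1e-12)  # equal exposure share per codeword
        load = np.zeros(k)
        # Assumed order: account for high-exposure items first.
        for i in np.argsort(-exposure):
            dist = ((residual[i] - cb) ** 2).sum(axis=1)
            # Penalize only the exposure that would exceed a codeword's capacity.
            over = np.maximum(load + exposure[i] - capacity, 0.0)
            j = int(np.argmin(dist + lam * over / capacity))
            sids[i, level] = j
            load[j] += exposure[i]
            residual[i] -= cb[j]  # pass the residual to the next level
    return sids


if __name__ == "__main__":
    x = np.zeros((4, 2))                          # four identical items
    codebooks = [np.array([[0.0, 0.0], [1.0, 0.0]])]
    exposure = np.ones(4)
    for lam in (0.0, 10.0):
        codes = capacity_penalized_rq(x, codebooks, exposure, lam=lam)[:, 0]
        print(lam, np.bincount(codes, minlength=2))
```

With `lam=0.0`, all four identical items collapse onto codeword 0. With a large penalty, they split 2/2 across the two codewords, which is the anti-collapse behavior the abstract attributes to Capacity-constrained SID.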
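The next sketch shows one way Conditional Decoding Context could be assembled. The abstract specifies only that CDC combines a Task-Conditioned BOS with hash-based Content Summaries and injects scenario signals at each decoding step. Several details here are our assumptions and are not stated in the abstract: the embedding tables, the use of `zlib.crc32` for bucketing, mean pooling of bucket embeddings, and additive injection into every step's input. All function names are hypothetical.

```python
import zlib

import numpy as np


def content_summary_ids(tokens, num_buckets):
    """Map content tokens to hash buckets.

    zlib.crc32 is stable across processes, unlike Python's built-in hash()
    for str, which is randomized per process.
    """
    return [zlib.crc32(t.encode("utf-8")) % num_buckets for t in tokens]


def decoding_context(task_id, tokens, bos_table, bucket_table):
    """Task-conditioned BOS embedding plus the mean of hashed content-bucket embeddings."""
    ctx = bos_table[task_id].astype(float)  # copy of the BOS row for this task
    if tokens:
        ids = content_summary_ids(tokens, bucket_table.shape[0])
        ctx += bucket_table[ids].mean(axis=0)
    return ctx


def inject_context(step_inputs, ctx):
    """Add the same context vector to the input embedding of every decoding step."""
    return step_inputs + ctx[None, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    bos_table = rng.normal(size=(3, d))      # hypothetical: one BOS row per scenario
    bucket_table = rng.normal(size=(64, d))  # hypothetical: 64 hash buckets
    ctx = decoding_context(1, ["query:running shoes", "page:search"], bos_table, bucket_table)
    steps = rng.normal(size=(6, d))          # e.g. 3 attribute tokens + 3 SID tokens
    print(inject_context(steps, ctx).shape)
```

Hashing content into a fixed number of buckets keeps the summary vocabulary bounded regardless of how many distinct queries or pages appear. The trade-off is occasional bucket collisions.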