Generative Recommendation (GR) has gained traction for its strong performance and cold-start capability. Semantic Identifiers (SIDs), which represent item semantics as discrete tokens, are a core component of GR. However, current vector-quantization-based techniques for SID generation face two main challenges: (i) training instability, stemming from insufficient gradient propagation through the straight-through estimator and from sensitivity to initialization; and (ii) inefficient SID quality assessment, since industrial practice still depends on costly GR training and A/B testing. To address these challenges, we propose the Reference Vector-Guided Rating Residual Quantization VAE (R3-VAE). The framework incorporates three key innovations: (i) a reference vector that serves as a semantic anchor for the initial features, mitigating sensitivity to initialization; (ii) a dot-product-based rating mechanism designed to stabilize training and prevent codebook collapse; and (iii) two SID evaluation metrics, Semantic Cohesion and Preference Discrimination, which serve as regularization terms during training. Empirical results on six benchmarks show that R3-VAE outperforms state-of-the-art methods, with average improvements of 14.2% in Recall@10 and 15.5% in NDCG@10 across three Amazon datasets. Furthermore, we perform GR training and online A/B tests on a prominent news recommendation platform, where our method achieves a 1.62% improvement in MRR and a 0.83% gain in StayTime/U over baselines. Additionally, we use R3-VAE to replace the item IDs in a CTR model, which significantly improves content cold-start performance by 15.36% and corroborates the method's applicability and business value in industry-scale recommendation scenarios.
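To make the notion of an SID concrete, the following is a minimal sketch of plain residual quantization, the generic technique that residual-quantization VAEs build on. It is not the R3-VAE method: it omits the reference vector, the dot-product-based rating mechanism, and all training, and simply picks the nearest codeword by squared Euclidean distance at each level. The function names, the hand-picked codebooks, and the example embedding are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def residual_quantize(x, codebooks):
    """Quantize x level by level; return (sid, reconstruction).

    Assumption: nearest-codeword selection by Euclidean distance.
    This is generic residual quantization, not the paper's rating mechanism.
    """
    residual = list(x)
    reconstruction = [0.0] * len(x)
    sid = []
    for codebook in codebooks:
        # Pick the codeword closest to what is still unexplained.
        idx = min(
            range(len(codebook)),
            key=lambda k: squared_distance(codebook[k], residual),
        )
        sid.append(idx)
        chosen = codebook[idx]
        reconstruction = [r + c for r, c in zip(reconstruction, chosen)]
        residual = [r - c for r, c in zip(residual, chosen)]
    return tuple(sid), reconstruction


# Hypothetical two-level codebooks: coarse directions, then fine offsets.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],
    [[0.25, 0.25], [0.25, -0.25], [-0.25, 0.25], [-0.25, -0.25]],
]
item_embedding = [0.8, 0.3]  # hypothetical item feature vector

sid, reconstruction = residual_quantize(item_embedding, codebooks)
print(sid, reconstruction)  # (0, 2) [0.75, 0.25]
```

Each position in the resulting tuple is one discrete token, so an item is identified by a coarse-to-fine sequence of code indices. A generative recommender can then be trained to predict such token sequences instead of raw item IDs.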