R3-VAE: Reference Vector-Guided Rating Residual Quantization VAE for Generative Recommendation

Generative Recommendation (GR) has gained traction owing to its superior performance and cold-start capability. Semantic Identifiers (SIDs), which play a vital role in GR, represent item semantics through discrete tokens. However, current vector quantization-based techniques for SID generation face two main challenges: (i) training instability, stemming from insufficient gradient propagation through the straight-through estimator and from sensitivity to initialization; and (ii) inefficient SID quality assessment, as industrial practice still relies on costly GR training and A/B testing. To address these challenges, we propose the Reference Vector-Guided Rating Residual Quantization VAE (R3-VAE). The framework incorporates three key innovations: (i) a reference vector that serves as a semantic anchor for the initial features, thereby mitigating sensitivity to initialization; (ii) a dot product-based rating mechanism designed to stabilize training and prevent codebook collapse; and (iii) two SID evaluation metrics, Semantic Cohesion and Preference Discrimination, which also serve as regularization terms during training. Empirical results on six benchmarks demonstrate that R3-VAE outperforms state-of-the-art methods, achieving average improvements of 14.2% in Recall@10 and 15.5% in NDCG@10 across three Amazon datasets. Furthermore, we perform GR training and online A/B tests on Toutiao, where our method achieves a 1.62% improvement in MRR and a 0.83% gain in StayTime/U over the baselines. Additionally, we use R3-VAE to replace the item ID in the CTR model, which significantly improves content cold-start performance by 15.36%, corroborating the strong applicability and business value of R3-VAE in industry-scale recommendation scenarios.
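For readers less familiar with how residual quantization turns an item embedding into an SID, the following is a minimal sketch of the forward pass of a plain RQ-VAE-style quantizer. It assumes fixed, already-trained codebooks and nearest-codeword assignment by squared Euclidean distance. It does not implement R3-VAE's reference vector, its dot product-based rating mechanism, or the straight-through estimator used during training; the function names `residual_quantize` and `squared_distance` and the toy codebooks are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    embedding: Sequence[float],
    codebooks: Sequence[Sequence[Sequence[float]]],
) -> Tuple[List[int], Vector]:
    """Map one item embedding to a semantic ID (one code index per level).

    Hypothetical helper illustrating plain residual quantization: at each
    level the current residual is matched to its nearest codeword, that
    codeword is subtracted, and the remainder goes to the next level.
    Returns the code indices (the SID) and the summed reconstruction.
    """
    residual = list(embedding)
    reconstruction = [0.0] * len(embedding)
    codes: List[int] = []
    for codebook in codebooks:
        # Nearest codeword under squared Euclidean distance (assumed assignment rule).
        index = min(
            range(len(codebook)),
            key=lambda k: squared_distance(residual, codebook[k]),
        )
        codeword = codebook[index]
        codes.append(index)
        reconstruction = [r + c for r, c in zip(reconstruction, codeword)]
        residual = [r - c for r, c in zip(residual, codeword)]
    return codes, reconstruction


if __name__ == "__main__":
    # Toy two-level codebooks for 2-D embeddings (illustrative values only).
    toy_codebooks = [
        [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],    # level 1: coarse codewords
        [[0.1, 0.1], [-0.1, 0.0], [0.0, -0.1]],   # level 2: fine corrections
    ]
    sid, recon = residual_quantize([0.9, 0.2], toy_codebooks)
    print("SID:", tuple(sid))
    print("reconstruction:", recon)
```

Each level quantizes only what earlier levels left unexplained, so the first code captures coarse semantics and later codes refine it; the tuple of indices is the item's SID. If most items fall onto a few codewords, the resulting SIDs collide heavily, which is the codebook-collapse failure mode the abstract refers to.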
