Ads click-through rate (CTR) prediction is constrained by sparse user supervision: most users engage with ads infrequently while generating dense behavioral evidence in organic surfaces such as feed. Transferring these cross-domain signals into ads ranking is difficult due to domain mismatch, serving cost, and production complexity. We introduce cross-domain user Semantic IDs (SIDs) derived from organic feed activity and show that behavioral activity richness governs cross-domain transfer quality: SIDs from user profile text yield +0.036% AUC, SIDs from an activity-tuned LLaMA-based user embedding model yield +0.107%, and SIDs from direct feed activity behavioral embeddings yield +0.213%. We further propose RQ-FSQ, a residual finite scalar quantization method that discretizes pre-trained embeddings while matching dense-embedding AUC at substantially smaller storage. Across two heterogeneous sources, RQ-FSQ matches or slightly exceeds dense source embeddings, achieving +0.351% AUC for Feed Activity at about 30x smaller storage and +0.265% AUC for Activity-Tuned LLaMA at about 280x smaller storage. We also introduce a Hierarchical Discrete Embedding module that encodes multi-level SIDs through prefix n-gram sparse embedding tables trained end-to-end under the CTR objective. In a large-scale industrial ads ranking system, cold-start segment analysis shows gains up to +1.522% for users with near-zero ad interaction history, validating cross-domain behavioral transfer as an effective bridge for sparse-history ranking.
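To make the two mechanisms named above concrete, the following is a minimal sketch of residual finite scalar quantization and prefix n-gram row lookup. It is illustrative only: the abstract does not specify the quantizer's grid, the number of levels, the residual rescaling rule, or the hashing scheme, so the 5-level uniform grid, the per-stage scale shrink by `levels - 1`, the mixed-radix code packing, and the names `fsq_quantize`, `rq_fsq_encode`, and `prefix_ngram_rows` are all assumptions, not the authors' implementation. In the described system the embedding tables would be trained end-to-end under the CTR objective; here they are random placeholders.

```python
import numpy as np

def fsq_quantize(x, levels, scale):
    """Round each coordinate to one of `levels` uniform grid points in [-scale, scale]."""
    half = (levels - 1) / 2.0
    idx = np.clip(np.round(x / scale * half), -half, half)  # grid position in [-half, half]
    digits = (idx + half).astype(np.int64)                   # shifted to 0 .. levels-1
    return digits, idx / half * scale                        # digits and dequantized values

def rq_fsq_encode(x, levels=5, num_stages=3):
    """Assumed residual scheme: each stage quantizes what the previous stages missed."""
    residual = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    radix = levels ** np.arange(residual.size)  # mixed-radix weights to pack digits
    scale, codes, recon = 1.0, [], np.zeros_like(residual)
    for _ in range(num_stages):
        digits, deq = fsq_quantize(residual, levels, scale)
        codes.append(int(digits @ radix))       # one integer code per stage
        recon += deq
        residual = residual - deq
        scale /= (levels - 1)                   # residual magnitude is at most scale / (levels - 1)
    return codes, recon

def prefix_ngram_rows(codes, codebook_size, table_size):
    """Map prefixes (c1), (c1, c2), (c1, c2, c3), ... to rows of per-level tables."""
    rows, key = [], 0
    for c in codes:
        key = key * (codebook_size + 1) + (c + 1)  # extend the prefix key by one code
        rows.append(key % table_size)              # hypothetical modulo hashing
    return rows

# Hypothetical 4-dimensional, pre-normalized user embedding.
x = np.array([0.3, -0.72, 0.05, 0.99])
codes, recon = rq_fsq_encode(x)                    # 3-level Semantic ID
rows = prefix_ngram_rows(codes, codebook_size=5 ** 4, table_size=10007)

# One table per prefix length; random here, learned in a real ranking model.
rng = np.random.default_rng(0)
tables = [rng.normal(size=(10007, 8)) for _ in range(len(codes))]
user_vec = sum(table[row] for table, row in zip(tables, rows))
```

Under these assumptions, each stage refines the reconstruction by a factor of `levels - 1` per coordinate, and two users whose SIDs share a leading code share the first-level embedding row, which is what lets coarse prefixes generalize across users while longer prefixes stay specific.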