Beyond Static Collision Handling: Adaptive Semantic ID Learning for Multimodal Recommendation at Industrial Scale

Modern recommendation systems involve massive catalogs of multimodal items, where scalable item identification must balance compactness, semantic fidelity, and downstream effectiveness. Semantic IDs (SIDs) address this need by representing items as short discrete token sequences derived from multimodal signals, providing a compact interface for retrieval, ranking, and generative recommendation. However, effective SID learning is hindered by collisions, where different items are assigned identical or highly confusable codes. Existing methods mainly rely on improved quantization or fixed overlap regularization, but they do not adaptively distinguish whether an overlap should be suppressed or preserved. We propose AdaSID, an adaptive semantic ID learning framework for recommendation. AdaSID regulates SID overlaps through a two-stage process. First, it relaxes repulsion for observed overlaps when the involved items are semantically compatible, preserving admissible sharing rather than uniformly separating all collisions. Second, it allocates the remaining regulation pressure according to local collision load and training progress, strengthening control in congested regions while gradually rebalancing optimization toward recommendation alignment. This design adaptively decides which overlaps to penalize, how strongly to regulate them, and when to shift the learning focus. Extensive offline and online experiments validate AdaSID. On two public benchmarks, AdaSID improves Recall and NDCG by about 4.5% on average over strong baselines, while improving codebook utilization and SID diversity. In Kuaishou e-commerce, an online A/B test on short-video retrieval covering tens of millions of users achieves statistically significant gains, including a 0.98% GMV improvement, and industrial ranking evaluation shows consistent AUC improvements.
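To make the notions of collision and codebook utilization concrete, the following minimal sketch computes both for a toy set of SIDs. It is an illustration under stated assumptions, not AdaSID's evaluation code: the function names `collision_rate` and `codebook_utilization`, the tuple-of-integers SID encoding, and the exact metric definitions (share of items whose full SID is also held by another item; per-level share of codebook entries in use) are hypothetical choices made here for clarity.

```python
from collections import Counter


def collision_rate(sids):
    """Fraction of items whose full SID is shared with at least one other item.

    Assumed definition for illustration; the paper may measure collisions differently.
    """
    if not sids:
        return 0.0
    counts = Counter(sids)
    # Count every item that sits in a bucket holding more than one item.
    collided = sum(c for c in counts.values() if c > 1)
    return collided / len(sids)


def codebook_utilization(sids, codebook_size):
    """Per-level fraction of codebook entries used by at least one item.

    Assumes every SID has the same length and every level has `codebook_size` entries.
    """
    if not sids:
        return []
    levels = len(sids[0])
    return [
        len({sid[level] for sid in sids}) / codebook_size
        for level in range(levels)
    ]


# Toy catalog: four items, three-token SIDs, four codes per level (all hypothetical).
sids = [(0, 1, 2), (0, 1, 2), (0, 3, 1), (2, 1, 0)]
rate = collision_rate(sids)
util = codebook_utilization(sids, codebook_size=4)
print(rate)  # 0.5: the first two items share the SID (0, 1, 2)
print(util)  # [0.5, 0.5, 0.75]
```

In this toy catalog the first two items collide exactly, while the third item only shares a first-token prefix with them. A fixed penalty would treat every such overlap alike, whereas the abstract's point is that some of this sharing may be admissible when the items are semantically compatible.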
