Recent studies on scaling up ranking models have achieved substantial improvements for recommendation systems and search engines. However, most large-scale ranking systems rely on item IDs, where each item is treated as an independent categorical symbol and mapped to a learned embedding. As items rapidly appear and disappear, these embeddings become difficult to train and maintain. This instability impedes effective learning of neural network parameters and limits the scalability of ranking models. In this paper, we show that semantic tokens possess greater scaling potential than item IDs. Our proposed framework, TRM, improves the token generation and application pipeline, yielding a 33% reduction in sparse storage while achieving a 0.85% AUC increase. Extensive experiments further show that TRM consistently outperforms state-of-the-art models as model capacity scales. Finally, TRM has been successfully deployed on large-scale personalized search engines, yielding 0.26% and 0.75% improvements on user active days and query change ratio respectively in A/B tests.
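To make the contrast concrete, here is a minimal sketch (not from the paper; all names and sizes are illustrative assumptions) of why semantic tokens scale better than item IDs: an item-ID table needs a fresh, untrained embedding row for every new item, while a semantic-token scheme represents each item as a short sequence drawn from a small, fixed token vocabulary whose rows are shared and continually trained.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding dimension (illustrative)

# Item-ID approach: one embedding row per item; the table grows with the
# catalog, and every new item starts from a cold, random row.
item_id_table = {}  # item_id -> embedding vector

def item_id_embedding(item_id):
    if item_id not in item_id_table:  # cold start for unseen items
        item_id_table[item_id] = rng.normal(size=DIM)
    return item_id_table[item_id]

# Semantic-token approach: a fixed token vocabulary (e.g. produced by a
# quantized content encoder, hypothetical here); each item is a short
# token sequence whose embeddings are pooled.
NUM_TOKENS = 1024
token_table = rng.normal(size=(NUM_TOKENS, DIM))

def semantic_embedding(tokens):
    # Mean-pool the shared token rows; a brand-new item reuses trained
    # rows instead of getting a cold random vector.
    return token_table[list(tokens)].mean(axis=0)

emb = semantic_embedding([3, 17, 512])
assert emb.shape == (DIM,)
```

Because the token vocabulary is fixed and far smaller than the item catalog, sparse storage stays bounded as items churn, which is the mechanism behind the storage reduction the abstract reports.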