Generative recommendation systems typically leverage Semantic Identifiers (SIDs), which represent each item as a sequence of tokens encoding semantic information. However, representing an item ID with multiple SID tokens significantly increases input sequence length, a major determinant of computational complexity and memory consumption. While existing efforts primarily focus on optimizing attention computation and the KV cache, we propose RASTP (Representation-Aware Semantic Token Pruning), which directly prunes less informative tokens from the input sequence. Specifically, RASTP evaluates token importance by combining semantic saliency, measured via representation magnitude, with attention centrality, derived from cumulative attention weights, and dynamically prunes low-information or irrelevant semantic tokens. Experiments on three real-world Amazon datasets show that RASTP reduces training time by 26.7\% while maintaining or slightly improving recommendation performance. The code has been open-sourced at https://github.com/Yuzt-zju/RASTP.
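The scoring scheme described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' implementation: the blending weight `alpha`, the min-max normalization, and the `keep_ratio` pruning policy are all assumptions made for illustration.

```python
import numpy as np

def rastp_scores(hidden, attn, alpha=0.5):
    """Score each token by semantic saliency (L2 norm of its hidden
    representation) blended with attention centrality (cumulative
    attention the token receives), both min-max normalized.
    `alpha` is a hypothetical blending weight, not from the paper."""
    saliency = np.linalg.norm(hidden, axis=-1)  # (seq_len,)
    centrality = attn.sum(axis=0)               # (seq_len,) column sums
    def norm01(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return alpha * norm01(saliency) + (1 - alpha) * norm01(centrality)

def prune(hidden, attn, keep_ratio=0.75, alpha=0.5):
    """Keep the top `keep_ratio` fraction of tokens, preserving their
    original order so the sequence structure stays intact."""
    scores = rastp_scores(hidden, attn, alpha)
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.sort(np.argsort(-scores)[:k])
    return hidden[keep], keep
```

In this sketch, pruning shortens the sequence fed to subsequent layers, which is where the training-time savings would come from.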