Semantic Trimming and Auxiliary Multi-step Prediction for Generative Recommendation

Generative Recommendation (GR) has recently transitioned from atomic item indexing to Semantic ID (SID)-based frameworks to capture intrinsic item relationships and enhance generalization. However, adopting high-granularity SIDs raises two critical challenges: prohibitive training overhead due to sequence expansion, and unreliable performance, with accuracy fluctuating non-monotonically. We trace both issues to the Semantic Dilution Effect, in which redundant tokens waste substantial computation and dilute the already sparse learning signals in recommendation. To counteract this, we propose STAMP (Semantic Trimming and Auxiliary Multi-step Prediction), a framework built on a dual-end optimization strategy. We argue that effective SID learning requires simultaneously addressing low input information density and sparse output supervision. On the input side, Semantic Adaptive Pruning (SAP) dynamically filters redundancy during the forward pass, converting noise-laden sequences into compact, information-rich representations. On the output side, Multi-step Auxiliary Prediction (MAP) employs a multi-token objective to densify feedback, strengthening long-range dependency capture and keeping learning signals robust despite compressed inputs. By unifying input purification and signal amplification, STAMP improves both training efficiency and representation capability. Experiments on public Amazon datasets and large-scale industrial datasets show that STAMP achieves a 1.23--1.38$\times$ speedup and a 17.2\%--54.7\% VRAM reduction while maintaining or improving performance across multiple architectures.
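To make the dual-end idea concrete, the following is a minimal, illustrative sketch and not the STAMP implementation. The abstract does not say how SAP scores tokens or how MAP weights its prediction steps, so the importance scores, the `keep_ratio` parameter, the per-step weights, and all function names here are hypothetical. For simplicity the pruning acts on raw token IDs, whereas the abstract describes filtering during the forward pass.

```python
import math

def prune_tokens(tokens, scores, keep_ratio):
    """Input side: keep the highest-scoring tokens, preserving original order.

    `scores` stands in for whatever importance signal a real model would
    compute; the scoring rule is an assumption of this sketch.
    """
    n_keep = max(1, math.ceil(len(tokens) * keep_ratio))
    # Indices of the n_keep largest scores; ties go to the earlier position.
    top = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))[:n_keep]
    return [tokens[i] for i in sorted(top)]

def multi_step_targets(tokens, horizon):
    """Output side: for each position t, the next `horizon` tokens that exist."""
    return [tokens[t + 1 : t + 1 + horizon] for t in range(len(tokens) - 1)]

def multi_step_loss(step_probs, weights):
    """Weighted sum of negative log-likelihoods, one term per future step.

    step_probs[k] is the probability assigned to the true token k+1 steps
    ahead; weights[k] is a hypothetical weight for that step.
    """
    return sum(w * -math.log(p) for w, p in zip(weights, step_probs))

# Two items with four SID tokens each (made-up values and scores).
sid_tokens = [12, 7, 3, 41, 7, 9, 30, 2]
scores = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.6, 0.05]

compact = prune_tokens(sid_tokens, scores, keep_ratio=0.5)
targets = multi_step_targets(compact, horizon=2)
loss = multi_step_loss([0.5, 0.25], weights=[1.0, 0.5])
```

In this toy run the eight-token sequence shrinks to four tokens, while each remaining position is supervised by up to two future tokens instead of one. This mirrors the abstract's pairing of a shorter input with denser feedback, under the assumptions stated above.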
