Generative recommendation (GenRec) models typically capture user behavior via full attention, but scaling to lifelong sequences is hindered by prohibitive computational costs and noise accumulation from stochastic interactions. To address these challenges, we introduce Rec2PM, a framework that compresses long user interaction histories into compact Preference Memory tokens. Unlike traditional recurrent methods that suffer from serial training, Rec2PM employs a novel self-referential teacher-forcing strategy: it leverages a global view of the history to generate reference memories, which serve as supervision targets for parallelized recurrent updates. This allows for fully parallel training while maintaining the capability for iterative updates during inference. Additionally, by representing memory as token embeddings rather than extensive KV caches, Rec2PM achieves extreme storage efficiency. Experiments on large-scale benchmarks show that Rec2PM significantly reduces inference latency and memory footprint while achieving superior accuracy compared to full-sequence models. Analysis reveals that the Preference Memory functions as a denoising Information Bottleneck, effectively filtering interaction noise to capture robust long-term interests.
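The self-referential teacher-forcing idea above can be illustrated with a minimal sketch. This is not the paper's implementation: the mean-pooling teacher, the `tanh` update rule, and all dimensions are hypothetical stand-ins. The point is the training pattern: a global encoder produces a reference memory for every prefix of the history, and the recurrent update at step t is fed the *reference* memory from step t-1 (instead of its own previous output), so all T update steps can be computed in one parallel pass and supervised against the step-t references.

```python
import numpy as np

rng = np.random.default_rng(0)
D, T, M = 8, 6, 2  # embedding dim, history length, number of memory tokens

history = rng.normal(size=(T, D))  # item embeddings of the interaction history


def teacher_memories(history, M):
    """Global view: compress each prefix [0..t] into M reference memory tokens.
    Stand-in teacher: mean-pool the prefix and tile it to M tokens."""
    refs = []
    for t in range(1, len(history) + 1):
        pooled = history[:t].mean(axis=0)
        refs.append(np.tile(pooled, (M, 1)))
    return np.stack(refs)  # shape (T, M, D)


# Self-referential teacher forcing: every step sees the teacher's reference
# memory from the previous step, so no step waits on another step's output.
refs = teacher_memories(history, M)                 # supervision targets
W = rng.normal(size=(D, D)) * 0.1                   # toy recurrent weights
init = np.zeros((M, D))                             # empty initial memory
prev = np.concatenate([init[None], refs[:-1]])      # (T, M, D) forced inputs
preds = np.tanh(prev @ W + history[:, None, :])     # all T updates in parallel
loss = np.mean((preds - refs) ** 2)                 # regress onto references
```

At inference time the same update rule runs serially on its own output, which is what preserves the iterative-update capability the abstract mentions.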
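The storage-efficiency claim is simple arithmetic: a transformer KV cache stores keys and values for every layer at every history position, while the Preference Memory keeps only M token embeddings. The sizes below are hypothetical, chosen only to make the comparison concrete.

```python
# Hypothetical model/history sizes; not taken from the paper.
layers = 24        # transformer layers
d_kv = 2048        # per-position key/value width per layer
T = 10_000         # lifelong history length
M, D = 16, 2048    # Preference Memory: M token embeddings of width D

kv_floats = layers * 2 * T * d_kv   # keys + values, every layer, every position
mem_floats = M * D                  # compact memory tokens only
ratio = kv_floats / mem_floats      # storage reduction factor
```

Under these assumed sizes the KV cache holds tens of thousands of times more floats than the memory tokens, and the gap grows linearly with history length T while the memory footprint stays constant.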