Modern recommender systems leverage ultra-long user behavior sequences to capture dynamic preferences, but end-to-end modeling is infeasible in production due to latency and memory constraints. While summarizing history via interest centers offers a practical alternative, existing methods struggle to (1) identify user-specific centers at an appropriate granularity and (2) accurately assign behaviors, leading to quantization errors and loss of long-tail preferences. To alleviate these issues, we propose Hierarchical Sparse Activation Compression (HiSAC), an efficient framework for personalized sequence modeling. HiSAC encodes interactions into multi-level semantic IDs and constructs a global hierarchical codebook. A hierarchical voting mechanism sparsely activates personalized interest agents as fine-grained preference centers. Guided by these agents, Soft-Routing Attention aggregates historical signals in semantic space, weighting by similarity to minimize quantization error and retain long-tail behaviors. Deployed on Taobao's "Guess What You Like" homepage, HiSAC achieves significant compression and cost reduction, with online A/B tests showing a consistent 1.65% CTR uplift, demonstrating its scalability and real-world effectiveness.
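The abstract does not specify the exact form of Soft-Routing Attention. As a rough illustration only, the sketch below shows one plausible reading (assumptions: dot-product similarity, a softmax over agents, and the function name `soft_routing_attention`, none of which appear in the original): each historical behavior is softly distributed over the activated interest agents by similarity, so no behavior is discarded by a hard nearest-center assignment and long-tail signals retain nonzero weight.

```python
import numpy as np

def soft_routing_attention(behaviors, agents, temperature=1.0):
    """Hypothetical sketch of soft routing: aggregate behavior
    embeddings into per-agent preference summaries.

    behaviors: (N, d) array of historical behavior embeddings.
    agents:    (K, d) array of activated interest-agent embeddings.
    Returns per-agent summaries (K, d) and routing weights (N, K).
    """
    # Similarity logits between each behavior and each agent
    # (dot-product similarity is an assumption, not from the paper).
    logits = behaviors @ agents.T / temperature           # (N, K)
    # Soft assignment: each behavior spreads unit mass over all agents,
    # instead of being quantized to its single nearest center.
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)         # (N, K)
    # Each agent pools the behaviors routed to it, normalized by the
    # total routing mass it received.
    mass = weights.sum(axis=0, keepdims=True).T           # (K, 1)
    summaries = (weights.T @ behaviors) / np.maximum(mass, 1e-9)
    return summaries, weights
```

Because every behavior contributes to every agent in proportion to similarity, the aggregation error shrinks relative to hard assignment, which is consistent with the abstract's claim of minimizing quantization error while retaining long-tail behaviors.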