Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision due to their limited state capacity, while softmax attention suffers from prohibitive computational overhead. To address this challenge, we propose HyTRec, a model featuring a hybrid attention architecture that explicitly decouples long-term stable preferences from short-term intent spikes. By assigning the massive historical sequence to a linear attention branch and reserving a specialized softmax attention branch for recent interactions, our approach restores precise retrieval within industrial-scale contexts of over ten thousand interactions. To mitigate the lag of the linear layers in capturing rapid interest drift, we further design a Temporal-Aware Delta Network (TADN) that dynamically upweights fresh behavioral signals while suppressing historical noise. Empirical results on industrial-scale datasets confirm that our model maintains linear inference speed while outperforming strong baselines, notably delivering an improvement of over 8% in Hit Rate for users with ultra-long sequences.
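To make the hybrid split concrete, the following is a minimal sketch of the idea: a linear-attention summary over the long historical prefix combined with exact softmax attention over a recent window. All names (`hybrid_attention`, the `recent_window` size, the feature map, and the additive fusion of the two branches) are illustrative assumptions, not HyTRec's actual implementation, and the TADN reweighting is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_attention(q, K, V, recent_window=64):
    """Hypothetical sketch: linear attention over the long prefix,
    softmax attention over the most recent interactions.
    q: (d,) query; K, V: (L, d) behavior sequence."""
    split = max(len(K) - recent_window, 0)
    K_long, V_long = K[:split], V[:split]
    K_recent, V_recent = K[split:], V[split:]

    # Linear-attention branch: O(L) fixed-size summary state of the
    # long prefix, using the common positive feature map phi = elu(x) + 1.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    state = phi(K_long).T @ V_long             # (d, d) summary state
    norm = phi(K_long).sum(axis=0)             # (d,) normalizer
    out_long = (phi(q) @ state) / (phi(q) @ norm + 1e-6)

    # Softmax branch: exact attention restricted to the recent window,
    # preserving precise retrieval over short-term intent.
    scores = softmax(q @ K_recent.T / np.sqrt(q.shape[-1]))
    out_recent = scores @ V_recent

    # Naive additive fusion; the paper's actual fusion is not specified here.
    return out_long + out_recent
```

The softmax branch costs only O(window * d) per query regardless of the full sequence length, which is what keeps the overall inference linear in the history size.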