The attention mechanism remains the defining operator in Transformers because it provides expressive global credit assignment, yet its $O(N^2 d)$ time and memory cost in sequence length $N$ makes long-context modeling expensive and often forces truncation or other heuristics. Linear attention reduces the complexity to $O(N d^2)$ by reordering the computation through kernel feature maps, but this reformulation drops the softmax mechanism and shifts the attention score distribution. In recommender systems, low-rank matrix structure is not a rare case but the default inductive bias of representation learning, and it is particularly explicit in user behavior sequence modeling. Leveraging this structure, we introduce SVD-Attention, which is theoretically lossless on low-rank matrices and preserves softmax while reducing attention complexity from $O(N^2 d)$ to $O(Ndr)$ for rank $r$. With SVD-Attention, we propose SOLAR (SVD-Optimized Lifelong Attention for Recommendation), a sequence modeling framework that supports behavior sequences on the order of ten thousand items and candidate sets of several thousand items in the cascading process without any filtering. In Kuaishou's online recommendation scenario, SOLAR delivers a 0.68\% gain in Video Views together with improvements in additional business metrics.
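The reordering that the abstract attributes to linear attention can be illustrated with a minimal sketch. It covers only the kernel feature map trick, not SVD-Attention, whose construction the abstract does not describe. The feature map `phi` (elu(x) + 1), the helper names, and the sizes `N` and `d` are assumptions chosen for illustration, and the code uses pure Python lists to stay dependency-free.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def phi(m):
    """Illustrative positive feature map elu(x) + 1, applied elementwise (an assumption)."""
    return [[x + 1.0 if x > 0 else math.exp(x) for x in row] for row in m]


def quadratic_form(q, k, v):
    """Build the full N x N score matrix first: O(N^2 d) time, O(N^2) memory."""
    scores = matmul(phi(q), transpose(phi(k)))                  # N x N
    scores = [[s / sum(row) for s in row] for row in scores]    # row-normalize
    return matmul(scores, v)                                    # N x d


def linear_form(q, k, v):
    """Same result, reordered as phi(Q) (phi(K)^T V): O(N d^2) time."""
    fq, fk = phi(q), phi(k)
    kv = matmul(transpose(fk), v)             # d x d summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]    # length-d vector used for normalization
    out = matmul(fq, kv)                      # N x d, no N x N matrix is formed
    norms = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / z for x in row] for row, z in zip(out, norms)]


def random_matrix(rows, cols, rng):
    """Random matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
N, d = 16, 4  # illustrative sizes only
Q, K, V = (random_matrix(N, d, rng) for _ in range(3))
gap = max_abs_diff(quadratic_form(Q, K, V), linear_form(Q, K, V))
print(gap < 1e-9)  # the two orderings agree up to floating-point error
```

Both functions compute the same kernelized attention, so the saving comes purely from associativity of the matrix product. Neither one applies a softmax, which is the loss the abstract points to as the motivation for a method that keeps softmax intact.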