FuXi-Linear: Unleashing the Power of Linear Attention in Long-term Time-aware Sequential Recommendation

Modern recommendation systems rely primarily on attention mechanisms with quadratic complexity, which limits their ability to handle long user sequences and slows inference. Linear attention is a promising alternative, but existing research faces three critical challenges: (1) temporal signals are often overlooked, or are integrated through naive coupling that causes interference between temporal and semantic signals and neglects behavioral periodicity; (2) existing linear frameworks provide insufficient positional information; and (3) prior work focuses primarily on short sequences and shallow architectures. To address these issues, we propose FuXi-Linear, a linear-complexity model designed for efficient long-sequence recommendation. Our approach introduces two key components: (1) a Temporal Retention Channel that independently computes periodic attention weights from temporal data, preventing crosstalk between temporal and semantic signals; and (2) a Linear Positional Channel that integrates positional information through learnable kernels within linear complexity. Moreover, we demonstrate that FuXi-Linear exhibits a robust power-law scaling property at sequence lengths in the thousands, a characteristic largely unexplored in prior linear recommendation studies. Extensive experiments on sequences of several thousand tokens show that FuXi-Linear outperforms state-of-the-art models in recommendation quality while achieving up to a 10$\times$ speedup in the prefill stage and up to a 21$\times$ speedup in the decode stage compared with competitive baselines. Our code is available in a public repository: https://github.com/USTC-StarTeam/fuxi-linear.
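
To make the contrast between quadratic and linear complexity concrete, the following is a minimal sketch of generic causal linear attention. It is not the FuXi-Linear architecture and includes neither the Temporal Retention Channel nor the Linear Positional Channel. The elu(x) + 1 feature map and all function names are assumptions chosen for illustration. The sketch checks that a recurrent form with a fixed-size state gives the same outputs as the form that scores every earlier position explicitly.

```python
import math
import random


def feature_map(x):
    # elu(x) + 1, applied elementwise: keeps every feature strictly positive.
    # (An assumed, commonly used choice; not taken from the paper.)
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def causal_attention_quadratic(q, k, v):
    # Reference form: query t scores every key s <= t, so cost grows as O(n^2).
    fq = [feature_map(x) for x in q]
    fk = [feature_map(x) for x in k]
    out = []
    for t in range(len(q)):
        w = [dot(fq[t], fk[s]) for s in range(t + 1)]
        total = sum(w)
        out.append([sum(w[s] * v[s][j] for s in range(t + 1)) / total
                    for j in range(len(v[0]))])
    return out


def causal_attention_linear(q, k, v):
    # Recurrent form: a fixed-size (d_k x d_v) state is updated once per step,
    # so cost grows as O(n) in the sequence length.
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k_s) v_s^T
    norm = [0.0] * d_k                         # running sum of phi(k_s)
    out = []
    for qt, kt, vt in zip(q, k, v):
        fq, fk = feature_map(qt), feature_map(kt)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * vt[j]
        denom = dot(fq, norm)
        out.append([sum(fq[i] * state[i][j] for i in range(d_k)) / denom
                    for j in range(d_v)])
    return out


random.seed(0)
n, d = 16, 4  # toy sequence length and feature width
q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
           for _ in range(3))
a = causal_attention_quadratic(q, k, v)
b = causal_attention_linear(q, k, v)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
assert max_diff < 1e-9  # both forms agree up to floating-point error
```

In the recurrent form, each new item updates only `state` and `norm`, whose sizes do not depend on how many items came before. This is the general reason linear attention can make per-step decoding cheap on long sequences.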
