In recent years, the success of large language models (LLMs) has driven the exploration of scaling laws in recommender systems. However, models that demonstrate scaling laws are challenging to deploy in industrial settings for modeling long sequences of user behaviors, due to the high computational complexity of the standard self-attention mechanism. Although various sparse self-attention mechanisms have been proposed in other fields, they are not fully suited to recommendation scenarios. This is because user behaviors exhibit personalized and temporal characteristics: different users have distinct behavior patterns, these patterns change over time, and the resulting data differ significantly in distribution from data in other fields. To address these challenges, we propose SparseCTR, an efficient and effective model specifically designed for users' long-term behaviors. Specifically, we first segment behavior sequences into chunks in a personalized manner, which avoids splitting continuous behaviors and enables parallel processing of sequences. Based on these chunks, we propose a three-branch sparse self-attention mechanism that jointly identifies users' global interests, interest transitions, and short-term interests. Furthermore, we design a composite relative temporal encoding with learnable, head-specific bias coefficients to better capture sequential and periodic relationships among user behaviors. Extensive experimental results show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods. More importantly, it exhibits a clear scaling-law phenomenon, maintaining performance improvements across three orders of magnitude in FLOPs. In online A/B testing, SparseCTR increased CTR by 1.72\% and CPM by 1.41\%. Our source code is available at https://github.com/laiweijiang/SparseCTR.
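The abstract describes the composite relative temporal encoding only at a high level. The sketch below illustrates one way such an encoding could be realized as an additive attention bias: a sequential (recency-decay) term plus a periodic term, each weighted by a head-specific coefficient that would be learned during training. The function name, the log-decay and cosine forms, and the daily period are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def composite_temporal_bias(timestamps, seq_coef, per_coef, period=86400.0):
    """Hypothetical composite relative temporal bias for one attention head.

    Combines a sequential term (decaying with the absolute time gap) and a
    periodic term (capturing, e.g., daily rhythms), each scaled by a
    head-specific coefficient. In a full model, seq_coef and per_coef would
    be learnable parameters, one pair per attention head.

    timestamps: shape (L,), behavior times in seconds.
    Returns: shape (L, L) bias added to the pre-softmax attention logits.
    """
    dt = timestamps[:, None] - timestamps[None, :]      # pairwise time gaps
    seq_term = -np.log1p(np.abs(dt))                    # larger gap -> lower score
    per_term = np.cos(2.0 * np.pi * dt / period)        # same phase -> higher score
    return seq_coef * seq_term + per_coef * per_term
```

Because the bias is additive and per-head, each head can emphasize a different mix of recency and periodicity, which is one plausible reading of "learnable, head-specific bias coefficients."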