Tail-Aware Data Augmentation for Long-Tail Sequential Recommendation

Sequential recommendation (SR) learns user preferences from historical interaction sequences and provides personalized suggestions. In real-world scenarios, most users interact with only a handful of items, while the majority of items are seldom consumed. This pervasive long-tail challenge limits the model's ability to learn user preferences. Previous efforts enrich tail items/users with knowledge from head parts or improve tail learning through additional contextual information, but they still face two issues: 1) They struggle when interactions of tail users/items are scarce, leading to incomplete preference learning for the tail. 2) They often degrade overall or head performance when improving accuracy for tail users/items, thereby harming the user experience. We propose Tail-Aware Data Augmentation (TADA) for long-tail sequential recommendation, which increases the interaction frequency of tail items/users while maintaining head performance, thereby improving the model's ability to learn the tail. Specifically, we first capture the co-occurrence and correlation among low-popularity items with a linear model. Building on this, we design two tail-aware augmentation operators, T-Substitute and T-Insert. The former replaces a head item with a relevant item, while the latter uses co-occurrence relationships to extend the original sequence by incorporating both head and tail items. The augmented and original sequences are mixed at the representation level to preserve preference knowledge. We further extend the mix operation across different tail-user sequences and augmented sequences to generate richer augmented samples, thereby improving tail performance. Comprehensive experiments demonstrate the superiority of our method. The code is available at https://github.com/KingGugu/TADA.
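
The augmentation pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the linked repository) and rests on several assumptions: raw windowed co-occurrence counts stand in for the linear model, T-Substitute is assumed to swap in a tail item, and every function name and parameter here (`n_head`, `window`, `prob`, `lam`) is hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_head_tail(sequences, n_head):
    """Label the n_head most popular items as head and the rest as tail."""
    popularity = Counter(item for seq in sequences for item in seq)
    ranked = [item for item, _ in popularity.most_common()]
    return set(ranked[:n_head]), set(ranked[n_head:])


def cooccurrence(sequences, window=2):
    """Count how often two items appear within `window` positions of each other.

    Assumption: a simple stand-in for the linear item-item model in the abstract.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 1 + window]:
                if a != b:
                    counts[a][b] += 1
                    counts[b][a] += 1
    return counts


def best_neighbor(item, counts, allowed):
    """Return the neighbor of `item` in `allowed` with the highest count, or None."""
    candidates = [(c, n) for n, c in counts.get(item, {}).items() if n in allowed]
    return max(candidates)[1] if candidates else None


def t_substitute(seq, counts, head, tail, prob, rng):
    """With probability `prob`, replace each head item by a co-occurring tail item."""
    out = []
    for item in seq:
        swap = None
        if item in head and rng.random() < prob:
            swap = best_neighbor(item, counts, tail)
        out.append(item if swap is None else swap)
    return out


def t_insert(seq, counts, allowed, prob, rng):
    """With probability `prob`, insert each item's strongest neighbor right after it."""
    out = []
    for item in seq:
        out.append(item)
        if rng.random() < prob:
            extra = best_neighbor(item, counts, allowed)
            if extra is not None:
                out.append(extra)
    return out


def mix(rep_a, rep_b, lam):
    """Mixup-style convex combination of two sequence representation vectors."""
    return [lam * a + (1 - lam) * b for a, b in zip(rep_a, rep_b)]


# Toy data: items 1 and 2 are popular (head); items 3, 4, 5 are rare (tail).
sequences = [[1, 2, 3], [1, 2, 3], [1, 4, 2], [2, 1, 5]]
head, tail = split_head_tail(sequences, n_head=2)
counts = cooccurrence(sequences, window=2)
rng = random.Random(0)

# prob=1.0 forces every eligible position to be augmented, so the demo is deterministic.
substituted = t_substitute([1, 2, 4], counts, head, tail, prob=1.0, rng=rng)
inserted = t_insert([1], counts, head | tail, prob=1.0, rng=rng)
mixed = mix([1.0, 0.0], [0.0, 1.0], lam=0.75)

print(substituted)  # [3, 3, 4]
print(inserted)     # [1, 2]
print(mixed)        # [0.75, 0.25]
```

In practice `prob` would be below 1 so that only some positions are altered, and `mix` would operate on encoder outputs for the original and augmented sequences rather than on hand-written vectors.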
