By generating new yet effective data, data augmentation has become a promising method for mitigating the data sparsity problem in sequential recommendation. Existing works focus on augmenting the original data but rarely examine the imbalance between relevance and diversity in the augmented data, which leads to semantic drift or limited performance gains. In this paper, we propose a novel Balanced data Augmentation Plugin for Sequential Recommendation (BASRec) that generates data balancing relevance and diversity. BASRec consists of two modules: Single-sequence Augmentation and Cross-sequence Augmentation. The former leverages the randomness of heuristic operators to generate diverse sequences for a single user, after which the diverse and original sequences are fused at the representation level to preserve relevance. We further devise a reweighting strategy that lets the model adaptively learn preferences according to these two properties. Cross-sequence Augmentation performs nonlinear mixing between different sequence representations from two directions. It produces virtual sequence representations that are sufficiently diverse yet retain the vital semantics of the original sequences. Together, the two modules help the model discover fine-grained preference knowledge from both single-user and cross-user perspectives. Extensive experiments verify the effectiveness of BASRec. The average improvement is up to 72.0% on GRU4Rec, 33.8% on SASRec, and 68.5% on FMLP-Rec. We demonstrate that BASRec generates data with a better balance between relevance and diversity than existing methods. The source code is available at https://github.com/KingGugu/BASRec.
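To make the two modules more concrete, the following is a minimal NumPy sketch of the general idea, not the BASRec implementation. Everything in it is an illustrative assumption: the abstract does not specify the heuristic operators, the encoder, or the mixing formula, so `random_reorder`, `encode`, `fuse_single`, and `mix_cross` are hypothetical names. The sketch uses a sub-span shuffle as the heuristic operator, recency-weighted pooling as a stand-in encoder, a scalar convex combination for representation-level fusion, and per-dimension Beta-sampled coefficients as one possible reading of nonlinear mixing. The reweighting strategy and the "two directions" of cross-sequence mixing are not modeled.

```python
import numpy as np


def random_reorder(seq, rng):
    """Assumed heuristic operator: shuffle one random contiguous sub-span."""
    seq = list(seq)
    if len(seq) < 2:
        return seq
    start = int(rng.integers(0, len(seq) - 1))        # start in [0, len-2]
    end = int(rng.integers(start + 2, len(seq) + 1))  # span has >= 2 items
    span = rng.permutation(seq[start:end]).tolist()
    return seq[:start] + span + seq[end:]


def encode(seq, table):
    """Stand-in encoder (assumption): recency-weighted mean of item embeddings."""
    items = table[np.asarray(seq)]
    weights = np.arange(1, len(seq) + 1, dtype=float)
    return (weights[:, None] * items).sum(axis=0) / weights.sum()


def fuse_single(h_orig, h_aug, lam):
    """Representation-level fusion of original and augmented sequences."""
    return lam * h_orig + (1.0 - lam) * h_aug


def mix_cross(h_a, h_b, rng, alpha=0.4):
    """Cross-sequence mixing with one coefficient per dimension (assumed form)."""
    lam = rng.beta(alpha, alpha, size=h_a.shape)
    return lam * h_a + (1.0 - lam) * h_b


rng = np.random.default_rng(0)
item_table = rng.normal(size=(20, 8))   # hypothetical item embedding table
user_a = [3, 7, 1, 12, 5]               # hypothetical interaction sequences
user_b = [9, 2, 14, 6]

h_a = encode(user_a, item_table)
h_a_aug = encode(random_reorder(user_a, rng), item_table)
h_single = fuse_single(h_a, h_a_aug, lam=0.7)               # single-sequence view
h_cross = mix_cross(h_a, encode(user_b, item_table), rng)   # cross-sequence view
```

In this sketch a larger `lam` in `fuse_single` keeps the fused representation closer to the original sequence (more relevance), while a smaller value lets the augmented view dominate (more diversity); how BASRec actually sets or learns this trade-off is described in the paper body, not here.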