By generating new yet effective data, data augmentation has become a promising method for mitigating the data sparsity problem in sequential recommendation. Existing works focus on augmenting the original data but rarely address the imbalance between the relevance and the diversity of the augmented data. This imbalance leads to semantic drift or limited performance gains. In this paper, we propose a novel Balanced data Augmentation Plugin for Sequential Recommendation (BASRec) that generates data balancing relevance and diversity. BASRec consists of two modules: Single-sequence Augmentation and Cross-sequence Augmentation. The former exploits the randomness of heuristic operators to generate diverse sequences for a single user. It then fuses the diverse and original sequences at the representation level so that the result retains relevance. Furthermore, we devise a reweighting strategy that lets the model adaptively learn preferences according to these two properties. Cross-sequence Augmentation performs nonlinear mixing between the representations of different sequences from two directions. It produces virtual sequence representations that are sufficiently diverse yet retain the key semantics of the original sequences. Together, the two modules help the model discover fine-grained preference knowledge from both single-user and cross-user perspectives. Extensive experiments verify the effectiveness of BASRec: the average improvement reaches up to 72.0% on GRU4Rec, 33.8% on SASRec, and 68.5% on FMLP-Rec. We further demonstrate that BASRec generates data with a better balance between relevance and diversity than existing methods. The source code is available at https://github.com/KingGugu/BASRec.
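The abstract does not specify the heuristic operators, the fusion function, or the reweighting rule. The following Python sketch is therefore only a rough, non-authoritative illustration of what a single-sequence augmentation step of this kind could look like. It rests on the following assumptions:

- The operators are assumed to be crop, mask, and reorder. These are common choices in earlier sequence augmentation work and are not confirmed for BASRec.
- Representation-level fusion is assumed to be a mixup-style convex combination with a weight λ drawn from a Beta distribution.
- λ is reused as a hypothetical per-sample loss weight.

The toy mean-pooling encoder and the names `MASK_ID`, `augment`, `fuse`, and `single_sequence_augment` are hypothetical. Cross-sequence Augmentation is not sketched, because the abstract does not describe its two-direction nonlinear mixing in enough detail.

```python
import random

MASK_ID = 0  # hypothetical id for a masked item (assumption)


def crop(seq, ratio, rng):
    # Keep a random contiguous window covering `ratio` of the sequence.
    n = max(1, int(len(seq) * ratio))
    start = rng.randint(0, len(seq) - n)
    return seq[start:start + n]


def mask(seq, ratio, rng):
    # Replace a random `ratio` share of positions with MASK_ID.
    k = int(len(seq) * ratio)
    idx = set(rng.sample(range(len(seq)), k))
    return [MASK_ID if i in idx else x for i, x in enumerate(seq)]


def reorder(seq, ratio, rng):
    # Shuffle a random contiguous window covering `ratio` of the sequence.
    n = max(1, int(len(seq) * ratio))
    start = rng.randint(0, len(seq) - n)
    sub = seq[start:start + n]
    rng.shuffle(sub)
    return seq[:start] + sub + seq[start + n:]


OPERATORS = [crop, mask, reorder]  # assumed operator set


def augment(seq, rng, ratio=0.6):
    # The randomness of operator choice and position supplies diversity.
    op = rng.choice(OPERATORS)
    return op(list(seq), ratio, rng)


def make_mean_encoder(dim, rng):
    # Hypothetical toy encoder: mean of lazily created random item embeddings.
    table = {}

    def encode(seq):
        vecs = []
        for item in seq:
            if item not in table:
                table[item] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            vecs.append(table[item])
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    return encode


def fuse(h_orig, h_aug, lam):
    # Assumed fusion: convex combination of the two representations.
    return [lam * a + (1 - lam) * b for a, b in zip(h_orig, h_aug)]


def single_sequence_augment(seq, encode, rng, alpha=0.4):
    # Expects a non-empty item sequence.
    aug = augment(seq, rng)
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1 - lam)  # keep the original view dominant (assumption)
    h = fuse(encode(seq), encode(aug), lam)
    return h, lam  # lam doubles as a hypothetical loss weight
```

In this sketch, λ ≥ 0.5 keeps the fused representation closer to the original sequence, which stands in for relevance. The random operator supplies diversity. Returning λ lets a training loop down-weight samples that drift further from the original. This is one plausible reading of "reweighting", not the paper's stated rule.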