By generating new yet effective data, data augmentation has become a promising way to mitigate data sparsity in sequential recommendation. Existing works focus on augmenting the original data but rarely address the imbalance between relevance and diversity in the augmented data, which leads to semantic drift or limited performance gains. In this paper, we propose a novel Balanced data Augmentation Plugin for Sequential Recommendation (BASRec) that generates data balancing relevance and diversity. BASRec consists of two modules: Single-sequence Augmentation and Cross-sequence Augmentation. The former leverages the randomness of heuristic operators to generate diverse sequences for a single user, then fuses the diverse and original sequences at the representation level to preserve relevance. We further devise a reweighting strategy that lets the model adaptively learn preferences based on these two properties. Cross-sequence Augmentation performs nonlinear mixing between different sequence representations from two directions, producing virtual sequence representations that are sufficiently diverse yet retain the vital semantics of the original sequences. Together, the two modules help the model discover fine-grained preference knowledge from single-user and cross-user perspectives. Extensive experiments verify the effectiveness of BASRec: the average improvement is up to 72.0% on GRU4Rec, 33.8% on SASRec, and 68.5% on FMLP-Rec. We also demonstrate that BASRec generates data with a better balance between relevance and diversity than existing methods. The source code is available at https://github.com/KingGugu/BASRec.
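To make the single-sequence idea concrete, the following is a minimal sketch of "augment with a random heuristic operator, then fuse with the original at the representation level". It is not BASRec's implementation: the three operators (`crop`, `reorder`, `mask`), the toy `encode` function, the linear `fuse` rule, and the Beta-distributed mixing weight are all assumptions chosen for illustration. The reweighting strategy and the nonlinear cross-sequence mixing described in the abstract are not shown.

```python
import math
import random

# Hypothetical heuristic operators; the actual operator set in BASRec may differ.
def crop(seq, rng, ratio=0.6):
    """Keep a random contiguous sub-sequence covering about `ratio` of the items."""
    if not seq:
        return []
    n = max(1, int(len(seq) * ratio))
    start = rng.randint(0, len(seq) - n)
    return seq[start:start + n]

def reorder(seq, rng, ratio=0.5):
    """Shuffle a random contiguous span; the set of items is unchanged."""
    if not seq:
        return []
    n = max(1, int(len(seq) * ratio))
    start = rng.randint(0, len(seq) - n)
    span = seq[start:start + n]
    rng.shuffle(span)
    return seq[:start] + span + seq[start + n:]

def mask(seq, rng, ratio=0.3, mask_id=0):
    """Replace each item with `mask_id` with probability `ratio`."""
    return [mask_id if rng.random() < ratio else item for item in seq]

def augment(seq, rng):
    """Apply one randomly chosen operator (the source of diversity)."""
    op = rng.choice([crop, reorder, mask])
    return op(seq, rng)

def encode(seq, dim=4):
    """Toy stand-in for a sequence encoder such as SASRec: mean-pooled
    deterministic pseudo-embeddings. Purely illustrative."""
    if not seq:
        return [0.0] * dim
    return [sum(math.sin(item * (k + 1)) for item in seq) / len(seq)
            for k in range(dim)]

def fuse(h_orig, h_aug, lam):
    """Assumed linear fusion: lam * original + (1 - lam) * augmented.
    A larger lam keeps the result closer to the original (more relevance)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(h_orig, h_aug)]

if __name__ == "__main__":
    rng = random.Random(0)
    seq = [3, 8, 15, 4, 9, 12]            # item IDs of one user
    aug = augment(seq, rng)
    lam = rng.betavariate(0.4, 0.4)       # mixing weight in [0, 1]; alpha is an assumption
    print(aug, fuse(encode(seq), encode(aug), lam))
```

In this sketch the mixing weight controls the trade-off the abstract describes: values near 1 keep the fused representation close to the original sequence, while values near 0 let the randomly augmented sequence dominate.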