SSDRec: Self-Augmented Sequence Denoising for Sequential Recommendation

Traditional sequential recommendation methods assume that users' sequence data is clean enough to learn accurate sequence representations that reflect user preferences. In practice, users' sequences inevitably contain noise (e.g., accidental interactions), which distorts the learned preferences. Consequently, some pioneering studies have modeled sequentiality and correlations in sequences to implicitly or explicitly reduce the influence of noise. However, relying only on the available intra-sequence information (i.e., sequentiality and correlations within a sequence) is insufficient and may result in over-denoising and under-denoising problems (OUPs), especially for short sequences. To improve reliability, we propose to augment sequences by inserting items before denoising. However, due to data sparsity and computational cost, it is challenging to select proper items from the entire item universe and insert them into proper positions in a target sequence. Motivated by this observation, we propose a novel framework, Self-augmented Sequence Denoising for sequential Recommendation (SSDRec), with a three-stage learning paradigm to address these challenges. In the first stage, we equip SSDRec with a global relation encoder that learns multi-faceted inter-sequence relations in a data-driven manner. These relations serve as prior knowledge to guide the subsequent stages. In the second stage, we devise a self-augmentation module that augments sequences to alleviate OUPs. Finally, in the third stage, we employ a hierarchical denoising module to reduce the risk of false augmentations and pinpoint all noise in raw sequences. Extensive experiments on five real-world datasets demonstrate the superiority of SSDRec over state-of-the-art denoising methods and its flexible application to mainstream sequential recommendation models. The source code is available at https://github.com/zc-97/SSDRec.
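
To make the three-stage idea concrete, the following is a minimal toy sketch, not the SSDRec implementation. It assumes a plain co-occurrence count as a stand-in for the learned global relation encoder, a greedy single-item insertion as a stand-in for the self-augmentation module, and a fixed support threshold as a stand-in for the hierarchical denoising module. All function names, thresholds, and the example data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def build_relations(sequences):
    """Stage 1 (stand-in): count how often two items share a sequence."""
    co = Counter()
    for seq in sequences:
        for a, b in combinations(sorted(set(seq)), 2):
            co[(a, b)] += 1
    return co


def relation(co, a, b):
    """Symmetric lookup into the co-occurrence table (0 if never seen)."""
    return co[(a, b) if a <= b else (b, a)]


def augment(seq, co, items, min_score=2):
    """Stage 2 (stand-in): insert the single best-supported item, if any."""
    best = None  # (score, position, item)
    for pos in range(1, len(seq)):
        left, right = seq[pos - 1], seq[pos]
        for item in items:
            if item in seq:
                continue
            # An inserted item must relate to both of its neighbours.
            score = min(relation(co, left, item), relation(co, item, right))
            if score >= min_score and (best is None or score > best[0]):
                best = (score, pos, item)
    if best is None:
        return list(seq)
    _, pos, item = best
    return seq[:pos] + [item] + seq[pos:]


def denoise(seq, co, min_support=1.0):
    """Stage 3 (stand-in): drop items weakly related to the rest of the sequence."""
    kept = []
    for item in seq:
        others = [o for o in seq if o != item]
        support = sum(relation(co, item, o) for o in others) / max(len(others), 1)
        if support >= min_support:
            kept.append(item)
    return kept


# Hypothetical interaction histories; "socks" plays the accidental click.
history = [
    ["phone", "case", "charger"],
    ["phone", "case", "charger"],
    ["phone", "charger", "case"],
    ["socks", "shoes"],
]
target = ["phone", "charger", "socks"]

co = build_relations(history)
items = sorted({item for seq in history for item in seq})
augmented = augment(target, co, items)
cleaned = denoise(augmented, co, min_support=2.0)
cleaned_without_augmentation = denoise(target, co, min_support=2.0)
```

In this toy setting, denoising the short raw sequence with the same threshold discards every item, a crude picture of over-denoising, whereas inserting "case" first gives the genuine items enough mutual support to survive while "socks" is still removed. The actual framework learns these relations and decisions end to end rather than using hand-set counts and thresholds.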
