Investigating the Robustness of Sequential Recommender Systems Against Training Data Perturbations

Sequential Recommender Systems (SRSs) are widely used to model user behavior over time, yet their robustness to perturbations in the training data remains a critical and largely understudied issue. Previous studies that assess the robustness of SRSs face a fundamental challenge: the Rank-Biased Overlap (RBO) similarity is poorly suited to the task because it is designed for infinite rankings of items and therefore has limitations in real-world scenarios. For instance, it does not reach the perfect score of 1 for two identical finite-length rankings. To address this challenge, we introduce Finite Rank-Biased Overlap (FRBO), an enhanced similarity measure tailored explicitly to finite rankings, which enables a more intuitive evaluation in practical settings. We then empirically investigate the impact of removing items at different positions within a temporally ordered sequence. We evaluate two distinct SRS models across multiple datasets, measuring their performance with metrics such as Normalized Discounted Cumulative Gain (NDCG) and Rank List Sensitivity. Our results show that removing items at the end of the sequence has a statistically significant impact on performance, with NDCG decreasing by up to 60%. In contrast, removing items from the beginning or middle has no significant effect. These findings underscore how much the position of the perturbed items in the training data matters. Having highlighted these vulnerabilities in current SRSs, we advocate for intensified research efforts to strengthen their robustness against adversarial perturbations.
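The limitation of RBO on finite rankings can be illustrated with a small sketch. The abstract does not state the FRBO formula, so the code below rests on an assumption: RBO truncated at depth k is rescaled by its maximum attainable value, 1 - p^k, which is the score that two identical rankings of length k receive. The function names `rbo_at_k` and `frbo_at_k` and the default persistence `p = 0.9` are illustrative choices, not taken from the source.

```python
def rbo_at_k(x, y, p=0.9):
    """Truncated Rank-Biased Overlap over the first k = min(len(x), len(y)) ranks.

    Computes (1 - p) * sum_{d=1..k} p^(d-1) * |x[:d] & y[:d]| / d.
    Assumes 0 < p < 1 and that each ranking contains no repeated items.
    """
    k = min(len(x), len(y))
    seen_x, seen_y = set(), set()
    score = 0.0
    for d in range(1, k + 1):
        seen_x.add(x[d - 1])
        seen_y.add(y[d - 1])
        overlap = len(seen_x & seen_y)       # items shared by both top-d prefixes
        score += p ** (d - 1) * overlap / d  # agreement at depth d, geometrically weighted
    return (1 - p) * score


def frbo_at_k(x, y, p=0.9):
    """Assumed finite variant: truncated RBO divided by its maximum, 1 - p^k.

    This normalization is an assumption made for illustration; it maps
    identical finite rankings to 1 and fully disjoint rankings to 0.
    """
    k = min(len(x), len(y))
    if k == 0:
        raise ValueError("rankings must be non-empty")
    return rbo_at_k(x, y, p) / (1 - p ** k)


if __name__ == "__main__":
    ranking = list(range(10))
    print("RBO@10, identical rankings: ", rbo_at_k(ranking, ranking))
    print("FRBO@10, identical rankings:", frbo_at_k(ranking, ranking))
```

With ten identical items and p = 0.9, the truncated RBO evaluates to 1 - 0.9^10, roughly 0.65, whereas the rescaled score is 1 up to floating-point error. This is the behavior the abstract describes as more intuitive for finite rankings.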
