ERASE -- A Real-World Aligned Benchmark for Unlearning in Recommender Systems

Machine unlearning (MU) enables the removal of selected training data from trained models to address privacy compliance, security, and liability issues in recommender systems. Existing MU benchmarks reflect real-world recommender settings poorly: they focus primarily on collaborative filtering, assume unrealistically large deletion requests, and overlook practical constraints such as sequential unlearning and efficiency. We present ERASE, a large-scale benchmark for MU in recommender systems designed to align with real-world usage. ERASE spans three core tasks -- collaborative filtering, session-based recommendation, and next-basket recommendation -- and includes unlearning scenarios inspired by real-world applications, such as sequentially removing sensitive interactions or spam. The benchmark covers seven unlearning algorithms, both general-purpose and recommender-specific, across nine public datasets and nine state-of-the-art models. Executing ERASE produced more than 600 GB of reusable artifacts, including extensive experimental logs and more than a thousand model checkpoints. Crucially, the released artifacts enable systematic analysis of where current unlearning methods succeed and where they fall short. ERASE shows that approximate unlearning can match retraining in some settings, but that robustness varies widely across datasets and architectures. Repeated unlearning exposes weaknesses in general-purpose methods, especially for attention-based and recurrent models, while recommender-specific approaches behave more reliably. ERASE provides an empirical foundation to help the community assess, drive, and track progress toward practical MU in recommender systems.
