Evaluating machine unlearning methods remains technically challenging, with recent benchmarks requiring complex setups and significant engineering overhead. We introduce a unified and extensible benchmarking suite that simplifies the evaluation of unlearning algorithms using the KLoM (KL divergence of Margins) metric. Our framework provides precomputed model ensembles, oracle outputs, and streamlined infrastructure for running evaluations out of the box. By standardizing setup and metrics, it enables reproducible, scalable, and fair comparison across unlearning methods. We aim for this benchmark to serve as a practical foundation for accelerating research and promoting best practices in machine unlearning. Our code and data are publicly available.