Machine unlearning (MU) aims to remove the influence of certain data points from a trained model without costly retraining. Most practical MU algorithms are only approximate, and their performance can only be assessed empirically. Care must therefore be taken to make empirical comparisons as representative as possible. A common practice is to run the MU algorithm multiple times independently, starting from the same trained model. In this work, we demonstrate that this practice can give highly non-representative results because -- even for the same architecture and the same dataset -- some MU methods can be highly sensitive to the choice of random number seed used for model training. We show that this is particularly relevant for MU methods that are deterministic, i.e., those that always produce the same result when started from the same trained model. We therefore recommend that empirical comparisons of MU algorithms should also reflect the variability across different model training seeds.
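To make the contrast between the two evaluation protocols concrete, the following is a minimal sketch in Python. The helpers train_model, unlearn, and evaluate are hypothetical placeholders (not from any specific library) standing in for an actual training routine, an approximate MU algorithm, and an unlearning metric; the sketch only illustrates how the seeds are varied in each protocol.

```python
# Minimal sketch of the two evaluation protocols discussed above.
# train_model, unlearn, and evaluate are hypothetical placeholders; substitute
# your own training, unlearning, and evaluation routines.

def train_model(train_seed):
    """Placeholder: train the original model on the full dataset with this seed."""
    raise NotImplementedError

def unlearn(model, forget_set, unlearn_seed):
    """Placeholder: apply an (approximate) MU algorithm to the trained model."""
    raise NotImplementedError

def evaluate(model, forget_set):
    """Placeholder: return a scalar unlearning metric, e.g. forget-set accuracy."""
    raise NotImplementedError

def fixed_model_protocol(forget_set, n_runs=5, train_seed=0):
    """Common practice: one trained model, several independent unlearning runs.
    For a deterministic MU method every run returns the same value, so the
    reported spread understates the true variability."""
    base = train_model(train_seed)
    return [evaluate(unlearn(base, forget_set, unlearn_seed=s), forget_set)
            for s in range(n_runs)]

def varied_training_seed_protocol(forget_set, n_runs=5):
    """Recommended: also vary the seed used to train the original model, so the
    reported spread reflects sensitivity to the trained model itself."""
    scores = []
    for s in range(n_runs):
        base = train_model(train_seed=s)
        scores.append(evaluate(unlearn(base, forget_set, unlearn_seed=s), forget_set))
    return scores
```

Under the fixed-model protocol, a deterministic MU method would report zero variance across runs; varying the training seed exposes the seed sensitivity that the abstract argues should be part of the comparison.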