Modern computer systems store vast amounts of personal data, enabling advances in AI and ML but putting user privacy and trust at risk. For privacy reasons, an ML model may need to forget part of the data it was trained on. In this paper, we introduce a novel unlearning approach based on Forgetting Neural Networks (FNNs), a neuroscience-inspired architecture that explicitly encodes forgetting through multiplicative decay factors. While FNNs have previously been studied only as a theoretical construct, we provide the first concrete implementation and demonstrate their effectiveness for targeted unlearning. We propose several variants with per-neuron forgetting factors, including rank-based assignments guided by activation levels, and evaluate them on the MNIST and Fashion-MNIST benchmarks. Our method systematically removes information associated with forget sets while preserving performance on retained data. Membership inference attacks confirm that FNN-based unlearning effectively erases information about the forgotten training data from the network. These results establish FNNs as a promising foundation for efficient and interpretable unlearning.
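The forgetting mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer shape, the `rank_based_factors` helper, the decay range, and the forget-set activation values are all hypothetical, chosen only to show how per-neuron multiplicative decay with rank-based factor assignment might look.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden layer: weight matrix W (in_dim x out_dim),
# one column of incoming weights per hidden neuron.
W = rng.normal(size=(8, 4))

# Hypothetical mean activation of each hidden neuron on the forget set.
forget_activations = np.array([0.9, 0.1, 0.5, 0.7])

def rank_based_factors(activations, lam_min=0.5, lam_max=0.99):
    """Assign a forgetting factor to each neuron by activation rank:
    neurons most active on the forget set receive the strongest decay
    (smallest factor). Range [lam_min, lam_max] is an assumption."""
    order = np.argsort(-activations)              # most active first
    levels = np.linspace(lam_min, lam_max, len(activations))
    factors = np.empty_like(levels)
    factors[order] = levels                       # most active -> lam_min
    return factors

def decay_step(W, lam):
    """One unlearning step: multiplicatively decay each neuron's
    incoming weights by its per-neuron forgetting factor."""
    return W * lam[np.newaxis, :]

lam = rank_based_factors(forget_activations)
W_unlearned = decay_step(W, lam)
```

Repeating `decay_step` drives the weights of neurons most implicated in the forget set toward zero fastest, while neurons with factors near 1 are largely preserved, which is the intuition behind retaining performance on the remaining data.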