Machine Unlearning has recently emerged as a paradigm for selectively removing the influence of training data points from a network. While existing approaches have focused on unlearning either a small subset of the training data or a single class, in this paper we take a different path and devise a framework that can unlearn all classes of an image classification network in a single untraining round. Our proposed technique learns to modulate the inner components of an image classification network through memory matrices so that, after training, the same network can selectively exhibit an unlearning behavior over any of the classes. By discovering weights that are specific to each class, our approach also recovers a representation of the classes that is explainable by design. We test the proposed framework, which we name Weight Filtering network (WF-Net), on small-scale and medium-scale image classification datasets, with both CNN and Transformer-based backbones. Our work offers insights into the development of explainable solutions for unlearning and could be easily extended to other vision tasks.
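To make the idea of class-conditioned weight modulation concrete, here is a minimal sketch in plain Python. It assumes a hypothetical design in which a layer keeps one row of gate logits per class in a memory matrix and scales the layer's weights with the sigmoid of the row selected for the class to be unlearned. The names `FilteredLinear`, `memory`, `init_logit`, and `unlearn_class` are illustrative assumptions, not the paper's API, and the training objective that would learn these filters (degrading the selected class while preserving the others) is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FilteredLinear:
    """Hypothetical weight-filtered linear layer (illustrative sketch only)."""

    def __init__(self, weight, num_classes, init_logit=5.0):
        # weight: list of rows, shape out_features x in_features
        self.weight = weight
        out_features = len(weight)
        # Assumed memory matrix: one row of gate logits per class, one gate per
        # output unit. A large positive init keeps gates near 1, so the filtered
        # layer initially behaves almost like the original one.
        self.memory = [[init_logit] * out_features for _ in range(num_classes)]

    def filtered_weight(self, unlearn_class=None):
        # No class selected: use the original, unfiltered weights.
        if unlearn_class is None:
            return self.weight
        gates = [sigmoid(a) for a in self.memory[unlearn_class]]
        # Scale each output row of W by its class-specific gate: diag(g_c) W.
        return [[g * w for w in row] for g, row in zip(gates, self.weight)]

    def forward(self, x, unlearn_class=None):
        W = self.filtered_weight(unlearn_class)
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Usage: the same layer answers normally or "forgets" a chosen class.
layer = FilteredLinear([[1.0, 0.0], [0.0, 1.0]], num_classes=3)
layer.memory[1] = [10.0, -10.0]  # pretend these gates were learned for class 1
print(layer.forward([2.0, 3.0]))                   # original behavior
print(layer.forward([2.0, 3.0], unlearn_class=1))  # second unit almost switched off
```

Because the gates for each class are stored explicitly, inspecting a class's row of the memory matrix shows which units it suppresses, which is one plausible reading of how per-class weights can yield an explainable representation.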