In computer vision, machine unlearning aims to remove the influence of specific visual concepts or training images without retraining from scratch. Prior studies show that existing approaches often modify only the classifier while leaving internal representations intact, so forgetting is incomplete. In this work, we extend the notion of unlearning to the representation level and derive a three-term interplay among forgetting efficacy, retention fidelity, and class separation. Building on Neural Collapse theory, we show that the orthogonal projection of a simplex Equiangular Tight Frame (ETF) remains an ETF in a lower-dimensional space, which yields a provably optimal forgetting operator. We further introduce the Representation Unlearning Score (RUS) to quantify representation-level forgetting and retention fidelity. Building on these results, we propose POUR (Provably Optimal Unlearning of Representations), which comes in two variants: a closed-form geometric projection (POUR-P) and a feature-level variant trained under a distillation scheme (POUR-D). Experiments on CIFAR-10/100 and PathMNIST demonstrate that POUR achieves effective unlearning while preserving retained knowledge, outperforming state-of-the-art unlearning methods on both classification-level and representation-level metrics.
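The ETF projection claim can be checked numerically. The sketch below is illustrative only and is not the POUR implementation: it assumes that class means form a unit-norm simplex ETF with K vertices (the Neural Collapse idealization) and that the forgetting operator is the orthogonal projector onto the complement of the forgotten class mean. The helper names `simplex_etf`, `project_out`, and `cosine_gram` are hypothetical. Under these assumptions, the retained K-1 means should have pairwise cosine -1/(K-2), which is the simplex ETF condition for K-1 classes.

```python
import numpy as np

def simplex_etf(k):
    """Return a k x k matrix whose columns form a unit-norm simplex ETF.

    Columns have pairwise inner product -1/(k-1).
    """
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)

def project_out(means, forget):
    """Project every column onto the orthogonal complement of column `forget`.

    Assumption: this projector stands in for the forgetting operator.
    """
    u = means[:, forget] / np.linalg.norm(means[:, forget])
    projector = np.eye(means.shape[0]) - np.outer(u, u)
    return projector @ means

def cosine_gram(vectors):
    """Gram matrix of the columns after rescaling each to unit norm."""
    normed = vectors / np.linalg.norm(vectors, axis=0, keepdims=True)
    return normed.T @ normed

K = 5                                   # number of classes before forgetting
means = simplex_etf(K)
projected = project_out(means, forget=K - 1)

retained = projected[:, :K - 1]         # means of the K-1 retained classes
gram = cosine_gram(retained)
off_diag = gram[~np.eye(K - 1, dtype=bool)]

# The forgotten class mean collapses to (numerically) zero.
forgotten_norm = np.linalg.norm(projected[:, K - 1])

print("retained pairwise cosines:", np.round(off_diag, 6))
print("target -1/(K-2):", -1 / (K - 2))
print("norm of forgotten mean:", forgotten_norm)
```

The retained means shrink by a common factor under the projection, so the check rescales them before comparing angles; equal norms and equal pairwise cosines of -1/(K-2) together are what make the projected set a simplex ETF in the lower-dimensional subspace.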