Image classifiers play a critical role in detecting diseases in medical imaging and identifying anomalies in manufacturing processes. However, once a classifier has been extensively trained, its behavior is largely fixed, which makes post hoc model editing difficult, especially when the goal is to forget specific classes or adapt to distribution shifts. Existing classifier editing methods either focus narrowly on correcting errors or incur extensive retraining costs, creating a bottleneck for flexible editing. Moreover, such editing has received limited attention in image classification. To overcome these challenges, we introduce Class Vectors, which capture class-specific representation adjustments during fine-tuning. Whereas task vectors encode task-level changes in weight space, Class Vectors disentangle each class's adaptation in the latent space. We show that Class Vectors capture each class's semantic shift and that classifier editing can be achieved either by steering latent features along these vectors or by mapping them into weight space to update the decision boundaries. We also demonstrate that the inherent linearity and orthogonality of Class Vectors support efficient, flexible, and high-level concept editing via simple class arithmetic. Finally, we validate their utility in applications such as unlearning, environmental adaptation, adversarial defense, and adversarial trigger optimization.
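The idea of steering latent features along a per-class direction can be illustrated with a toy sketch. The abstract does not give the exact construction, so the code below assumes, purely for illustration, that a Class Vector is the difference between the mean latent feature of a class after fine-tuning and before it. The function names `class_vector`, `steer`, and `cosine`, the scaling factor `alpha`, and the two-dimensional toy features are all hypothetical and are not taken from the paper.

```python
import numpy as np

def class_vector(feats_pretrained, feats_finetuned):
    """ASSUMED definition: mean latent shift of one class during fine-tuning.

    Both inputs have shape (n_samples, dim) and hold the latent features of
    the same class under the pretrained and the fine-tuned encoder.
    """
    return feats_finetuned.mean(axis=0) - feats_pretrained.mean(axis=0)

def steer(latents, vector, alpha=1.0):
    """Shift latent features along a class vector, scaled by alpha."""
    return latents + alpha * vector

def cosine(u, v):
    """Cosine similarity, used here to check orthogonality between vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy latent features (hypothetical data): fine-tuning moves "cat" along
# axis 0 and "dog" along axis 1.
pre_cat = np.array([[1.0, 0.0], [3.0, 0.0]])
ft_cat = np.array([[2.0, 0.0], [4.0, 0.0]])
pre_dog = np.array([[0.0, 1.0], [0.0, 3.0]])
ft_dog = np.array([[0.0, 3.0], [0.0, 5.0]])

v_cat = class_vector(pre_cat, ft_cat)
v_dog = class_vector(pre_dog, ft_dog)

# Steering with a negative coefficient moves features back toward their
# pre-fine-tuning position (a crude stand-in for "forgetting" a class).
undone = steer(ft_cat, v_cat, alpha=-1.0)

# Class arithmetic: in this toy setup the two vectors are orthogonal, so
# their sum shifts each axis independently.
combined = v_cat + v_dog
```

In this toy setup the two vectors do not interfere because they lie on different axes. Whether real Class Vectors behave this way is the empirical claim the abstract makes, not something this sketch establishes.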