We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost of removing the influence of training samples from the trained model. Such removal, also known as selective forgetting or unlearning, is often implemented by partitioning a dataset into shards, training a fully independent model on each shard, and then ensembling the resulting models. Increasing the number of shards reduces the expected cost to forget, but it also increases inference cost and reduces the final accuracy of the model, since synergistic information between samples is lost when the models are trained independently. Rather than treating each shard as independent, SAFE introduces the notion of a shard graph, which allows limited information from other shards to be incorporated during training. This trades a modest increase in expected forgetting cost for a significant increase in accuracy, while still attaining complete removal of residual influence after forgetting. SAFE uses a lightweight system of adapters that can be trained while reusing most of the computation. This allows SAFE to be trained on shards an order of magnitude smaller than those used by current state-of-the-art methods (thus reducing forgetting costs) while maintaining high accuracy, as we demonstrate empirically on fine-grained computer vision datasets.
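The trade-off between shard connectivity and forgetting cost can be illustrated with a small sketch. It is not the paper's implementation: it assumes that the model at each node of the shard graph is trained on its own shard plus the shards it points to, and that forgetting cost is measured as the number of samples re-processed when every affected model is retrained from scratch. The function names `training_sets` and `expected_forget_cost`, and the example graphs, are hypothetical.

```python
from typing import Dict, Set


def training_sets(graph: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Assumption: the model at node i trains on shard i plus its out-neighbours."""
    return {i: {i} | neighbours for i, neighbours in graph.items()}


def expected_forget_cost(graph: Dict[int, Set[int]],
                         shard_sizes: Dict[int, int]) -> float:
    """Expected number of samples re-processed to forget one training sample
    drawn uniformly at random (a simplified cost model, assumed here)."""
    sets = training_sets(graph)
    total = sum(shard_sizes.values())
    cost = 0.0
    for shard, size in shard_sizes.items():
        # Every model whose training set contains this shard must be retrained.
        retrain = sum(
            sum(shard_sizes[j] for j in members)
            for members in sets.values()
            if shard in members
        )
        cost += (size / total) * retrain
    return cost


sizes = {0: 100, 1: 100, 2: 100, 3: 100}

# No edges: fully independent shards, as in plain sharding-based forgetting.
independent = {0: set(), 1: set(), 2: set(), 3: set()}

# Shards share data in pairs: more synergy, higher cost to forget.
pairs = {0: {1}, 1: {0}, 2: {3}, 3: {2}}

print(expected_forget_cost(independent, sizes))  # 100.0
print(expected_forget_cost(pairs, sizes))        # 400.0
```

Under these assumptions, adding edges raises the expected forgetting cost (here from 100 to 400 samples) in exchange for letting each model see more data, which is the kind of trade-off the shard graph is meant to control.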