Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled. Defenses against these poisoning attacks can be categorized by whether specific interventions are adopted during training. The first approach is training-time defense, such as adversarial training, which can mitigate poisoning effects but is computationally intensive. The other approach is pre-training purification, e.g., image shortcut squeezing, which applies several simple compression operations but often struggles to handle diverse UEs. Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method. Firstly, we uncover that rate-constrained variational autoencoders (VAEs) exhibit a clear tendency to suppress the perturbations in UEs. We then provide a theoretical analysis of this phenomenon. Building upon these insights, we introduce a disentangle variational autoencoder (D-VAE), capable of disentangling the perturbations with learnable class-wise embeddings. Based on this network, a two-stage purification approach is naturally developed. The first stage focuses on roughly eliminating perturbations, while the second stage produces refined, poison-free results, ensuring effectiveness and robustness across various scenarios. Extensive experiments demonstrate the remarkable performance of our method across CIFAR-10, CIFAR-100, and a 100-class ImageNet-subset. Code is available at https://github.com/yuyi-sd/D-VAE.
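As a rough illustration of the rate-constrained idea (not the paper's implementation), a VAE can be trained with its KL term, the "rate" in nats, held below a budget via a hinge penalty; the function names, the penalty weight `lam`, and the budget `rate_limit` here are hypothetical choices for the sketch:

```python
import numpy as np

def kl_diag_gaussian(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def rate_constrained_loss(recon_err, mu, logvar, rate_limit, lam=10.0):
    # Reconstruction error plus a hinge penalty that activates only
    # once the KL "rate" exceeds the budget rate_limit (in nats).
    rate = kl_diag_gaussian(mu, logvar)
    return recon_err + lam * max(0.0, rate - rate_limit), rate
```

With a small enough budget, the decoder is forced to spend its limited rate on the dominant image content, which is one intuition for why high-frequency poisoning perturbations tend to be suppressed in reconstructions.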