We develop DMAVFL, a novel attack strategy that evades current detection mechanisms. The key idea is to integrate a discriminator with an auxiliary classifier that takes full advantage of the label information, which previous attacks ignored entirely. On the one hand, label information helps to better characterize the embeddings of samples from distinct classes, which improves reconstruction performance. On the other hand, computing malicious gradients with label information better mimics honest training, which makes the malicious gradients indistinguishable from honest ones and the attack much stealthier. Our comprehensive experiments demonstrate that DMAVFL significantly outperforms existing attacks and successfully circumvents state-of-the-art defenses against malicious attacks. Additional ablation studies and evaluations against other defenses further underscore the robustness and effectiveness of DMAVFL.
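To make the "discriminator with auxiliary classifier" idea concrete, the following is a minimal sketch of such a component in the AC-GAN style: a shared trunk feeds one head that scores whether an embedding looks genuine and a second head that predicts its class label. It is not the DMAVFL architecture. The class name `ACDiscriminator`, the function `ac_loss`, the layer sizes, and the choice to simply sum the two loss terms are all assumptions made for illustration, and what counts as a "real" versus "fake" input is left abstract because the text above does not specify it.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


class ACDiscriminator:
    """Hypothetical discriminator with an auxiliary classifier head."""

    def __init__(self, in_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_trunk, self.b_trunk = init(hidden_dim, in_dim), [0.0] * hidden_dim
        self.w_src, self.b_src = init(1, hidden_dim), [0.0]
        self.w_cls, self.b_cls = init(num_classes, hidden_dim), [0.0] * num_classes

    def forward(self, embedding):
        hidden = relu(linear(self.w_trunk, self.b_trunk, embedding))
        src_logit = linear(self.w_src, self.b_src, hidden)[0]   # real vs. fake
        cls_logits = linear(self.w_cls, self.b_cls, hidden)     # class label
        return src_logit, cls_logits


def ac_loss(src_logit, cls_logits, is_real, label):
    """Binary cross-entropy on the source head plus cross-entropy on the label head.

    Summing the two terms with equal weight is an assumption of this sketch.
    """
    src_loss = -log_sigmoid(src_logit) if is_real else -log_sigmoid(-src_logit)
    cls_loss = -log_softmax(cls_logits)[label]
    return src_loss + cls_loss


if __name__ == "__main__":
    disc = ACDiscriminator(in_dim=4, hidden_dim=8, num_classes=3)
    src_logit, cls_logits = disc.forward([0.2, -0.1, 0.5, 0.0])
    print(ac_loss(src_logit, cls_logits, is_real=True, label=1))
```

The label term is the part that carries the class information: it pushes the shared trunk to separate embeddings by class, which is the intuition the abstract gives for both better reconstruction and gradients that resemble those of honest training.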