This paper presents a novel reconstruction method that uses diffusion models to protect machine learning classifiers against adversarial attacks, without requiring any modification to the classifiers themselves. Machine learning models are sensitive to minor input perturbations, which makes them vulnerable to such attacks. Although diffusion-based methods are typically disregarded for adversarial defense because of their slow reverse process, this paper demonstrates that the proposed method provides robustness against adversarial threats while preserving clean accuracy, speed, and plug-and-play compatibility. Code at: https://github.com/HondamunigePrasannaSilva/DiffDefence.