Deep learning models have been widely used in commercial acoustic systems in recent years. However, adversarial audio examples can cause these systems to behave abnormally while remaining hard for humans to perceive. Various methods, such as transformation-based defenses and adversarial training, have been proposed to protect acoustic systems from adversarial attacks, but they are less effective against adaptive attacks. Furthermore, directly applying methods from the image domain can lead to suboptimal results because of the unique properties of audio data. In this paper, we propose AudioPure, an adversarial purification-based defense pipeline for acoustic systems built on off-the-shelf diffusion models. Taking advantage of the strong generative ability of diffusion models, AudioPure first adds a small amount of noise to the adversarial audio and then runs the reverse sampling process to purify the noisy audio and recover clean audio. AudioPure is a plug-and-play method that can be applied directly to any pretrained classifier without fine-tuning or re-training. We conduct extensive experiments on the speech command recognition task to evaluate the robustness of AudioPure. Our method is effective against diverse adversarial attacks and outperforms existing methods under both strong adaptive white-box and black-box attacks bounded by the $\mathcal{L}_2$ or $\mathcal{L}_\infty$ norm (up to +20\% in robust accuracy). We also evaluate certified robustness against perturbations bounded by the $\mathcal{L}_2$ norm via randomized smoothing, where our pipeline achieves higher certified accuracy than the baselines.
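The purification step described above (add a small amount of noise, then run reverse sampling) can be illustrated with a minimal sketch assuming a standard DDPM formulation: the input is diffused in closed form to a chosen step $t^*$ via $x_{t^*} = \sqrt{\bar\alpha_{t^*}}\,x_0 + \sqrt{1-\bar\alpha_{t^*}}\,\epsilon$, and then denoised step by step back to $t = 0$. The linear noise schedule, the names `linear_betas`, `cumulative_alphas`, `purify`, and `t_star`, and the `eps_model` callable (a stand-in for a pretrained noise-prediction network) are all hypothetical; this is not the AudioPure implementation, and the actual model, schedule, and diffusion step may differ.

```python
import math
import random
from typing import Callable, List, Tuple

Signal = List[float]
# Hypothetical interface for a pretrained noise-prediction network eps_theta(x_t, t).
EpsModel = Callable[[Signal, int], Signal]


def linear_betas(T: int, beta_1: float = 1e-4, beta_T: float = 0.02) -> List[float]:
    """Linear DDPM variance schedule beta_1..beta_T (an assumed schedule)."""
    return [beta_1 + (beta_T - beta_1) * i / (T - 1) for i in range(T)]


def cumulative_alphas(betas: List[float]) -> Tuple[List[float], List[float]]:
    """Return alpha_t = 1 - beta_t and alpha_bar_t = prod_{s<=t} alpha_s."""
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return alphas, alpha_bars


def purify(x_adv: Signal, eps_model: EpsModel, betas: List[float],
           t_star: int, rng: random.Random) -> Signal:
    """Diffuse x_adv to step t_star, then run DDPM reverse sampling down to step 0."""
    alphas, alpha_bars = cumulative_alphas(betas)
    # Forward diffusion in closed form (list index t - 1 holds step t):
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    ab = alpha_bars[t_star - 1]
    x = [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x_adv]
    # Reverse sampling x_t -> x_{t-1} for t = t_star, ..., 1.
    for t in range(t_star, 0, -1):
        i = t - 1
        eps = eps_model(x, t)
        coef = betas[i] / math.sqrt(1.0 - alpha_bars[i])
        mean = [(xv - coef * ev) / math.sqrt(alphas[i]) for xv, ev in zip(x, eps)]
        if t > 1:
            sigma = math.sqrt(betas[i])  # sigma_t^2 = beta_t, one common DDPM choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # the last step returns the posterior mean without extra noise
    return x
```

In a plug-and-play setting, the purified signal would simply be passed to the unchanged pretrained classifier, e.g. `classifier(purify(x_adv, eps_model, betas, t_star, rng))`. A small `t_star` presumably keeps more of the original content, while a larger one washes out more of the adversarial perturbation; the value used in the paper is not given here.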
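The certified-robustness evaluation mentioned above can be sketched with the standard Gaussian randomized-smoothing certificate of Cohen et al. (2019): the smoothed classifier predicts $g(x) = \arg\max_c \Pr[f(x+\delta) = c]$ with $\delta \sim \mathcal{N}(0, \sigma^2 I)$, and if a lower confidence bound $\underline{p_A}$ on the top-class probability exceeds $1/2$, the prediction is certified within an $\mathcal{L}_2$ radius $R = \sigma\,\Phi^{-1}(\underline{p_A})$. This is a minimal sketch under stated assumptions, not the paper's evaluation code: the names `certify`, `certified_accuracy`, `n0`, and `alpha` are hypothetical, the `classify` callable stands in for the purifier plus classifier, and a one-sided Hoeffding bound replaces the Clopper-Pearson bound used by Cohen et al. so that only the standard library is needed.

```python
import math
import random
from collections import Counter
from statistics import NormalDist
from typing import Callable, List, Optional, Sequence, Tuple

Signal = List[float]
# Hypothetical: in a purification pipeline this would be classifier(purify(x)).
Classifier = Callable[[Signal], int]


def _noisy(x: Signal, sigma: float, rng: random.Random) -> Signal:
    """Sample x + delta with delta ~ N(0, sigma^2 I)."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def certify(classify: Classifier, x: Signal, sigma: float, n0: int, n: int,
            alpha: float, rng: random.Random) -> Tuple[Optional[int], float]:
    """Return (predicted class, certified L2 radius), or (None, 0.0) to abstain."""
    # Selection: guess the top class from a small batch of noisy samples.
    votes = Counter(classify(_noisy(x, sigma, rng)) for _ in range(n0))
    top = votes.most_common(1)[0][0]
    # Estimation: count how often the guessed class wins on a fresh batch.
    hits = sum(classify(_noisy(x, sigma, rng)) == top for _ in range(n))
    # One-sided Hoeffding lower confidence bound on p_A (holds w.p. >= 1 - alpha).
    p_lower = hits / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0
    # Certified L2 radius R = sigma * Phi^{-1}(p_lower).
    return top, sigma * NormalDist().inv_cdf(p_lower)


def certified_accuracy(results: Sequence[Tuple[Optional[int], float]],
                       labels: Sequence[int], radius: float) -> float:
    """Fraction of inputs classified correctly AND certified at >= radius."""
    ok = sum(pred == y and r >= radius for (pred, r), y in zip(results, labels))
    return ok / len(labels)
```

Certified accuracy at radius $r$ counts only inputs that are both correctly classified and certified at radius at least $r$; abstentions count as errors.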