In the past few years, deep learning systems have been shown to be highly vulnerable to attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted attacks can modify an audio input signal in such a way that humans still recognise the same words, while the ASR system is steered to predict a different transcription. In this paper, we propose a defense mechanism against targeted adversarial attacks that removes fast-changing features from the audio signal, by applying slow feature analysis, a low-pass filter, or both, before feeding the input to the ASR system. We perform an empirical analysis of hybrid ASR models trained on data pre-processed in this way. While the resulting models perform well on benign data, they are significantly more robust against targeted adversarial attacks: our final proposed model matches the baseline model's performance on clean data while being more than four times as robust.
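The low-pass filtering component of the proposed pre-processing can be sketched as follows. This is a minimal illustration, not the paper's exact configuration: the cutoff frequency (here 4 kHz), filter order, and sample rate are illustrative assumptions, and the paper additionally considers slow feature analysis, which is not shown here.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def lowpass_audio(signal, sample_rate, cutoff_hz=4000.0, order=5):
    """Attenuate fast-changing components above cutoff_hz (illustrative values)."""
    sos = butter(order, cutoff_hz, btype="low", fs=sample_rate, output="sos")
    return sosfilt(sos, signal)

# Synthetic example: a slow component in the passband plus a fast one above it.
sr = 16000
t = np.arange(sr) / sr
slow = np.sin(2 * np.pi * 300 * t)    # within the passband, largely preserved
fast = np.sin(2 * np.pi * 7000 * t)   # above the cutoff, strongly attenuated
filtered = lowpass_audio(slow + fast, sr)
```

In the proposed defense, such a filter would be applied both to the training data and to the input at inference time, so the ASR model never relies on the fast-changing components that targeted attacks tend to exploit.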