Deep learning has advanced automatic speaker verification (ASV) in the past few years. Although deep learning-based ASV systems are known to be vulnerable to adversarial examples in digital access, few studies have examined adversarial attacks in physical access, where the audio is replayed over the air. An over-the-air attack involves a loudspeaker, a microphone, and a replay environment that shapes how the sound wave propagates. Our initial experiment confirms that the replay process affects the effectiveness of over-the-air attacks. This study takes a first step towards using a neural replay simulator to make adversarial attacks more robust to over-the-air replay. Specifically, a neural waveform synthesizer simulates the replay process while the adversarial perturbations are estimated. Experiments on the ASVspoof2019 dataset confirm that the neural replay simulator considerably increases the success rate of over-the-air adversarial attacks. This raises concerns about adversarial attacks on speaker verification in physical access applications.
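The idea of simulating the replay process while estimating the perturbation can be illustrated with a toy sketch. This is not the method evaluated in the study: the linear scorer `w` stands in for a deep ASV model, a fixed FIR filter stands in for the neural waveform synthesizer, and the projected sign-gradient update is an assumed optimizer. All function names, the impulse response, and the budget `eps` are hypothetical.

```python
import numpy as np


def make_replay_matrix(impulse_response, n):
    """Causal FIR filter as an n x n matrix: (H @ x)[t] = sum_k h[k] * x[t - k]."""
    H = np.zeros((n, n))
    for k, hk in enumerate(impulse_response):
        H += hk * np.eye(n, k=-k)  # k-th sub-diagonal delays the signal by k samples
    return H


def craft_perturbation(w, H, eps, steps=10):
    """Projected sign-gradient ascent on the toy score w @ (H @ (x + delta))."""
    grad = H.T @ w  # gradient w.r.t. delta; constant because the toy model is linear
    delta = np.zeros_like(w)
    alpha = eps / 4  # assumed step size
    for _ in range(steps):
        # Step along the gradient sign, then project back into the L-inf ball.
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta


def demo(n=256, eps=0.01, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)  # hypothetical linear "ASV" scorer
    x = rng.standard_normal(n)  # stand-in waveform
    h = np.array([0.6, 0.3, -0.2, 0.1])  # hypothetical loudspeaker/room/microphone response
    H_replay = make_replay_matrix(h, n)

    # Digital attack: the perturbation is estimated as if no replay took place.
    delta_digital = craft_perturbation(w, np.eye(n), eps)
    # Replay-aware attack: the simulator sits inside the gradient computation.
    delta_aware = craft_perturbation(w, H_replay, eps)

    def over_air_score(delta):
        # Both perturbations are evaluated after passing through the replay channel.
        return float(w @ (H_replay @ (x + delta)))

    base = over_air_score(np.zeros(n))
    return over_air_score(delta_digital) - base, over_air_score(delta_aware) - base


if __name__ == "__main__":
    gain_digital, gain_aware = demo()
    print(f"score gain after replay, digital attack:      {gain_digital:.4f}")
    print(f"score gain after replay, replay-aware attack: {gain_aware:.4f}")
```

In this linear toy setting, the replay-aware perturbation can never gain less than the digital one under the same budget once both are replayed. For a real ASV network and a learned simulator, any such benefit is an empirical question.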