Deep learning has advanced automatic speaker verification (ASV) in the past few years. Although deep learning-based ASV systems are known to be vulnerable to adversarial examples in digital access, few studies have examined adversarial attacks in physical access, where the adversarial audio is replayed over the air. An over-the-air attack involves a loudspeaker, a microphone, and a replay environment that affects how the sound wave propagates. Our initial experiment confirms that the replay process affects the effectiveness of over-the-air attacks. This study takes a first step toward using a neural replay simulator to make over-the-air adversarial attacks more robust: a neural waveform synthesizer simulates the replay process while the adversarial perturbations are estimated. Experiments on the ASVspoof2019 dataset confirm that the neural replay simulator can considerably increase the success rates of over-the-air adversarial attacks. This raises concerns about adversarial attacks on speaker verification in physical access applications.
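The idea of simulating the replay process while estimating the perturbation can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and not the setup described above: a fixed FIR filter `h` stands in for the neural waveform synthesizer, a linear scorer `w` stands in for the deep ASV model, and a single signed-gradient step stands in for whatever optimizer the attack actually uses. The names `replay`, `score`, `grad_through_replay`, `sign_step`, and `run_demo` are hypothetical.

```python
import random


def replay(x, h):
    """Toy replay simulator: causal FIR filtering of waveform x with response h.

    Assumption: a stand-in for the neural waveform synthesizer.
    """
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def score(x, w):
    """Toy linear verification score (assumption); higher means 'accept'."""
    return sum(wi * xi for wi, xi in zip(w, x))


def grad_through_replay(w, h, n_samples):
    """Gradient of score(replay(x, h), w) with respect to x.

    With y[n] = sum_k h[k] * x[n - k] and s = sum_n w[n] * y[n],
    ds/dx[m] = sum_k h[k] * w[m + k] over the k with m + k < n_samples.
    """
    return [
        sum(h[k] * w[m + k] for k in range(len(h)) if m + k < n_samples)
        for m in range(n_samples)
    ]


def sign_step(x, grad, eps):
    """One signed-gradient step bounded by eps per sample."""
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


def run_demo(seed=0, n=64, eps=0.01):
    """Compare two perturbations after both pass through the toy replay.

    Returns (score of the digital-only attack, score of the replay-aware attack).
    """
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # toy waveform
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]     # toy scorer weights
    h = [1.0, -0.6, 0.3]                            # hypothetical replay response

    # Digital-only attack: ignores replay, so the gradient of score(x) is w.
    adv_digital = sign_step(x, w, eps)
    # Replay-aware attack: the gradient is taken through the simulator.
    adv_simulated = sign_step(x, grad_through_replay(w, h, n), eps)

    return (
        score(replay(adv_digital, h), w),
        score(replay(adv_simulated, h), w),
    )


if __name__ == "__main__":
    digital, simulated = run_demo()
    print("digital-only, after replay:", digital)
    print("replay-aware, after replay:", simulated)
```

In this linear toy the replay-aware step can never score lower after replay than the digital-only step under the same per-sample budget, because it follows the gradient of the quantity that is actually evaluated. That guarantee comes from the linearity of the sketch and says nothing about the size of the gain on a real ASV system.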