LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification

Although recently emerged adversarial attacks seriously threaten the security of automatic speaker verification (ASV), several countermeasures have been proposed to alleviate the threat. However, many defense approaches not only require prior knowledge of the attackers but also offer weak interpretability. To address these issues, in this paper we propose an attacker-independent and interpretable method, named learnable mask detector (LMD), to separate adversarial examples from genuine ones. LMD uses score variation as the indicator for detecting adversarial examples, where the score variation is the absolute difference between the ASV score of an original audio recording and that of its transformed audio, which is synthesized from the recording's masked complex spectrogram. The core component of the score variation detector is a neural network that generates the masked spectrogram. The network needs only genuine examples for training, which makes the approach attacker-independent. Its interpretability lies in the training objective: the network is trained to minimize the score variation of the targeted ASV while maximizing the number of masked spectrogram bins of the genuine training examples. The method rests on the observation that masking out the vast majority of spectrogram bins that carry little speaker information inevitably introduces a large score variation for an adversarial example but only a small one for a genuine example. Experimental results with 12 attackers and two representative ASV systems show that the proposed method outperforms five state-of-the-art baselines. The extensive experimental results can also serve as a benchmark for detection-based ASV defenses.
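The detection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `asv_score` and `mask_and_resynthesize` are hypothetical callables standing in for the targeted ASV system and for the learned mask network plus waveform resynthesis, and choosing the threshold as a quantile of genuine-example score variations is an assumption made here for illustration.

```python
import numpy as np

def score_variation(asv_score, mask_and_resynthesize, enroll, test):
    """Absolute difference between the ASV score of the original test
    input and the ASV score of its masked, resynthesized version.

    asv_score and mask_and_resynthesize are hypothetical callables
    supplied by the caller; they are not part of any real library.
    """
    original = asv_score(enroll, test)
    masked = asv_score(enroll, mask_and_resynthesize(test))
    return abs(original - masked)

def pick_threshold(genuine_variations, false_alarm_rate=0.01):
    """Assumed calibration: take the (1 - FAR) quantile of the score
    variations measured on genuine examples only, so no adversarial
    data is needed to set the threshold."""
    return float(np.quantile(genuine_variations, 1.0 - false_alarm_rate))

def is_adversarial(variation, threshold):
    """Flag an input whose score variation exceeds the threshold."""
    return variation > threshold
```

In use, one would compute `score_variation` over a held-out set of genuine trials, pass those values to `pick_threshold`, and then flag any test trial whose variation exceeds the result. Because the calibration uses only genuine examples, this mirrors the attacker-independent property claimed above.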
