Modern Voice Control Systems (VCS) rely on the joint operation of Automatic Speech Recognition (ASR) and Speaker Recognition (SR) for secure interaction. However, prior adversarial attacks typically target each task in isolation, overlooking the coupled decision pipeline of real-world deployments; consequently, single-task attacks rarely pose a practical threat. To fill this gap, we first use gradient analysis to show that the ASR and SR attack objectives do not inherently conflict. Building on this, we propose the Dual-task Universal Adversarial Perturbation (DUAP). Specifically, DUAP employs a targeted surrogate objective to effectively disrupt ASR transcription and introduces a Dynamic Normalized Ensemble (DNE) strategy to improve transferability across diverse SR models. We further incorporate psychoacoustic masking to keep the perturbation imperceptible. Extensive evaluations across five ASR and six SR models demonstrate that DUAP achieves high simultaneous attack success rates with superior imperceptibility, significantly outperforming existing single-task baselines.
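The gradient-compatibility premise above can be illustrated with a toy check: compute the gradients of two task losses with respect to a shared perturbation and measure their cosine similarity; a non-negative value means the two objectives can be optimized jointly without conflict. The quadratic losses below are hypothetical stand-ins for illustration only, not the paper's actual ASR transcription or SR embedding objectives.

```python
import numpy as np

def cosine_similarity(g1, g2):
    """Cosine of the angle between two gradient vectors."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Hypothetical quadratic surrogates over a shared perturbation delta
# (stand-ins for the ASR and SR attack losses).
def loss_asr(delta):
    return np.sum((delta - 1.0) ** 2)

def loss_sr(delta):
    return np.sum((delta - 0.5) ** 2)

def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

delta = np.zeros(4)                      # shared perturbation (toy size)
g_asr = numerical_grad(loss_asr, delta)  # gradient of ASR surrogate
g_sr = numerical_grad(loss_sr, delta)    # gradient of SR surrogate
cos = cosine_similarity(g_asr, g_sr)
# cos > 0: the two gradients point in compatible directions (no conflict)
print(round(cos, 3))  # → 1.0 for these parallel toy gradients
```

In the actual attack setting, `delta` would be a universal audio perturbation and the gradients would come from backpropagation through the ASR and SR models; the same sign test on the cosine similarity applies.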