Current adversarial attacks against speaker recognition systems (SRSs) require either white-box access or heavy black-box queries to the target SRS, and thus still fall short of practical attacks against proprietary commercial APIs and voice-controlled devices. To fill this gap, we propose QFA2SR, an effective and imperceptible query-free black-box attack that leverages the transferability of adversarial voices. To improve transferability, we present three novel methods: tailored loss functions, SRS ensemble, and time-freq corrosion. The first tailors loss functions to different attack scenarios. The latter two augment surrogate SRSs in two different ways. SRS ensemble combines diverse surrogate SRSs using new strategies suited to the unique scoring characteristics of SRSs. Time-freq corrosion augments surrogate SRSs by incorporating well-designed time-/frequency-domain modification functions, which simulate and approximate the decision boundary of the target SRS and the distortions introduced during over-the-air attacks. QFA2SR boosts the targeted transferability by 20.9%-70.7% on four popular commercial APIs (Microsoft Azure, iFlytek, Jingdong, and TalentedSoft), significantly outperforming existing attacks in the query-free setting, with negligible effect on imperceptibility. QFA2SR is also highly effective when launched over the air against three widespread voice assistants (Google Assistant, Apple Siri, and TMall Genie), achieving 60%, 46%, and 70% targeted transferability, respectively.
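As a rough illustration of how surrogate augmentation of this kind might be wired together, the sketch below averages an ensemble score over randomly modified copies of a waveform. It is not the QFA2SR implementation: the abstract does not specify the modification functions or the ensemble strategy, so everything here is an assumption made for illustration. In particular, `time_corrosion` (additive Gaussian noise), `freq_corrosion` (zeroing a random band of FFT bins), the per-surrogate `(score - threshold) / scale` normalization, and the toy cosine-similarity surrogates built by the hypothetical helper `make_toy_surrogate` are all stand-ins.

```python
import numpy as np


def time_corrosion(wave, rng, noise_std=0.002):
    """Assumed time-domain modification: add small Gaussian noise."""
    return wave + rng.normal(0.0, noise_std, size=wave.shape)


def freq_corrosion(wave, rng, band_frac=0.1):
    """Assumed frequency-domain modification: zero a random band of FFT bins."""
    spec = np.fft.rfft(wave)
    width = max(1, int(band_frac * spec.shape[0]))
    start = int(rng.integers(0, spec.shape[0] - width + 1))
    spec[start:start + width] = 0.0
    return np.fft.irfft(spec, n=wave.shape[0])


def ensemble_score(wave, surrogates):
    """Average of per-surrogate margins.

    Each surrogate is (score_fn, threshold, scale). Subtracting the threshold
    and dividing by the scale is an assumed way to make scores from
    surrogates with different score ranges comparable before averaging.
    """
    margins = [(score_fn(wave) - threshold) / scale
               for score_fn, threshold, scale in surrogates]
    return float(np.mean(margins))


def corroded_ensemble_score(wave, surrogates, rng, n_samples=8):
    """Mean ensemble score over randomly modified copies of the waveform."""
    total = 0.0
    for _ in range(n_samples):
        modified = freq_corrosion(time_corrosion(wave, rng), rng)
        total += ensemble_score(modified, surrogates)
    return total / n_samples


def make_toy_surrogate(enrolled, proj, threshold, scale):
    """Hypothetical stand-in for an SRS: cosine similarity of projections."""
    ref = proj @ enrolled

    def score_fn(wave):
        emb = proj @ wave
        denom = np.linalg.norm(emb) * np.linalg.norm(ref) + 1e-12
        return float(emb @ ref / denom)

    return score_fn, threshold, scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1600
    enrolled = rng.normal(size=n)  # toy "enrolled speaker" waveform
    surrogates = [
        make_toy_surrogate(enrolled, rng.normal(size=(64, n)), 0.5, 1.0),
        make_toy_surrogate(enrolled, rng.normal(size=(64, n)), 0.3, 0.5),
    ]
    print("enrolled:", corroded_ensemble_score(enrolled, surrogates, rng))
    print("unrelated:", corroded_ensemble_score(rng.normal(size=n), surrogates, rng))
```

In an actual attack, a quantity like `corroded_ensemble_score` would serve as the objective that the adversarial perturbation is optimized against, using differentiable surrogate models rather than the toy scorers used here.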