QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems

Current adversarial attacks against speaker recognition systems (SRSs) require either white-box access or a large number of black-box queries to the target SRS, and therefore remain impractical against proprietary commercial APIs and voice-controlled devices. To fill this gap, we propose QFA2SR, an effective and imperceptible query-free black-box attack that leverages the transferability of adversarial voices. To improve transferability, we present three novel methods: tailored loss functions, SRS ensemble, and time-freq corrosion. The first tailors loss functions to different attack scenarios. The latter two augment surrogate SRSs in two different ways. SRS ensemble combines diverse surrogate SRSs using new strategies suited to the unique scoring characteristics of SRSs. Time-freq corrosion augments surrogate SRSs with well-designed time- and frequency-domain modification functions, which simulate and approximate both the decision boundary of the target SRS and the distortions introduced during over-the-air attacks. QFA2SR boosts targeted transferability by 20.9%-70.7% on four popular commercial APIs (Microsoft Azure, iFlytek, Jingdong, and TalentedSoft), significantly outperforming existing attacks in the query-free setting, with negligible effect on imperceptibility. QFA2SR is also highly effective when launched over the air against three widespread voice assistants (Google Assistant, Apple Siri, and TMall Genie), achieving 60%, 46%, and 70% targeted transferability, respectively.
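The abstract does not specify how the surrogate scores are combined or which modification functions are used, so the following is only a minimal sketch of the general idea, under stated assumptions. It averages scale-normalized scores from several toy surrogate SRSs over randomly modified copies of the input voice. Every name here (`make_surrogate`, `time_modification`, `ensemble_score`), the linear toy "embedding", the division by a known score scale, and the gain-plus-noise modification are hypothetical illustrations rather than the paper's method. A frequency-domain modification is omitted for brevity, and a real attack would optimize the voice against a loss built on such a score.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def make_surrogate(weights, scale):
    """Toy surrogate SRS (assumption): linear embedding + scaled cosine score."""
    def embed(voice):
        return [sum(w * v for w, v in zip(row, voice)) for row in weights]

    def score(voice, enrolled):
        return scale * cosine(embed(voice), embed(enrolled))

    return score


def time_modification(voice, rng, noise_std=0.01):
    """Hypothetical time-domain modification: random gain plus additive noise."""
    gain = rng.uniform(0.8, 1.2)
    return [gain * v + rng.gauss(0.0, noise_std) for v in voice]


def ensemble_score(voice, enrolled, surrogates, scales, rng, n_draws=8):
    """Average scale-normalized surrogate scores over random modifications."""
    total = 0.0
    for _ in range(n_draws):
        modified = time_modification(voice, rng)
        for score, scale in zip(surrogates, scales):
            # Divide by each surrogate's score scale so no single model dominates.
            total += score(modified, enrolled) / scale
    return total / (n_draws * len(surrogates))


rng = random.Random(0)
dim, emb_dim = 16, 4
scales = [1.0, 10.0]  # surrogates report scores on different scales
surrogates = [
    make_surrogate(
        [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(emb_dim)], s
    )
    for s in scales
]
enrolled = [rng.uniform(-1, 1) for _ in range(dim)]
print(ensemble_score(enrolled, enrolled, surrogates, scales, rng))
```

Normalizing each score before averaging is one simple way to account for surrogates that report similarity on different scales; the strategies the authors actually use may differ.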

Speaker Recognition

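Speaker recognition, also called voiceprint recognition (VPR), is a biometric technique that automatically identifies a speaker from the speaker-specific information contained in speech, using computers and modern information-recognition technology. Research on speaker recognition aims to extract features from speech that characterize the speaker and to build effective models and systems for automatic, accurate speaker identification.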
