VSMask: Defending Against Voice Synthesis Attack via Real-Time Predictive Perturbation

Deep learning based voice synthesis technology generates artificial human-like speeches, which has been used in deepfakes or identity theft attacks. Existing defense mechanisms inject subtle adversarial perturbations into the raw speech audios to mislead the voice synthesis models. However, optimizing the adversarial perturbation not only consumes substantial computation time, but it also requires the availability of entire speech. Therefore, they are not suitable for protecting live speech streams, such as voice messages or online meetings. In this paper, we propose VSMask, a real-time protection mechanism against voice synthesis attacks. Different from offline protection schemes, VSMask leverages a predictive neural network to forecast the most effective perturbation for the upcoming streaming speech. VSMask introduces a universal perturbation tailored for arbitrary speech input to shield a real-time speech in its entirety. To minimize the audio distortion within the protected speech, we implement a weight-based perturbation constraint to reduce the perceptibility of the added perturbation. We comprehensively evaluate VSMask protection performance under different scenarios. The experimental results indicate that VSMask can effectively defend against 3 popular voice synthesis models. None of the synthetic voice could deceive the speaker verification models or human ears with VSMask protection. In a physical world experiment, we demonstrate that VSMask successfully safeguards the real-time speech by injecting the perturbation over the air.

翻译：基于深度学习的语音合成技术能够生成逼真的人工语音，已被用于深度伪造或身份盗窃攻击中。现有防御机制通过向原始语音音频注入微小的对抗性扰动来误导语音合成模型。然而，优化对抗性扰动不仅消耗大量计算时间，还要求获取完整语音信号。因此，这些方法不适用于保护实时语音流，例如语音消息或在线会议。本文提出VSMask，一种针对语音合成攻击的实时保护机制。与离线保护方案不同，VSMask利用预测神经网络为即将到来的流式语音预测最优扰动。VSMask引入了一种通用扰动，可针对任意语音输入实现对整个实时语音的全方位保护。为最小化受保护语音中的音频失真，我们实现了基于权重的扰动约束，以降低所添扰动的可感知性。我们在不同场景下全面评估了VSMask的保护性能。实验结果表明，VSMask能有效防御3种主流语音合成模型。在VSMask保护下，任何合成语音均无法欺骗说话人验证模型或人类听觉。在物理世界实验中，我们证明VSMask可通过空中注入扰动成功保护实时语音。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

专知会员服务

52+阅读 · 2022年10月22日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

47+阅读 · 2020年10月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日