Rapid progress in personalized speech generation technology, including personalized text-to-speech (TTS) and voice conversion (VC), has made it difficult for human listeners to distinguish generated speech from real speech, creating an urgent need to protect speakers' voices from malicious misuse. To address this, we propose a speaker protection method based on adversarial attacks. The proposed method adds a minimal perturbation to the original speech signal such that downstream speech generation models can no longer accurately reproduce the voice of the target speaker. For validation, we use the open-source pre-trained YourTTS model for speech generation and protect the target speaker's speech in a white-box scenario. We assess the voice protection capability by running automatic speaker verification (ASV) evaluations on the generated speech. Our experimental results show that the gradient-based I-FGSM adversarial perturbation method successfully perturbs the speaker encoder of the YourTTS model. Furthermore, the adversarial perturbation is effective in preventing the YourTTS model from generating the speech of the target speaker. Audio samples can be found at https://voiceprivacy.github.io/Adeversarial-Speech-with-YourTTS.
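As a rough illustration of the I-FGSM idea mentioned above, the following is a minimal sketch and not the method as implemented for YourTTS. It assumes a toy linear "speaker encoder" (a random matrix `W`, hypothetical) in place of the real network, and it assumes the objective is to lower the cosine similarity between the embeddings of the protected and the original speech; the abstract does not state the actual loss. The names `ifgsm_protect`, `eps`, `alpha`, and `steps` are illustrative only.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def ifgsm_protect(x, W, eps=0.01, alpha=0.001, steps=50, seed=0):
    """Iterative FGSM against a toy linear encoder e = W @ x (an assumption).

    Each step moves the waveform by alpha in the sign direction that lowers
    the cosine similarity to the original embedding, then projects back into
    the L-infinity ball of radius eps around the original waveform.
    """
    rng = np.random.default_rng(seed)
    e0 = W @ x
    n0 = np.linalg.norm(e0)
    # Small random start: the similarity gradient is exactly zero at x itself.
    x_adv = x + rng.uniform(-alpha, alpha, size=x.shape)
    for _ in range(steps):
        e = W @ x_adv
        ne = np.linalg.norm(e)
        cos = e @ e0 / (ne * n0)
        # Analytic gradient of the cosine similarity with respect to x_adv.
        grad = W.T @ (e0 / (ne * n0) - cos * e / ne**2)
        x_adv = x_adv - alpha * np.sign(grad)      # descend on similarity
        x_adv = np.clip(x_adv, x - eps, x + eps)   # keep the perturbation small
        x_adv = np.clip(x_adv, -1.0, 1.0)          # stay in valid waveform range
    return x_adv


# Toy data: one second of a 220 Hz tone at 16 kHz and a random 64-dim encoder.
rng = np.random.default_rng(1)
t = np.arange(16000) / 16000.0
speech = 0.1 * np.sin(2 * np.pi * 220.0 * t)
W = rng.standard_normal((64, speech.size))

protected = ifgsm_protect(speech, W)
print("max |perturbation|:", np.max(np.abs(protected - speech)))
print("embedding similarity:", cosine(W @ protected, W @ speech))
```

In a real white-box setting the analytic gradient would be replaced by backpropagation through the actual speaker encoder, and `eps` would be chosen so that the perturbation remains barely audible.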