Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from both machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining how the voice is perceived by humans. We refer to this as asynchronous voice anonymization. To this end, a speech generation framework incorporating a speaker disentanglement mechanism is employed to generate the anonymized speech. The speaker attributes are altered through an adversarial perturbation applied to the speaker embedding, while human perception is preserved by controlling the intensity of the perturbation. Experiments conducted on the LibriSpeech dataset showed that the speaker attributes were obscured, with their human perception preserved, for 60.71% of the processed utterances.
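To make the idea of an intensity-controlled adversarial perturbation on a speaker embedding concrete, the following is a minimal sketch and not the method used in the paper. It assumes a toy linear softmax speaker classifier as the attacked model and an iterative sign-gradient update bounded in the L-infinity norm; the names `perturb_embedding`, `weights`, `epsilon`, and `steps` are hypothetical and chosen only for illustration. The bound `epsilon` plays the role of the perturbation intensity: a smaller value keeps the perturbed embedding closer to the original one.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def perturb_embedding(emb, weights, speaker_id, epsilon, steps=10):
    """Push a speaker embedding away from its true speaker class.

    Assumption: the attacked model is a toy linear classifier with
    logits = weights @ emb. Each iteration takes a sign-gradient step
    that increases the cross-entropy of the true speaker, and the total
    perturbation is kept inside an L-infinity ball of radius epsilon.
    """
    step_size = epsilon / steps
    adv = emb.copy()
    for _ in range(steps):
        probs = softmax(weights @ adv)
        probs[speaker_id] -= 1.0           # gradient of cross-entropy w.r.t. logits
        grad = weights.T @ probs           # gradient w.r.t. the embedding
        adv = adv + step_size * np.sign(grad)
        # Project back so that no coordinate moves by more than epsilon.
        adv = emb + np.clip(adv - emb, -epsilon, epsilon)
    return adv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(5, 16))     # 5 hypothetical speakers, 16-dim embedding
    emb = 0.1 * rng.normal(size=16)
    speaker_id = int(np.argmax(weights @ emb))
    adv = perturb_embedding(emb, weights, speaker_id, epsilon=0.05)
    print("true-speaker probability before:", softmax(weights @ emb)[speaker_id])
    print("true-speaker probability after: ", softmax(weights @ adv)[speaker_id])
    print("max coordinate change:", np.max(np.abs(adv - emb)))
```

In this sketch, raising `epsilon` lowers the classifier's confidence in the true speaker further but moves the embedding farther from the original, which mirrors the trade-off between obscuring speaker attributes for machines and preserving the perceived voice.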