Privacy-preserving voice conversion aims to remove only those attributes of speech audio that convey identity information, keeping other speech characteristics intact. This paper presents a mechanism for privacy-preserving voice conversion that uses adversarial information hiding to control the leakage of identity-bearing information. This enables a deliberate trade-off between preserving source-speech characteristics and modifying speaker identity. The approach thus improves on voice-conversion techniques such as CycleGAN and StarGAN, which were not designed for privacy, so their converted speech may leak personal information in unpredictable ways. Our approach is also more flexible than ASR-TTS voice conversion pipelines, which by design discard all prosodic information linked to textual content. Evaluations show that the proposed system successfully modifies perceived speaker identity while preserving source lexical content well.