Modern vocoders in an analysis/synthesis pipeline make it possible to investigate high-quality voice conversion for privacy purposes. Here, we propose transforming the speaker embedding and the pitch in order to hide the sex of the speaker. An ECAPA-TDNN-based speaker representation, fed into a HiFiGAN vocoder, is protected using a neural-discriminant analysis approach, which is consistent with the zero-evidence concept of privacy. This approach significantly reduces the information in speech related to the speaker's sex while preserving speech content and some consistency in the resulting protected voices.