Voice anonymization techniques have been shown to obscure a speaker's acoustic identity in short, isolated utterances on benchmarks such as the VoicePrivacy Challenge. In practice, however, utterances seldom occur in isolation: long-form audio is commonplace in domains such as interviews, phone calls, and meetings. In these settings, many utterances from the same speaker are available, which poses a significantly greater privacy risk: an attacker could exploit an individual's vocabulary, syntax, and turns of phrase to re-identify them, even when their voice is completely disguised. To address this risk, we propose new content anonymization approaches that contextually rewrite the transcripts within an ASR-TTS pipeline, eliminating speaker-specific style while preserving meaning. In a long-form telephone conversation setting, we first demonstrate that a content-based attack is effective against voice-anonymized speech. We then show that the proposed content-based anonymization methods mitigate this risk while preserving speech utility. Overall, we find that paraphrasing is an effective defense against content-based attacks and recommend that stakeholders adopt this step to ensure anonymity in long-form audio.
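To make the notion of a content-based attack concrete, the following is a minimal sketch of a toy stylometric attribution baseline. It is not the attack or the paraphrasing method evaluated in this work: the word unigram and bigram features, the cosine scoring, the function names (`style_profile`, `cosine`, `attribute`), and the toy transcripts are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def style_profile(transcript: str) -> Counter:
    """Count word unigrams and bigrams as a crude stylistic fingerprint."""
    words = transcript.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return Counter(words + bigrams)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(unknown: str, enrolled: dict[str, str]) -> str:
    """Return the enrolled speaker whose profile is closest to the unknown transcript."""
    target = style_profile(unknown)
    return max(enrolled, key=lambda spk: cosine(target, style_profile(enrolled[spk])))


# Toy transcripts, invented for illustration only (hypothetical speakers).
enrolled = {
    "spk_a": "you know i was like totally sure it would work you know",
    "spk_b": "indeed i would argue that the outcome was rather predictable",
}
original = "you know it was like totally fine you know"  # voice-anonymized, content intact
paraphrased = "it was fine"                               # same meaning, style removed

guess = attribute(original, enrolled)
sim_original = cosine(style_profile(original), style_profile(enrolled["spk_a"]))
sim_paraphrased = cosine(style_profile(paraphrased), style_profile(enrolled["spk_a"]))
```

In this toy case the unparaphrased transcript is attributed to `spk_a` on the strength of its filler phrases alone, while the paraphrased version scores a much lower similarity to the same speaker. A realistic attacker would likely use stronger features and many more utterances per speaker.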