In this paper, we investigate the impact of speech temporal dynamics on automatic speaker verification and speaker voice anonymization. We propose several metrics that perform automatic speaker verification based solely on phoneme durations. Experimental results demonstrate that phoneme durations leak some speaker information and can reveal speaker identity in both original and anonymized speech. This work therefore emphasizes the importance of accounting for a speaker's speech rate and, more importantly, their phoneme duration characteristics. It also highlights the need to modify these characteristics in order to develop anonymization systems with strong privacy protection.
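The abstract does not define the proposed metrics. The sketch below is therefore a hypothetical illustration of what duration-only speaker verification can look like, not the authors' method. It assumes each utterance has already been force-aligned into `(phoneme, seconds)` pairs. It also assumes a simple scoring rule: the negative mean absolute difference of per-phoneme mean log-durations over the phonemes two utterances share. The names `duration_profile`, `duration_score`, and `normalize_rate` are all illustrative. The `normalize_rate` switch separates the two effects the abstract distinguishes. When enabled, it subtracts each utterance's mean log-duration, so that a uniformly faster or slower speaker (speech rate) yields the same profile and only the relative timing of phonemes is compared.

```python
import math
from collections import defaultdict


def duration_profile(alignment, normalize_rate=True):
    """Hypothetical: mean log-duration per phoneme from (phoneme, seconds) pairs.

    With normalize_rate=True, the utterance's mean log-duration is subtracted,
    so the profile reflects relative phoneme timing rather than overall speech rate.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one phoneme")
    logs = defaultdict(list)
    for phone, dur in alignment:
        if dur <= 0:
            raise ValueError(f"non-positive duration for {phone!r}: {dur}")
        logs[phone].append(math.log(dur))
    profile = {p: sum(v) / len(v) for p, v in logs.items()}
    if normalize_rate:
        all_logs = [x for v in logs.values() for x in v]
        offset = sum(all_logs) / len(all_logs)  # global speech-rate term
        profile = {p: m - offset for p, m in profile.items()}
    return profile


def duration_score(enroll, test, normalize_rate=True):
    """Hypothetical similarity score: higher means more similar timing.

    Negative mean absolute log-duration gap over phonemes present in both
    utterances; -inf if they share no phoneme.
    """
    a = duration_profile(enroll, normalize_rate)
    b = duration_profile(test, normalize_rate)
    shared = sorted(a.keys() & b.keys())
    if not shared:
        return float("-inf")
    return -sum(abs(a[p] - b[p]) for p in shared) / len(shared)


if __name__ == "__main__":
    enroll = [("AH", 0.08), ("S", 0.12), ("T", 0.05), ("AH", 0.10)]
    slower = [(p, d * 1.5) for p, d in enroll]  # same timing pattern, slower rate
    print(duration_score(enroll, slower))                        # ~0 with rate normalization
    print(duration_score(enroll, slower, normalize_rate=False))  # about -log(1.5)
```

In a verification setting, a trial would be accepted when the score exceeds a threshold tuned on development data. Comparing scores with and without `normalize_rate` gives a rough way to see how much identity information comes from speech rate alone versus phoneme-specific duration patterns.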