In an age of voice-enabled technology, voice anonymization offers a way to protect people's privacy, provided these systems work equally well across subgroups. This study investigates bias in voice anonymization systems within the context of the Voice Privacy Challenge. We curate a novel benchmark dataset to assess performance disparities among speaker subgroups based on sex and dialect. We analyze how three anonymization systems and different attack models affect speaker subgroup bias, and we find significant performance variations across subgroups. Notably, subgroup bias intensifies as attacker capabilities become more advanced, which shows how difficult it is to achieve equal performance across all subgroups. Our study highlights the need for inclusive benchmark datasets and comprehensive evaluation strategies that address subgroup bias in voice anonymization.
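The following is a rough illustration of what a "performance disparity among speaker subgroups" can look like numerically. It is a minimal sketch that assumes privacy is measured, as in the Voice Privacy Challenge evaluation, by the equal error rate (EER) of an attacker's speaker verification system, computed separately for each subgroup. The trial scores, the subgroup labels, and the helper names `compute_eer`, `eer_by_subgroup`, and `max_disparity` are hypothetical. The threshold sweep is a simple approximation and is not the study's evaluation code.

```python
from collections import defaultdict


def compute_eer(genuine, impostor):
    """Approximate the equal error rate by sweeping every observed score as a threshold."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # same-speaker trials rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # different-speaker trials accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def eer_by_subgroup(trials):
    """Group (subgroup, score, is_target) trials and compute one EER per subgroup."""
    groups = defaultdict(lambda: ([], []))  # subgroup -> (genuine scores, impostor scores)
    for group, score, is_target in trials:
        groups[group][0 if is_target else 1].append(score)
    return {g: compute_eer(gen, imp) for g, (gen, imp) in groups.items()}


def max_disparity(eers):
    """Gap between the best- and worst-protected subgroup (hypothetical summary measure)."""
    return max(eers.values()) - min(eers.values())


# Hypothetical attacker scores on anonymized speech (higher = "same speaker").
trials = (
    [("female", s, True) for s in (0.9, 0.8, 0.7, 0.4)]
    + [("female", s, False) for s in (0.1, 0.2, 0.3, 0.5)]
    + [("male", s, True) for s in (0.9, 0.8, 0.7, 0.6)]
    + [("male", s, False) for s in (0.1, 0.2, 0.3, 0.4)]
)

eers = eer_by_subgroup(trials)
print(eers)                 # {'female': 0.25, 'male': 0.0}
print(max_disparity(eers))  # 0.25
```

Under this metric, a lower attacker EER means the attacker re-identifies that subgroup's speakers more easily, so in this toy data the "male" subgroup would be less protected than the "female" one. A single aggregate EER over all trials would hide this gap, which is why the evaluation is broken down by subgroup.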