The success of deep learning in speaker recognition relies heavily on large datasets. However, the data-hungry nature of deep learning methods has already been questioned on account of the ethical, privacy, and legal concerns that arise when using large-scale datasets of natural speech collected from real human speakers. For example, the widely used VoxCeleb2 dataset for speaker recognition is no longer accessible from its official website. To mitigate these concerns, this work presents an initiative to generate a privacy-friendly synthetic version of VoxCeleb2 and to ensure the quality of the generated speech in terms of privacy, utility, and fairness. We also discuss the challenges of using synthetic data for the downstream task of speaker verification.