A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques focus primarily on speaker-preserving augmentation, which leaves the speaker traits of the speech unchanged and therefore does not create new speakers. Recent research has pointed to the potential of speaker augmentation, which instead generates new speakers to enrich the training dataset. In this study, we investigate two speaker augmentation approaches: speed perturbation (SP) and vocal tract length perturbation (VTLP). Although both methods have been used empirically, their efficacy has not been investigated comprehensively. Our experiments on two public datasets, VoxCeleb and CN-Celeb, show that both SP and VTLP are effective at generating new speakers and lead to significant performance improvements in speaker recognition. Furthermore, the two methods differ in their sensitivity to the perturbation factor and to data complexity, which suggests that fusing them may be beneficial. These results underscore the substantial potential of speaker augmentation and the importance of exploring and analyzing it in depth.
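To make the two perturbations and the idea of "generating new speakers" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. SP is approximated by linear-interpolation resampling with no anti-aliasing filter, unlike production tools such as the SoX `speed` effect. VTLP uses the piecewise-linear frequency warping commonly associated with Jaitly and Hinton (2013), applied here to a magnitude spectrogram. The factors 0.9/1.0/1.1, the boundary frequency `f_hi=4800` Hz, the `spk-sp0.9` label scheme, and all function names are hypothetical choices.

```python
import numpy as np


def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Resample so the audio plays `factor` times faster.

    factor > 1 shortens the signal and raises the pitch; factor < 1 lengthens
    it and lowers the pitch. Linear interpolation without an anti-aliasing
    filter is a simplification made for this sketch.
    """
    n_out = int(round(len(wave) / factor))
    # Position in the original signal that each output sample is read from.
    src_pos = np.arange(n_out) * factor
    return np.interp(src_pos, np.arange(len(wave)), wave)


def vtlp_warp(freqs: np.ndarray, alpha: float, f_hi: float, nyquist: float) -> np.ndarray:
    """Piecewise-linear VTLP warping that keeps 0 Hz and the Nyquist frequency fixed."""
    m = min(alpha, 1.0)
    boundary = f_hi * m / alpha
    # Below the boundary, frequencies are scaled by alpha.
    scaled = freqs * alpha
    # Above it, a second linear segment maps the boundary to f_hi*m and Nyquist to Nyquist.
    upper = nyquist - (nyquist - f_hi * m) / (nyquist - boundary) * (nyquist - freqs)
    return np.where(freqs <= boundary, scaled, upper)


def vtlp(spec: np.ndarray, alpha: float, sample_rate: int = 16000,
         f_hi: float = 4800.0) -> np.ndarray:
    """Apply VTLP to a magnitude spectrogram of shape (frames, n_bins)."""
    nyquist = sample_rate / 2
    freqs = np.linspace(0.0, nyquist, spec.shape[1])
    warped = vtlp_warp(freqs, alpha, f_hi, nyquist)
    # Energy at freqs[k] moves to warped[k]; resample it back onto the original bin grid.
    return np.stack([np.interp(freqs, warped, frame) for frame in spec])


def augment_speakers(utts, factors=(0.9, 1.0, 1.1)):
    """Speaker augmentation with SP: every non-unit factor defines a NEW speaker.

    `utts` is a list of (speaker_id, waveform). The "-sp<factor>" suffix is a
    hypothetical labeling scheme used only for illustration.
    """
    out = []
    for spk, wave in utts:
        for a in factors:
            if a == 1.0:
                out.append((spk, wave))  # original speaker, unchanged audio
            else:
                out.append((f"{spk}-sp{a}", speed_perturb(wave, a)))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utts = [("spk1", rng.standard_normal(16000)), ("spk2", rng.standard_normal(16000))]
    aug = augment_speakers(utts)
    print(len(aug), sorted({s for s, _ in aug}))
    spec = np.abs(rng.standard_normal((10, 257)))  # stand-in magnitude spectrogram
    print(vtlp(spec, 0.9).shape)
```

The design point sits in `augment_speakers`. A speaker-preserving augmentation would keep the original label `spk` for the perturbed copy. Speaker augmentation instead assigns it a new label, so with three factors the classifier sees three times as many speaker classes. The sketch applies this relabeling only to SP, because `vtlp` operates on spectrogram features rather than waveforms. A feature-side pipeline could relabel VTLP-warped copies in the same way.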


Related content: Voiceprint recognition

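Speaker recognition, also called voiceprint recognition (VPR), is a biometric technology that automatically determines a speaker's identity from the speaker-specific information contained in speech, using computers and modern information-recognition techniques. The goal of speaker recognition research is to extract features that characterize the speaker from speech and to build effective models and systems that identify speakers automatically and accurately.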