As experts in voice modification, trans-feminine gender-affirming voice teachers have unique perspectives on voice that confound current understandings of speaker identity. To demonstrate this, we present the Versatile Voice Dataset (VVD), a collection of recordings of three speakers modifying their voices along gendered axes. The VVD illustrates that current approaches to speaker modeling, which rest on categorical notions of gender and a static understanding of vocal texture, fail to account for the flexibility of the vocal tract. Using publicly available speaker embeddings, we demonstrate that gender classification systems are highly sensitive to voice modification, and that speaker verification systems increasingly fail to recognize recordings as coming from the same speaker as the modification becomes more drastic. As one path toward moving beyond categorical and static notions of speaker identity, we propose modeling individual qualities of vocal texture such as pitch, resonance, and weight.