An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition

Multi-genre speaker recognition is attracting growing interest because it better reflects the complexity of real-world applications. A major challenge, however, is the significant shift in the distribution of speaker vectors across genres. Distribution alignment is a common way to address this shift, but previous studies have mainly focused on aligning one source domain with one target domain, and how these methods perform on multi-genre data is unknown. This paper presents a comprehensive study of mainstream distribution alignment methods on multi-genre data, where multiple distributions need to be aligned. We analyze the methods both qualitatively and quantitatively. Our experiments on the CN-Celeb dataset show that within-between distribution alignment (WBDA) performs relatively better than the other methods investigated. However, none of the investigated methods consistently improves performance across all test cases. This suggests that aligning the distributions of speaker vectors alone may not fully address the challenges of multi-genre speaker recognition, and that further investigation is needed to develop a more comprehensive solution.


