Benchmark Dataset Dynamics, Bias and Privacy Challenges in Voice Biometrics Research

Speaker recognition is a widely used voice-based biometric technology with applications in various industries, including banking, education, recruitment, immigration, law enforcement, healthcare, and well-being. However, while dataset evaluations and audits have improved data practices in face recognition and other computer vision tasks, data practices in speaker recognition have gone largely unquestioned. Our research aims to address this gap by exploring how dataset usage has evolved over time and what implications this has for bias, fairness, and privacy in speaker recognition systems. Previous studies have demonstrated the presence of historical, representation, and measurement biases in popular speaker recognition benchmarks. In this paper, we present a longitudinal study of speaker recognition datasets used for training and evaluation from 2012 to 2021. We survey close to 700 papers to investigate community adoption of datasets and changes in usage over a crucial period in which speaker recognition approaches transitioned to the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field, examines their usage patterns, and assesses the dataset attributes that affect bias, fairness, and other ethical concerns. Our findings suggest areas for further research on the ethics and fairness of speaker recognition technology.

