With the ubiquity of smart devices that use speaker recognition (SR) systems as a means of authenticating individuals and personalizing their services, the fairness of SR systems has become an important point of focus. In this paper, we study the notion of fairness in recent SR systems based on three popular and relevant definitions, namely Statistical Parity, Equalized Odds, and Equal Opportunity. We examine five popular neural architectures and five loss functions commonly used in training SR systems, and evaluate their fairness with respect to gender and nationality groups. Our detailed experiments shed light on this concept and demonstrate that more sophisticated encoder architectures better align with the definitions of fairness. Additionally, we find that the choice of loss function can significantly impact the bias of SR models.
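As an illustration of the three definitions named above, the following is a minimal sketch that uses their standard formulations from the fairness literature, applied to binary accept/reject decisions on verification trials. The abstract does not give the exact formulation used in the paper, so the function name `fairness_gaps`, the gap-based summary (largest minus smallest per-group rate), and the toy data are all assumptions made for illustration only.

```python
def _rate(decisions):
    """Fraction of accepted trials (1s) in a list of 0/1 decisions."""
    return sum(decisions) / len(decisions)


def fairness_gaps(y_true, y_pred, group):
    """Hypothetical helper: per-definition gap across demographic groups.

    y_true: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    y_pred: 1 if the system accepts the trial, 0 if it rejects it.
    group:  demographic label of each trial (e.g. gender or nationality).
    Assumes every group contains both target and non-target trials.
    A gap of 0 means the corresponding definition is satisfied exactly.
    """
    per_group = []
    for g in sorted(set(group)):
        idx = [i for i, a in enumerate(group) if a == g]
        accept = _rate([y_pred[i] for i in idx])                   # P(accept | group)
        tpr = _rate([y_pred[i] for i in idx if y_true[i] == 1])    # P(accept | target, group)
        fpr = _rate([y_pred[i] for i in idx if y_true[i] == 0])    # P(accept | non-target, group)
        per_group.append((accept, tpr, fpr))

    # Spread (max - min) of each rate across groups.
    accept_gap, tpr_gap, fpr_gap = (max(col) - min(col) for col in zip(*per_group))
    return {
        "statistical_parity": accept_gap,        # equal acceptance rates
        "equal_opportunity": tpr_gap,            # equal true-positive rates
        "equalized_odds": max(tpr_gap, fpr_gap), # equal TPR and equal FPR
    }


# Toy data, invented for illustration: two groups with four trials each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = ["f"] * 4 + ["m"] * 4
gaps = fairness_gaps(y_true, y_pred, group)
print(gaps)
```

In this toy case both groups are accepted at the same overall rate, so the statistical parity gap is zero, while the true-positive and false-positive rates differ between groups, so the equal opportunity and equalized odds gaps are not. This shows that the three definitions can disagree on the same system.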