When automated processes make decisions and process personal data, there is an expectation of fairness: that members of different demographic groups receive equitable treatment. This expectation applies to biometric systems such as automatic speaker verification (ASV). We compare three candidate fairness metrics and extend previous work on face recognition by examining differential performance across a range of ASV operating points. Results show that the Gini Aggregation Rate for Biometric Equitability (GARBE) is the only metric that meets the three functional fairness measure criteria. We also present a comprehensive evaluation of the fairness and verification performance of five state-of-the-art ASV systems. Our findings reveal a nuanced trade-off between fairness and verification accuracy, underscoring the complex interplay between system design, demographic inclusiveness, and verification reliability.
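
As an illustration of the metric named above, the following minimal sketch computes a GARBE-style score at a single operating point. It assumes the commonly cited formulation, which is not stated in this abstract: a sample-size-corrected Gini coefficient is computed over the per-group false match rates (FMR) and over the per-group false non-match rates (FNMR), and the two are combined with a weight `alpha`. The function names, the default `alpha=0.5`, and the handling of an all-zero input are assumptions made for illustration only.

```python
from itertools import product


def gini(rates):
    """Sample-size-corrected Gini coefficient of non-negative error rates.

    Assumed form: (n / (n - 1)) * sum_ij |x_i - x_j| / (2 * n^2 * mean).
    Returns 0.0 when all groups are treated identically and 1.0 when a
    single group carries all of the error.
    """
    n = len(rates)
    if n < 2:
        raise ValueError("at least two demographic groups are required")
    mean = sum(rates) / n
    if mean == 0:
        # Assumption: no errors in any group is treated as perfectly equal.
        return 0.0
    abs_diff_sum = sum(abs(a - b) for a, b in product(rates, repeat=2))
    return (n / (n - 1)) * abs_diff_sum / (2 * n * n * mean)


def garbe(fmr_by_group, fnmr_by_group, alpha=0.5):
    """Weighted combination of the Gini coefficients of FMR and FNMR.

    alpha weights the FMR term; (1 - alpha) weights the FNMR term.
    Lower values indicate more equitable error rates across groups.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * gini(fmr_by_group) + (1 - alpha) * gini(fnmr_by_group)
```

In this sketch each list holds one error rate per demographic group, measured at the same decision threshold. Repeating the computation over several thresholds would give the kind of per-operating-point comparison the abstract describes.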