Biometric Verification (BV) systems often exhibit accuracy disparities across demographic groups, which introduces bias into BV applications. Assessing and quantifying these biases is essential for ensuring the fairness of BV systems. However, existing bias evaluation metrics in BV have limitations: they focus exclusively on either match or non-match error rates, overlook bias affecting demographic groups whose performance falls between the best and worst levels, and neglect the magnitude of the bias present. This paper presents an in-depth analysis of these limitations and, through experiments, demonstrates the contextual suitability, merits, and shortcomings of current metrics. Additionally, it introduces a novel general-purpose bias evaluation measure for BV, the ``Sum of Group Error Differences (SEDG)''. Our experimental results on controlled synthetic datasets show how effectively both the existing metrics and our proposed measure quantify demographic bias. We discuss the applicability of each bias evaluation metric across a set of simulated demographic bias scenarios and provide scenario-based metric recommendations. Our code is publicly available at \url{https://github.com/alaaobeid/SEDG}.
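The following sketch illustrates two of the limitations named above. It assumes that a worst-to-best ratio of per-group error rates is a representative "existing" disparity score. The score cannot tell whether one group or most groups sit at the worst level, and it is blind to the absolute size of the errors. The group names, error values, and both helper functions (`max_min_ratio`, `sum_gap_to_best`) are hypothetical. In particular, `sum_gap_to_best` is a simple stand-in aggregate chosen for contrast; it is not the SEDG definition from the paper.

```python
def max_min_ratio(errors):
    """Worst-to-best ratio of per-group error rates (a max/min style disparity score)."""
    return max(errors.values()) / min(errors.values())


def sum_gap_to_best(errors):
    """Hypothetical aggregate: sum of each group's error gap to the best group.

    Illustrative only; this is NOT the SEDG formula proposed in the paper.
    """
    best = min(errors.values())
    return sum(e - best for e in errors.values())


# Hypothetical per-group false non-match rates (FNMR).
# Scenario A: three of the four groups perform at the worst level.
fnmr_a = {"g1": 0.01, "g2": 0.05, "g3": 0.05, "g4": 0.05}
# Scenario B: only one group performs at the worst level.
fnmr_b = {"g1": 0.01, "g2": 0.01, "g3": 0.01, "g4": 0.05}
# Scenario C: the same worst-to-best ratio as A, but ten times the error rates.
fnmr_c = {"g1": 0.10, "g2": 0.50, "g3": 0.50, "g4": 0.50}

if __name__ == "__main__":
    for name, errs in [("A", fnmr_a), ("B", fnmr_b), ("C", fnmr_c)]:
        # The ratio is identical for all three scenarios; the gap sum is not.
        print(f"{name}: ratio={max_min_ratio(errs):.2f}  gap_sum={sum_gap_to_best(errs):.2f}")
```

All three scenarios receive the same ratio score. An aggregate that accounts for every group and for absolute differences separates them: it ranks A above B because more groups lag behind, and it ranks C above A because the errors themselves are larger.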