Towards Fair Face Verification: An In-depth Analysis of Demographic Biases

Deep learning-based person identification and verification systems have become markedly more accurate in recent years. However, such systems, including widely used cloud-based solutions, have been found to exhibit significant biases related to race, age, and gender, a problem that calls for both closer study and remedies. This paper presents an in-depth analysis of these biases, with particular emphasis on the intersectionality of the demographic factors. Intersectional bias refers to performance discrepancies across different combinations of race, age, and gender groups, an area that remains relatively unexplored in the current literature. Furthermore, most state-of-the-art approaches rely on accuracy as the principal evaluation metric, which often masks significant demographic disparities in performance. To counter this limitation, we incorporate five additional metrics in our quantitative analysis, including disparate impact and disparate mistreatment metrics, which are typically ignored by the relevant fairness-aware approaches. Results on the Racial Faces in-the-Wild (RFW) benchmark indicate pervasive biases in face recognition systems that extend beyond race, with different demographic factors yielding significantly disparate outcomes. In particular, Africans show an 11.25% lower True Positive Rate (TPR) than Caucasians, while the corresponding accuracy drop is only 3.51%. Even more concerning, intersections of multiple protected groups, such as African females over 60 years old, show a +39.89% disparate mistreatment rate relative to the highest Caucasian rate. By shedding light on these biases and their implications, this paper aims to stimulate further research towards fairer, more equitable face recognition and verification systems.
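
The point that accuracy can hide a much larger TPR gap is easy to see in a small numerical sketch. The Python below is only an illustration under stated assumptions, not the paper's evaluation code: the abstract does not give formulas for its metrics, so the sketch assumes the common definitions of disparate impact (ratio of positive-prediction rates between a group and a reference group) and disparate mistreatment (gaps in false positive and false negative rates). The function names, the helper `make_pairs`, and the toy counts are all hypothetical and are not taken from RFW.

```python
from collections import defaultdict


def group_rates(labels, preds, groups):
    """Per-group accuracy, TPR, FPR, FNR and positive-prediction rate.

    labels/preds are 1 for "same identity" and 0 for "different identity";
    groups holds the demographic group of each verification pair.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for y, p, g in zip(labels, preds, groups):
        if y:
            counts[g]["tp" if p else "fn"] += 1
        else:
            counts[g]["fp" if p else "tn"] += 1

    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]  # genuine pairs
        neg = c["fp"] + c["tn"]  # impostor pairs
        total = pos + neg
        rates[g] = {
            "accuracy": (c["tp"] + c["tn"]) / total,
            "tpr": c["tp"] / pos if pos else float("nan"),
            "fnr": c["fn"] / pos if pos else float("nan"),
            "fpr": c["fp"] / neg if neg else float("nan"),
            "positive_rate": (c["tp"] + c["fp"]) / total,
        }
    return rates


def disparate_impact(rates, group, reference):
    """Assumed definition: ratio of positive-prediction rates (1.0 = parity)."""
    return rates[group]["positive_rate"] / rates[reference]["positive_rate"]


def disparate_mistreatment(rates, group, reference):
    """Assumed definition: FPR and FNR gaps versus the reference group."""
    return {
        "fpr_gap": rates[group]["fpr"] - rates[reference]["fpr"],
        "fnr_gap": rates[group]["fnr"] - rates[reference]["fnr"],
    }


def make_pairs(group, tp, fn, fp, tn):
    """Hypothetical helper: expand confusion counts into toy label lists."""
    labels = [1] * (tp + fn) + [0] * (fp + tn)
    preds = [1] * tp + [0] * fn + [1] * fp + [0] * tn
    return labels, preds, [group] * len(labels)


if __name__ == "__main__":
    # Made-up counts for two groups, chosen only to show the masking effect.
    labels, preds, groups = [], [], []
    for spec in (("A", 9, 1, 0, 10), ("B", 7, 3, 0, 10)):
        y, p, g = make_pairs(*spec)
        labels += y
        preds += p
        groups += g

    rates = group_rates(labels, preds, groups)
    for name in ("accuracy", "tpr"):
        print(name, "gap (A - B):", rates["A"][name] - rates["B"][name])
    print("disparate impact (B vs A):", disparate_impact(rates, "B", "A"))
    print("disparate mistreatment (B vs A):", disparate_mistreatment(rates, "B", "A"))
```

In this toy setting the TPR gap between the two groups is twice the accuracy gap, because the correctly rejected impostor pairs dilute the difference when only accuracy is reported. An intersectional analysis would use the same functions with group keys built from combinations of attributes, for example a (race, gender, age band) tuple per pair.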
