Fair biometric algorithms achieve similar verification performance across demographic groups at a single decision threshold. Unfortunately, state-of-the-art face recognition networks produce score distributions that differ between demographic groups. In contrast to work that aligns these distributions through additional training or fine-tuning, we focus solely on score post-processing. We show that the well-known sample-centered score normalization techniques, Z-norm and T-norm, do not improve fairness at high-security operating points. We therefore extend standard Z/T-norm to integrate demographic information into the normalization. Additionally, we investigate several ways to incorporate cohort similarities for both genuine and impostor pairs per demographic group, in order to improve fairness across different operating points. We run experiments on two datasets with different demographic attributes (gender and ethnicity) and show that our techniques generally improve the overall fairness of five state-of-the-art pre-trained face recognition networks without degrading verification performance. We also show that the highest gains require False Match Rate (FMR) and False Non-Match Rate (FNMR) to contribute equally to the fairness evaluation. Code and protocols are available.
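To make the normalization idea concrete, the following is a minimal sketch of textbook Z-norm-style standardization alongside one possible demographic-restricted variant. It is an illustration under stated assumptions, not the authors' released code: the function names `z_norm` and `demographic_z_norm`, the group labels, and the choice to restrict the cohort to samples of the same demographic group are all hypothetical, and the exact formulation in the paper may differ.

```python
import numpy as np


def z_norm(raw_score, cohort_scores):
    """Standardize a raw similarity score by the mean and standard
    deviation of a set of cohort (impostor) scores."""
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    mu = cohort_scores.mean()
    sigma = cohort_scores.std()  # population std (ddof=0)
    # Guard against a degenerate cohort with zero spread.
    return (raw_score - mu) / max(sigma, 1e-12)


def demographic_z_norm(raw_score, cohort_scores, cohort_groups, group):
    """Hypothetical demographic-aware variant: compute the cohort
    statistics only from cohort samples labeled with `group`.
    Falls back to the full cohort if no sample carries that label
    (an assumption made for this sketch)."""
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    mask = np.asarray(cohort_groups) == group
    if not mask.any():
        return z_norm(raw_score, cohort_scores)
    return z_norm(raw_score, cohort_scores[mask])


if __name__ == "__main__":
    scores = [0.1, 0.3, 0.5, 0.7]   # made-up cohort scores
    groups = ["a", "a", "b", "b"]   # made-up demographic labels
    print(z_norm(0.8, scores))
    print(demographic_z_norm(0.8, scores, groups, "b"))
```

In this sketch the same raw score maps to different normalized values depending on which cohort statistics are used, which is the mechanism by which a single decision threshold can be made to behave more uniformly across groups.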