AI-based Face Recognition Systems (FRSs) are now widely distributed and deployed as MLaaS solutions all over the world, more so since the COVID-19 pandemic, for tasks ranging from validating individuals' faces while buying SIM cards to surveillance of citizens. Extensive biases against marginalized groups have been reported in these systems and have led to highly discriminatory outcomes. The post-pandemic world has normalized wearing face masks, but FRSs have not kept up with the changing times. As a result, these systems are susceptible to mask-based face occlusion. In this study, we audit four commercial and nine open-source FRSs for the task of face re-identification between different varieties of masked and unmasked images across five benchmark datasets (14,722 images in total). These simulate a realistic validation/surveillance task as deployed in all major countries around the world. Three of the commercial and five of the open-source FRSs are highly inaccurate; they further perpetuate biases against non-White individuals, with the lowest accuracy being 0%. A survey on the same task with 85 human participants also yields a low accuracy of 40%. Thus, human-in-the-loop moderation in the pipeline does not alleviate these concerns, as has frequently been hypothesized in the literature. Our large-scale study shows that developers, lawmakers, and users of such services need to rethink the design principles behind FRSs, especially for the task of face re-identification, taking cognizance of the observed biases.