Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems. Despite substantial progress, existing approaches still generalize poorly to real-world applications. This limitation can be attributed to the scarcity and limited diversity of publicly available FAS datasets, which often lead to overfitting during training or saturated performance during testing. In terms of quantity, the number of spoof subjects is a critical factor, yet most datasets comprise fewer than 2,000 subjects. In terms of diversity, most datasets consist of spoof samples collected in controlled environments using repetitive, mechanical processes, which yields homogeneous samples and little scenario diversity. To address these shortcomings, we introduce the Wild Face Anti-Spoofing (WFAS) dataset, a large-scale, diverse FAS dataset collected in unconstrained settings. Our dataset comprises 853,729 images of 321,751 spoof subjects and 529,571 images of 148,169 live subjects, a substantial increase in quantity over existing datasets. Moreover, our dataset incorporates spoof data obtained from the internet, spanning a wide range of scenarios and commercial sensors and covering 17 types of presentation attacks (PAs) in both 2D and 3D forms. This data collection strategy markedly enhances FAS data diversity. Using the WFAS dataset and Protocol 1 (Known-Type), we host the Wild Face Anti-Spoofing Challenge at the CVPR 2023 workshop. Additionally, we evaluate representative methods under both Protocol 1 and Protocol 2 (Unknown-Type). Based on an in-depth examination of the challenge results and benchmark baselines, we provide analyses and propose potential avenues for future research. The dataset is released under Insightface.