Existing facial datasets offer plentiful images at near-frontal views but lack images with extreme head poses, which degrades the performance of deep learning models on profile or pitched faces. This work addresses this gap by introducing a novel dataset named Extreme Pose Face High-Quality Dataset (EFHQ), which includes up to 450k high-quality images of faces at extreme poses. To produce such a massive dataset, we utilize a novel and meticulous dataset processing pipeline to curate two publicly available datasets, VFHQ and CelebV-HQ, which contain many high-resolution face videos captured in various settings. Our dataset can complement existing datasets on various face-related tasks, such as facial synthesis with 2D/3D-aware GANs, diffusion-based text-to-image face generation, and face reenactment. Specifically, training with EFHQ helps models generalize well across diverse poses and, as extensive experiments confirm, significantly improves performance in scenarios involving extreme views. Additionally, we use EFHQ to define a challenging cross-view face verification benchmark, in which the performance of SOTA face recognition models drops by 5-37% compared to frontal-to-frontal scenarios. The benchmark is intended to stimulate studies on face recognition under severe pose conditions in the wild.