Recent advances in deep face recognition have spurred a growing demand for large, diverse, and manually annotated face datasets. Acquiring authentic, high-quality data for face recognition has proven challenging, primarily due to privacy concerns: large face datasets are mostly sourced from web images collected without explicit user consent. In this paper, we examine whether and how synthetic face data can be used to train effective face recognition (FR) models with reduced reliance on authentic images, thereby mitigating data collection concerns. First, we explored the performance gap among recent state-of-the-art FR models trained on synthetic data only and on (scarce) authentic data only. Then, we deepened our analysis by training a state-of-the-art backbone on various combinations of synthetic and authentic data, gaining insights into how to make the best use of limited authentic data for verification accuracy. Finally, we assessed the effectiveness of data augmentation approaches on synthetic and authentic data, again with the aim of making the best use of limited authentic data. Our results highlighted the effectiveness of FR models trained on combined datasets, particularly when paired with appropriate augmentation techniques.