Over the past years, rapid advances in deep learning capabilities and the availability of large-scale training datasets have led to breakthroughs in face recognition accuracy. However, these technologies are expected to face a major challenge in the coming years. Two factors drive this: the legal and ethical concerns about using authentic biometric data in AI model training and evaluation, and the growing reliance on data-hungry state-of-the-art deep learning models. Deep generative models have recently advanced and now succeed in generating realistic, high-resolution synthetic images. Building on this, privacy-friendly synthetic data has been proposed as an alternative to privacy-sensitive authentic data, to overcome the challenges of using authentic data in face recognition development. This work aims to provide a clear and structured picture of the taxonomy of use cases for synthetic face data in face recognition, along with recent advances in face recognition models developed on the basis of synthetic data. We also discuss the challenges facing the use of synthetic data in face recognition development and outline several future prospects for synthetic data in the domain of face recognition.